RE: WWW support for Cyrillic (and UNICODE)

Richard L. Goerwitz (goer@midway.uchicago.edu)
Wed, 2 Nov 94 11:02:33 CST


>Does the Line Mode Browser also support UNICODE ? It seems to me that
>supporting UNICODE is a good thing. You'd get Cyrillic, Kanji, and anything
>else that does not fit into U.S. ASCII. Judging by this list people are
>lukewarm to the idea of supporting UNICODE. Why is that ?

Several reasons, in my estimation:

1) Unicode increases overhead, being 16-bit rather than 8-bit
2) It is not supported by GUIs
3) 16-bit characters aren't supported by compilers
4) US/European programmers are cultural chauvinists

Of course, most of these objections are spurious:

1) UTF-8 stays backward compatible with 7-bit ASCII, adding
to storage requirements only for characters outside the
ASCII range
2) (valid objection)
3) Wide characters should be supported by ANSI-conformant
compilers and libraries
4) US and European programmers aren't stupid; they are just
not terribly aware of the situation in countries like
Russia, India, Japan, etc. Give them gentle nudges, and
they *will* respond....
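
To put some numbers on point 1, here is a minimal Python sketch that
compares the size of a few strings in UTF-8 and in a plain 16-bit
encoding. The helper name byte_sizes and the sample strings are mine,
purely for illustration. Note that ASCII text costs nothing extra in
UTF-8, Cyrillic comes out even with 16-bit, and Kanji actually comes
out larger, so the storage argument depends on the text.

```python
def byte_sizes(text):
    """Return (UTF-8 size, UTF-16 size) in bytes for a string.

    "utf-16-be" is used so that no byte-order mark is counted.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


# Illustrative samples only: ASCII, Cyrillic, and Kanji.
samples = ["hello", "Привет", "漢字"]

for sample in samples:
    utf8_size, utf16_size = byte_sizes(sample)
    print(sample, "UTF-8:", utf8_size, "bytes;", "16-bit:", utf16_size, "bytes")

# Pure ASCII text is byte-for-byte identical in UTF-8.
print("hello".encode("utf-8") == "hello".encode("ascii"))
```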

Adding multi-language capability to the Web is going to take time,
because it will require changes to HTML, to servers, and to clients.
Given the lack of multilingual support in most GUIs, a lot of new
widgets will have to be created, and people will have to stretch
themselves to learn how things like Japanese and Arabic scripts
work. It's going to take time, but it appears we'll get there.

Right here, several things have already been hashed out. For example, we
all pretty much seem to agree that LANG and CHARSET or CODEPAGE
attributes will be needed for HTML (with some sensible defaults).
We've also come to the realization that logical ordering of data
is the only way to go. The "visual" ordering that MIME allows
for embedded Hebrew or Arabic just won't work in the long run.
So we have to bite the bullet and make sure that clients can
do the visual reordering themselves for mixed right-left/left-right
languages. (There is, by the way, a terribly explained algorithm
for doing this in Appendix A of volume 1 of the old published
Unicode standard; I can supply people with a more practical
tutorial if anyone wants it.)
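
To make the logical/visual distinction concrete, here is a toy Python
sketch of client-side reordering. It is emphatically not the algorithm
from the Unicode standard: it assumes a left-to-right line, treats
neutrals (spaces, punctuation) as right-to-left only when they sit
between two right-to-left characters, and ignores digits, embedding
levels, and mirrored characters. The function name to_visual is my own
invention.

```python
import unicodedata


def to_visual(logical):
    """Toy reordering of a logically ordered string for an LTR line."""
    # Classify each character: strong right-to-left, strong
    # left-to-right, or neutral (everything else, in this simplification).
    kinds = []
    for ch in logical:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            kinds.append("R")
        elif bidi == "L":
            kinds.append("L")
        else:
            kinds.append("N")

    # Resolve each run of neutrals: it becomes right-to-left only when
    # both of its neighbours are right-to-left; otherwise it follows the
    # left-to-right base direction assumed here.
    n = len(kinds)
    resolved = list(kinds)
    i = 0
    while i < n:
        if kinds[i] != "N":
            i += 1
            continue
        j = i
        while j < n and kinds[j] == "N":
            j += 1
        before = kinds[i - 1] if i > 0 else "L"
        after = kinds[j] if j < n else "L"
        fill = "R" if before == "R" and after == "R" else "L"
        for k in range(i, j):
            resolved[k] = fill
        i = j

    # Reverse every right-to-left run; leave the rest alone.
    pieces = []
    i = 0
    while i < n:
        j = i
        while j < n and resolved[j] == resolved[i]:
            j += 1
        segment = logical[i:j]
        pieces.append(segment[::-1] if resolved[i] == "R" else segment)
        i = j
    return "".join(pieces)


# Logical order: Latin word, two Hebrew words, Latin word.
logical = "a \u05d0\u05d1 \u05d2\u05d3 b"
print(to_visual(logical))
```

The point of storing the logical order is that the document stays
searchable and editable, and each client applies the reordering at
display time for whatever line width it happens to have.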

Things *are* happening. Be patient, and offer to help. Inject
comments where you feel they will be appropriate. Cut some code
if you know how; otherwise, do some research on scripts and
standards in various countries and help guide the process. Above all,
though, don't complain. Help us out!

Richard Goerwitz