Re: Putting the "World" back in WWW...

Nathaniel Borenstein (fxrojas@nlsarch.austin.ibm.com)
Tue, 04 Oct 94 13:03:11 -0600


From: lilley@v5.cgu.mcc.ac.uk (Chris Lilley, Computer Graphics Unit)
Date: Tue, 04 Oct 94 11:17:31 +0000

> I'm not sure I follow... excuse me if I missed the point ... but it sounds
> like you are suggesting we put "ANY ENCODING" in the document and have each
> viewer convert into UNICODE...

The alternative seems to be to force everyone to write all their documents in
unicode,

NOT force. But encourage.

As I said yesterday, people in other countries already have methods for encoding
the characters of their national languages, and these methods should be
supported.

I agree.

> If so, this will cause MAJOR interoperability problems across the network.

Why?

Because it depends on whether you view the WWW as a clean homogeneous encoding
environment or a messy heterogeneous encoding nightmare. In such an open
endeavor I choose the latter. :-)

Why would this cause more severe problems than forcing everyone to use
Unicode when authoring documents?

Because 1) it appears that WWW will be doing its own localization and 2) in
order to support all the classes of localization services (display/fonts/
languages/input method/...) WWW will need to agree on (or "encourage") either
a. N methods for localizing - Macs, Windows, X11, ....
b. 1 method for all environments.

> Expecting every client to be convert to from every possible encoding will
> never work

[Is that "to be able to convert" ?]
Sure it will. We *are* using a common libwww aren't we?

Yes, thanks. When it comes to I18N and localizing (e.g. Asian languages) you'll
need to figure out how to customize libwww such that the locale-specific
information is split out.
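
To make the "split out" idea concrete, here is a rough sketch in Python rather
than in libwww's own C. It is not libwww code: the table CHARSET_CODECS and the
function to_unicode are names made up for illustration. The only point is that
the per-encoding knowledge can live in a data table, separate from the single
shared conversion routine.

```python
# Hypothetical sketch: charset-specific knowledge kept in a table,
# apart from the one shared conversion routine.
# CHARSET_CODECS and to_unicode are illustrative names, not a real API.

CHARSET_CODECS = {
    "iso-8859-1": "latin_1",    # W. European
    "iso-8859-2": "iso8859_2",  # E. European
    "koi8-r": "koi8_r",         # Russian
    "euc-jp": "euc_jp",         # Japanese
}


def to_unicode(body: bytes, charset: str) -> str:
    """Decode a document body using the charset label it was sent with."""
    codec = CHARSET_CODECS.get(charset.lower())
    if codec is None:
        # A new encoding means adding a table entry, not changing this code.
        raise LookupError(f"no converter registered for {charset!r}")
    return body.decode(codec)
```

Under this arrangement, supporting another national encoding is a matter of
registering one more entry, assuming a converter for it exists at all.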

I appreciate what you are saying, but the picture is not entirely as you present
it for two reasons.

Basically true. It is a matter of degree: how much effort is put into enabling
multilingual use, i.e. universal instead of territorial.

Firstly, not all clients will need to convert. Realistically, many of the
documents using a particular encoding will be read by people also using that
encoding. So, converting to Unicode on the server would impose a burden of two
encodings - to and from the same native encoding that the people are using in a
particular country.

I guess the folks in E. Europe (using Latin-2) aren't going to be able to
view text from W. Europe (likewise for Greek, Turkish, Russian...).
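
A small illustration of that failure, as a sketch only. It uses Python's
standard codecs; the helper name read_as and the sample word are made up for
the example. The same bytes are read once under the encoding they were written
in and once under the reader's local encoding, and then converted once to a
common Unicode form (UTF-8 is chosen here purely as an example).

```python
# Sketch: the same bytes read under two different 8-bit encodings.
# read_as is a hypothetical helper name, used for illustration only.

def read_as(raw: bytes, codec: str) -> str:
    """Interpret raw bytes under the given encoding."""
    return raw.decode(codec)


text = "Łódź"                    # a Polish place name
raw = text.encode("iso8859_2")   # as stored by a Latin-2 author

# A reader who shares the author's encoding sees the original text.
same_locale = read_as(raw, "iso8859_2")

# A reader who assumes Latin-1 gets no error, just different characters.
other_locale = read_as(raw, "latin_1")

# Converted once to a common Unicode form, the text survives regardless
# of which national encoding the reader's system happens to use.
common = text.encode("utf-8")
recovered = common.decode("utf-8")
```

Note that the Latin-1 reader is given no indication that anything went wrong,
which is what makes an unlabeled mix of national encodings hard to live with.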

Secondly, the phrase "system administrators" rings warning bells here.

Yes, I've noticed.

Your
mental model seems to be of a technical support team running a server, doing
code conversion on all their documents to a common format, etc. This is the
traditional heavyweight publishing model.

No. Rather, it is to encourage folks to start generating documents in a common
format to gain wider audiences, and to encourage the WWW to build tools for
that common format. It sounds like I've reached the limit of where that
encouragement is welcomed ... :-)

Finally, I want to re-emphasize that I don't want to force anyone to use one
encoding over another. I do want to encourage us to think globally in our
network view - that is all.

Frank