Actually, if you look back, I think you will find that I have been one of the principal
proponents of using UNICODE, for exactly the reasons cited. In fact I have written
the code for UNICODE support and hope to incorporate it into a browser soon.
The point I was making is that IMHO text/plain is not a high-value markup, even with
UNICODE, and that given a UTF encoding it is reasonable to make it octet-addressable
rather than addressable by character cell. Actually, on reflection, I don't like that idea
anymore: it means that the position of an anchor would change depending on whether the
file was sent in UTF or unpacked format, which I don't like at all :-)
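A minimal sketch of the anchor-position problem, assuming UTF-8 as a stand-in for the UTF encoding under discussion and UTF-16BE (two octets per character cell) as a stand-in for the unpacked format. The function name `octet_offset` and the sample document are illustrative only.

```python
# Sketch: the same anchor, given as a character index, starts at different
# octet offsets depending on the transfer format.
# Assumptions: UTF-8 stands in for "UTF", UTF-16BE for the unpacked format.

def octet_offset(text: str, char_index: int, encoding: str) -> int:
    """Return the octet offset at which character `char_index` starts."""
    return len(text[:char_index].encode(encoding))

doc = "Café Zürich: see anchor here"
anchor = doc.index("anchor")            # character-cell position of the anchor

utf8_offset = octet_offset(doc, anchor, "utf-8")      # é and ü take 2 octets each
ucs2_offset = octet_offset(doc, anchor, "utf-16-be")  # every character takes 2 octets

print(anchor, utf8_offset, ucs2_offset)  # prints: 17 19 34
```

Only the character index stays the same across both formats, which is the case for addressing by character cell.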
I doubt that the Japanese would want to use UTF encoding in any case. The principal
reason for supporting UTF is that it is very efficient for Western European text and
OK for Eastern European text, or at least as efficient as any scheme that avoids context
switching. Context switching must be avoided like the plague, since the files then become
linear braindamage that cannot be interpreted except by starting from the beginning. If
people are prepared to forgo character-index addressing, then Huffman coding provides much
better compression than merely switching character sets.
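To illustrate the objection to context switching, here is a small sketch using ISO-2022-JP as an example of a stateful, escape-sequence-switched encoding, contrasted with UTF-8 as an example of a self-synchronising UTF. The choice of these two encodings is an assumption for illustration; the 1993-era UTF proposals differed in detail.

```python
# Sketch: a context-switching encoding must be decoded from the start,
# because the meaning of each byte depends on the escapes that preceded it.

text = "日本"
stateful = text.encode("iso2022_jp")     # ESC $ B <JIS X 0208 bytes> ESC ( B
start = stateful.index(b"\x1b$B") + 3    # first byte after the shift sequence

# Decoding from the beginning recovers the original text.
full = stateful.decode("iso2022_jp")

# Starting mid-stream skips the shift, so the decoder stays in its initial
# ASCII state and, for this input, silently yields the wrong characters.
tail = stateful[start:].decode("iso2022_jp")

# UTF-8 is self-synchronising: decoding from any character boundary works.
utf8 = text.encode("utf-8")
utf8_tail = utf8[3:].decode("utf-8")     # each character here is 3 octets
```

In the stateful case a reader cannot jump to an arbitrary offset without first replaying every escape before it; in the UTF-8 case it can.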
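The trade-off in the Huffman suggestion can be seen in a textbook Huffman construction: code lengths vary per character, so compression improves but the bit position of character i depends on everything before it. This is a generic sketch, not a proposed format; the sample text and the 8-bits-per-character baseline are assumptions for comparison only.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(text: str) -> dict:
    """Build a Huffman code (character -> bit string) from character frequencies."""
    freq = Counter(text)
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), {ch: ""}) for ch, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct character
        return {ch: "0" for ch in freq}
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing 0 and 1 to their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in left.items()}
        merged.update({ch: "1" + c for ch, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

text = "the quick brown fox jumps over the lazy dog and the cat"
code = huffman_code(text)
bits = "".join(code[ch] for ch in text)

# Fixed-width baseline (8 bits per character) versus Huffman-coded length.
print(len(text) * 8, len(bits))
```

Because the codes differ in length, finding character i means summing the lengths of all earlier codes, which is exactly the loss of character-index addressing.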
Actually, one point should be made clear: a UNICODE Web document should be interpreted in
the context of the Content-Language specified. Thus if the language is Japanese, the
Han characters should be displayed as Japanese, not as US-inspired mid-Pacific bit-savers.
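A sketch of how a browser might act on this, assuming it has region-specific glyph sets for unified Han characters. The table, its style labels, and the function `han_glyph_style` are hypothetical; only the Content-Language header itself is real.

```python
from typing import Optional

# Hypothetical mapping from language tag to the glyph conventions a browser
# would use when rendering unified Han characters.
HAN_GLYPH_STYLE = {
    "ja": "Japanese",
    "ko": "Korean",
    "zh-cn": "Simplified Chinese",
    "zh-tw": "Traditional Chinese",
    "zh": "Chinese",
}

def han_glyph_style(content_language: str) -> Optional[str]:
    """Pick a Han glyph style from a Content-Language header value.

    Only the first listed language is used. The full tag is tried before
    its primary subtag, so "zh-TW" is preferred over plain "zh".
    Returns None when the language gives no guidance.
    """
    first = content_language.split(",")[0].strip().lower()
    if not first:
        return None
    if first in HAN_GLYPH_STYLE:
        return HAN_GLYPH_STYLE[first]
    return HAN_GLYPH_STYLE.get(first.split("-")[0])
```

For example, `han_glyph_style("ja-JP")` selects the Japanese glyphs, while a document labelled `en` falls back to whatever default the browser chooses.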
Phill Hallam-Baker