Re: LANG as an attribute

lilley (lilley@afs.mcc.ac.uk)
Tue, 1 Aug 1995 11:22:24 +0100 (BST)


Jon Bosak said:

> [Tom Neff:]
> > 2. A browser for Hebrew or Chinese might well attempt, if it knows the
> > language is appropriate, to display text right-to-left or
> > top-to-bottom in future. Are there presentation issues or risks
> > inherent in allowing a character-level switch from, say, Hebrew to
> > Spanish and back?

> You bet. It's an implementation nightmare. It's not impossible, but
> people who have studied this tell me that it's not nearly as simple as
> it looks at first glance.

It certainly does appear doable at first glance.

> This subject received a great deal of discussion (most of it before I
> became involved) in the DSSSL working group. I don't know all the
> details, but I do know that no one is expecting bidirectional
> capabilities in DSSSL-Lite.

Oh, great. Well, I hope that CSS manages it then, because as Jon says:

> This happens fairly often in scholarly publications devoted to Near
> Eastern studies and probably to a certain extent in Hebrew and Arabic
> works that quote European languages.

Quite. So it is a common requirement and DSSSL-Lite does not intend to
address it. Anyone else see this as a problem?

> Note that we're talking specifically about embedding a fragment of a
> right-to-left language like Hebrew or Arabic in the middle of a
> passage written in a left-to-right language like English or Spanish.

Yes. Understood.

Now, let us presuppose routines already exist to lay out a line of type
left to right and to put fully rendered lines onto the output device.

Let us arbitrarily represent, in this example, Hebrew letters as some entity
set, as much for convenience in email as anything else.

Let us also suppose that we have a full, scalable Unicode font at our
disposal - because the issue was setting bidi type, not handling missing
fonts.

The input text is being streamed in and we cannot look ahead. Actually
we could probably look ahead with a small amount of buffering, but let's
try without first.

The line under construction so far is:

left^--------------------------------^right margins
is expressed in Hebrew a
^========== insert point

Next letters are s and space

left^--------------------------------^right margins
is expressed in Hebrew as
^========== insert point

Next letter is &beth; and the LANG has switched to Hebrew. We continue to
insert characters from left to right, but do not move the insertion point.
I will represent the Hebrew letters here as punctuation !@#$% to illustrate
where the letters are placed.

left^--------------------------------^right margins
is expressed in Hebrew as $
^========== insert point

Does the total line length exceed the margin? No, so we add another one,
&resh;

left^--------------------------------^right margins
is expressed in Hebrew as #$
^========== insert point
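
The placement rule used in the last two steps can be sketched in a few
lines of Python. This is only an illustration, not tested layout code:
the function name place and the rtl flag are my own inventions, and the
Hebrew letters are again represented by punctuation.

```python
def place(line, insert_at, ch, rtl):
    """Insert ch at the insertion point; return the new line and point."""
    line = line[:insert_at] + ch + line[insert_at:]
    # Left-to-right text advances the insertion point. Right-to-left text
    # leaves it alone, so the next RTL letter lands to the left of this one.
    return line, (insert_at if rtl else insert_at + 1)


line, at = "", 0
for ch in "as ":
    line, at = place(line, at, ch, rtl=False)
for ch in "$#":  # stand-ins for two Hebrew letters, in logical order
    line, at = place(line, at, ch, rtl=True)
print(line)  # as #$
```

The insertion point is still 3 after both Hebrew letters have been
placed, which is the whole trick.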

And so on

left^--------------------------------^right margins
is expressed in Hebrew as {! &%#$
^========== insert point

Ah. Adding the next letter would take us over the margin. So we save
the word we are currently inserting: {!

left^--------------------------------^right margins
is expressed in Hebrew as &%#$
^========== insert point

Display the line, clear the line under construction, add the saved word
and continue:

left^--------------------------------^right margins
{!
^========== insert point

After a bit we switch to English again:

left^--------------------------------^right margins
@*+{!
^========== insert point

So the insertion point moves to the end of the current string

left^--------------------------------^right margins
@*+{!
^========== insert point

and off we go.

left^--------------------------------^right margins
@*+{! or in other words
^========== insert point

Now with a little tidying up - a one character lookahead to stop
spaces being the first character of a line and to allow a one character
rollback if the next character was a space and the last character was
one of the five letters that have final forms - that seems like a
reasonable starting point for laying out bidi text.
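
For anyone who wants to play with it, here is a rough Python sketch of
the whole procedure walked through above. It makes several assumptions
that are mine rather than part of the walk-through: the name layout, the
(character, is_rtl) input pairs, and a width counted in characters, as
if the font were fixed pitch. It skips a space that would start a line
but does not attempt the final-form rollback, and it has nothing to do
with any real bidi specification.

```python
def layout(chars, width):
    """Lay out (character, is_rtl) pairs, given in logical order, into lines.

    Returns the finished lines as strings in visual (left-to-right) order.
    Width is counted in characters, i.e. a fixed-pitch font is assumed.
    """
    lines = []
    line = []         # line under construction, in visual order
    at = 0            # insertion point (index into line)
    word_len = 0      # length of the word currently being inserted
    prev_rtl = False

    for ch, rtl in chars:
        if rtl != prev_rtl:
            # Direction switch: carry on from the end of the current string.
            at, word_len, prev_rtl = len(line), 0, rtl

        if len(line) + 1 > width:
            saved = []
            if ch != " " and 0 < word_len < len(line):
                # Save the word being inserted and take it off the line.
                lo = at if rtl else at - word_len
                saved = line[lo:lo + word_len]
                del line[lo:lo + word_len]
            if rtl and at < len(line) and line[at] == " ":
                del line[at]   # drop the space that preceded the saved word
            lines.append("".join(line).rstrip())
            line, word_len = saved, len(saved)
            at = 0 if rtl else len(line)

        if ch == " " and not line:
            continue           # never start a line with a space

        line.insert(at, ch)
        if not rtl:
            at += 1            # only left-to-right text moves the point
        word_len = 0 if ch == " " else word_len + 1

    if line:
        lines.append("".join(line).rstrip())
    return lines


english = "is expressed in Hebrew as "
hebrew = "$#%& !{+*@"          # stand-ins for Hebrew letters, logical order
tail = " or in other words"
stream = ([(c, False) for c in english] + [(c, True) for c in hebrew]
          + [(c, False) for c in tail])
for out in layout(stream, 33):
    print(out)
# is expressed in Hebrew as &%#$
# @*+{! or in other words
```

A word longer than the whole line is simply broken mid-word here, which
a real implementation would want to handle more gracefully.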

-- 
Chris Lilley, Technical Author
+-------------------------------------------------------------------+
|       Manchester and North HPC Training & Education Centre        |
+-------------------------------------------------------------------+
| Computer Graphics Unit,             Email: Chris.Lilley@mcc.ac.uk |
| Manchester Computing Centre,        Voice: +44 161 275 6045       |
| Oxford Road, Manchester, UK.          Fax: +44 161 275 6040       |
| M13 9PL                            BioMOO: ChrisL                 |
|     URI: http://info.mcc.ac.uk/CGU/staff/lilley/lilley.html       | 
+-------------------------------------------------------------------+