A week or so ago, I wrote a fairly lengthy set of comments on HTML, and
received a polite invitation to consider instead HTML+, from the point
of view of TEI conformance/convergence. Alas, I haven't had time to do
this as fully as I'd wish, but just to show I'm still here, I have a few
comments to make on the recent semantics/presentation discussion. I'll
do this by embedding comments from other recent notes (no names, no
pack drill).
1. Rendering Hints
> The most important comment I have is that there is too much emphasis
> on rendering. In the draft RFC, description of a feature is often
> accompanied by some discussion of how it might be rendered. For a
> HTML+ includes rendering hints and when they are recognised by browsers, they
> should be treated in a predictable/consistent way. It makes sense to describe
> these, in the minimal fashion needed for such consistency, at the point in
> which the relevant attribute is introduced.
There is some detailed discussion of the rendering vs. semantics issue
in chapter 6 of the TEI Guidelines and elsewhere. In the TEI DTD, every
element has a REND attribute which may be used to specify rendering
information in an as yet undefined manner. In addition, there is an
optional RENDITION element in the TEI header which may be used to
specify a default rendition style for any given element type. In the
future it seems probable that DSSSL will provide the tools needed to solve
this problem.
There's a related interesting wrinkle to the problem not touched on in
discussion here, which the TEI has scratched its virtual head over:
suppose what you want to do is to encode the rendition *used by* an
original source -- not necessarily to reproduce it, but because (say)
you're interested in the history of printing or the effect of layout on
semantics. What kind of semantic markup scheme would you define for
that? We tried unsuccessfully to set up a work group for that last time
round -- maybe next time...
2. ROLE and EM
>The EM mechanism
>is intended to prevent an explosion of tag types. The combination of logical
>and physical emphasis into a single element means that browsers can deal
>sensibly with emphasis even when it doesn't recognise the logical category
>involved. It gives us open ended logical categories with rendering hints.
Oversimplifying, one could say that this ROLE attribute is sort of the
reverse of the TEI REND attribute. The difference (as someone else
pointed out) is that you might well want to constrain the places that
certain ROLEs are legal within your document, while you probably don't
want to constrain what's legal as a value for REND. To give a more
concrete example, if I decide to treat level 1 headings as just another
kind of highlight, and tag them (in HTML+) <em role=heading1>, then I'll
have problems finding all the level 1 headings in my document to build a
table of contents, and I won't be able to require that they appear in the
right place in the document hierarchy, etc. If I encode them as
<heading1 rend=head1-style> or equivalent, however, then I get the same
semantics, but I can now enforce any local rules I want to enforce
about where <heading1> is legal and where it is not.
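As a rough sketch of that difference, here is a hypothetical DTD
fragment. The element names div1 and p and both content models are
assumptions made for illustration; none of this is taken from the HTML+
draft or from the TEI DTD.

```sgml
<!-- Illustrative fragment only; all declarations here are assumptions. -->

<!-- Dedicated element: the content model of div1 says that a heading1
     must open each division and may not appear anywhere else. -->
<!ELEMENT div1     - - (heading1, p*)>
<!ELEMENT heading1 - - (#PCDATA)>
<!ATTLIST heading1 rend CDATA #IMPLIED>

<!-- Generic element: role is only an attribute value, so a parser
     accepts em with role=heading1 wherever em itself is allowed. -->
<!ELEMENT p        - O (#PCDATA | em)*>
<!ELEMENT em       - - (#PCDATA)>
<!ATTLIST em       role CDATA #IMPLIED>
```

Under these assumed declarations, a validating parser would reject a
<heading1> in the middle of a paragraph, but it has no way to object to
<em role=heading1> there, because a content model can constrain element
names and not attribute values.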
On general vs. specific purpose highlighting, the TEI has what I
consider to be the best proposal I've yet seen (but then I would, wouldn't
I):
(a) Use specific/semantic tags where appropriate (examples in the TEI DTD
include foreign, emph, title, socalled, term ...) with an optional rend
attribute if you want to spell out how this particular element should be
rendered.
(b) Where you want something highlighted without wishing to be specific
about the semantics, use the neutral <hi> element, again with an optional
rend attribute.
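A small invented sample may make the two cases concrete. The element
names are the ones listed above; the text and the rend values (italic,
bold) are made up for illustration, since the TEI leaves the form of
rend values open.

```sgml
<!-- Illustrative only: the rend values shown are invented. -->

<!-- (a) Specific tags say what the phrase is; rend is optional. -->
<p>She called it her <socalled>research</socalled>, cited
<title rend=italic>Moby Dick</title>, and insisted that it was
<emph>not</emph> a <foreign rend=italic>jeu d'esprit</foreign>.</p>

<!-- (b) The neutral hi records that something is highlighted
     without claiming to know why. -->
<p>The printer set <hi rend=bold>this phrase</hi> in bold type, for
reasons the encoder does not wish to guess at.</p>
```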
<hi> is therefore almost synonymous with EM -- except that the ROLE
attribute would be regarded by many TEI hackers as an abomination second
only to the Ravenous Bugblatter Beast of Traal. Also, EM sounds far
too much like <emph> -- which in the TEI scheme has the very specific
semantics of <emph>linguistic</emph> emphasis, not necessarily
typographic. Typographic emphasis and linguistic emphasis do not
necessarily co-occur!
3. DTDs for visually-impaired access
> As a further point re rendering hints, consider, please, that some of
> the Internet community is blind. Do you want to include hints to
> writers of audio browsers also?
There's quite a lot of work within the SGML community on defining a
general purpose DTD for sight-impaired access. I don't have the
reference handy but can provide it if this is of particular interest.
4. Names of elements
> Re: Tables: Why not spell it as <table> instead of <tbl>
> I did in earlier versions, but TBL seemed more in line with the other tags.
>Someone else said that shorter tagnames were preferable "because we like
>TLAs".
The TEI policy is to use self-explanatory names wherever possible, but
in general to make the length of a name inversely proportional to its
likely frequency of occurrence. Hence tags you're unlikely to come
across often tend to have self-explanatory names -- so I would certainly
vote for <table> against <tbl>. And as a stickler for accuracy, I find
myself compelled to point out that "TLA" is short for "three-letter
acronym" and that "tbl" (like most HTML tags) is not an acronym but an
abbreviation.
More later, I hope.
Lou