libHTML-921202.tar.Z -- HTML parsing library with demo
program. (includes current DTD)
html_spec-921202.tar.Z -- HTML.html and related files,
moving toward a spec.
I've made some significant changes to the DTD.
1. I put SHORTTAG NO in the SGML declaration. This
means a) _all_ attributes have to be quoted (numbers,
names, ids, CDATA -- everything), but b) it makes
parsing cleaner: minimization isn't allowed. The NET
feature is disabled (that's for writing
<bold/foo bar/ instead of
<bold>foo bar</bold>
which is tricky to parse).
2. I figured out a way to support HEAD/BODY tags
without breaking things. We lose some structure, but
we gain some too. And this time I stuck by the
mixed content rule-of-thumb.
3. I got rid of the TYPE attribute on anchor tags.
What's that thing for anyway? Does anybody use it?
4. I changed TYPEWRITER to PRE. My new motto is:
just describe it; don't prescribe it.
4. I added a TEXT attribute to anchor tags. The idea
is that all <A HREF=...> point to MIME text/* objects. The
TEXT parameter tells you the subtype, so you don't have
to zen it from the filename (or so you can override the
filename.) For example:
<A HREF="TheProject.html" TEXT="PLAIN">This is
a link to that file treated as a plain text
file.</A>
<A HREF="abcdef" TEXT="HTML">abcdef is an HTML entity,
even though it doesn't have an extension.</A>
It's a little prescriptive, but those semantics are
mostly implemented already.
5. I changed anchor names from SGML IDs to NMTOKENS, so you
can use numbers or whatever you want. Since we don't have any
IDREFs pointing to them, there's no reason to use IDs.
In other words, I've moved this feature into the realm of
application conventions rather than SGML features.
6. I changed XMP and LISTING back to RCDATA. I was messing with
the MidasWWW browser, and I couldn't figure out how, when I'm
dumping the SGML out of the data structures into a file, to
tell whether I should change '<'s to "&lt;" or not. If we avoid
CDATA, we can use entities everywhere, and processing is simpler.
How's that sound?
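To make items 4 and 6 above concrete, here is a minimal Python sketch. It is
not code from libHTML or MidasWWW. The names rcdata_escape, link_media_type,
and EXTENSION_SUBTYPES are made up for illustration, the extension table is
an assumption, and the sketch assumes the DTD declares the lt and amp
entities. The point is that with RCDATA the writer can always escape, and
that a TEXT attribute, when present, overrides whatever the filename suggests.

```python
import os.path

# Hypothetical fallback table: filename extension -> text/* subtype.
EXTENSION_SUBTYPES = {".html": "html", ".txt": "plain"}


def rcdata_escape(text):
    """Escape character data before writing it inside an RCDATA element."""
    # Replace '&' first so the '&' in the generated "&lt;" is not escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


def link_media_type(href, text_attr=None):
    """Return 'text/<subtype>' for an anchor, or None if it cannot be decided."""
    if text_attr:
        # An explicit TEXT attribute wins over the filename.
        return "text/" + text_attr.lower()
    extension = os.path.splitext(href)[1].lower()
    subtype = EXTENSION_SUBTYPES.get(extension)
    return "text/" + subtype if subtype else None


if __name__ == "__main__":
    print(rcdata_escape("if (a < b && c) ..."))
    print(link_media_type("TheProject.html", "PLAIN"))
    print(link_media_type("abcdef", "HTML"))
    print(link_media_type("abcdef"))
```

Because every '<' and '&' is escaped on the way out, the writer never has to
know which element it is in, which is the simplification item 6 is after.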
Now some ideas to kick around...
* Somebody mentioned a <VAR>...</VAR> tag for stuff that shouldn't be
cached. I'm thinking it should be a node-wide empty tag, like
ISINDEX. Maybe <VOLATILE> is a good word.
Then I think to myself, why not make it an attribute on the HEAD
element:
<!ATTLIST HEAD STATUS (OK, ERROR, VOLATILE) OK>
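A client could consume such a STATUS attribute roughly as follows. This is
only a Python sketch under assumptions: may_cache is a hypothetical helper,
and treating ERROR documents as uncacheable is a guess, since the proposal
above only says what VOLATILE is for.

```python
# Values and default taken from the ATTLIST proposal above.
HEAD_STATUS_VALUES = ("OK", "ERROR", "VOLATILE")
DEFAULT_STATUS = "OK"


def may_cache(status=None):
    """Decide whether a document may be cached, given its HEAD STATUS value."""
    # A missing attribute falls back to the declared default.
    value = (status or DEFAULT_STATUS).upper()
    if value not in HEAD_STATUS_VALUES:
        raise ValueError("unknown STATUS value: %r" % (status,))
    # Assumption: only OK documents are cached; VOLATILE and ERROR are not.
    return value == "OK"


if __name__ == "__main__":
    for status in (None, "OK", "VOLATILE", "ERROR"):
        print(status, may_cache(status))
```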
More later...
Dan