HTML/EML implementation

Terry Allen (terry@ora.com)
Thu, 17 Feb 1994 16:18:50 --100


(Context: Dan Connolly proposed some simplifications to the use
of SGML on the Web, which I opposed.)

| So... you're one of these folks that drive the SGML bandwagon without
| any background in formal systems, huh? ;-)

Chuckle. Just trying to use available tools *for document processing*
to get my docs online. Show me a better bandwagon.

| If we require real-time processing of all legal SGML documents,
| we buy nothing in terms of functionality, and we render almost
| all current implementations broken.

You don't have to parse any SGML except docs conforming to the
HTML DTD and its SGML declaration (in which you can turn off
several features, though not the ones that concern you).

| Yes, by an SGML-compliant parser, but not by any parser built
| out of standard parsing tools like regular expressions, lex, and yacc.
| (well, actually, you could do it with lex, but it's a pain...)

| You say "crippled", I say "expedient". Remember: the documents are
| still conforming. It's just the WWW client parser that's non-standard.

What it comes down to is that you want to make your job easier
by eliminating some of the basic functionality of SGML. There
is no way of doing this in the HTML DTD. You are advocating
another ML; call it EML, for "expedient."

But HTML-conformant docs won't necessarily parse through your client,
e.g., if they have comments in them. (What will you do about those?
Render them as normal text?) My problem is that
all the other tools I have will tell me that such docs are valid.
To make this work, I'd have to take my SGML docs and preprocess
them, then check to see that they're valid according to your EML.
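To make the comment problem concrete, here is a small illustrative sketch. It is not from this thread, and the function names and sample document are hypothetical. A client that treats "anything from < to the next >" as a tag, which is what a simple regular-expression parser tends to do, will mangle a perfectly valid SGML comment that happens to contain a ">" character. Such a document passes an SGML validator but renders garbage. The "comment-aware" version handles only the simple <!-- ... --> case, not SGML's full comment-declaration syntax.

```python
import re

# Hypothetical naive tokenizer: anything from '<' to the next '>'
# is treated as markup and dropped.
NAIVE_TAG = re.compile(r"<[^>]*>")

# Hypothetical fix for the common case only: remove simple comment
# declarations of the form <!-- ... --> before stripping tags.
# Assumption: this ignores SGML's fuller syntax (several -- ... --
# pairs in one declaration, whitespace before the closing '>').
COMMENT_DECL = re.compile(r"<!--.*?-->", re.DOTALL)


def naive_text(doc: str) -> str:
    """Strip everything that looks like a tag, comments included."""
    return NAIVE_TAG.sub("", doc)


def comment_aware_text(doc: str) -> str:
    """Drop simple comment declarations first, then strip tags."""
    return NAIVE_TAG.sub("", COMMENT_DECL.sub("", doc))


# Valid SGML: '>' is allowed inside a comment.
doc = "<P>Price: <!-- check a > b before release -->$10</P>"

print(repr(naive_text(doc)))          # comment text leaks into output
print(repr(comment_aware_text(doc)))  # comment removed cleanly
```

The naive version stops the "tag" at the first ">" inside the comment, so the rest of the comment (" b before release -->") is rendered as ordinary text.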

| The one thing I've learned about the internet is that the party who
| writes and distributes code to implement his spec is the guy who
| sets the standard. I bet I can get client developers to agree on
| my idea sooner than you can get them to adopt SGML.

I know of two browsers in development that use SGML, and existing
SGML browsers (freestanding, not net-ready) are not much slower
than (cited for example only) Mosaic at rendering docs.

| We'd accomplish these objectives:
| (1) These restricted HTML documents are still compliant.
| They still work with SGML tools.

But the converse is not true: SGML tools will pass as valid
docs that aren't valid EML (or maybe just aren't properly
formatted for EML). See above.

| (1) We could teach folks what HTML looks like a whole lot easier.
| (2) We could write HTML processing software easier

Unworthy arguments. We could do this even better by staying in
ASCII.

| (3) We would increase confidence among authors that their
| documents will be rendered (and searched, indexed, outlined,
| and otherwise processed) accurately.

Oh come on. Build the software right; don't say that you'll make
mistakes if it has to be complicated.

[[this part totally parenthetical on both sides!
| Ah... so I guess we agree on that part! Keep in mind that these TGML
| documents are still SGML documents in every sense of the word. It is
| only the TGML parser which is not an SGML parser.

No, you're proposing EML. The mythical TGML will have inline
comments and entities for special characters, and all sorts of
things that you would object to as slowing up processing, but
which document managers want to make *their* lives easier.
It will have better attribute typing, and will eliminate the
record end problem in mixed content, etc. It will be SGML
done over again right, rather than SGML stripped down.
]]

To sum up: The EML you're proposing would make browser
construction easier at the expense of document management in SGML.
It can't be specified in SGML because it cuts out basic features
of SGML, and SGML tools won't necessarily work right with documents
intended for EML browsers.

-- 
Terry Allen  (terry@ora.com)
Editor, Digital Media Group
O'Reilly & Associates, Inc.
Sebastopol, Calif., 95472