Here are some comments I made after reading Dave Raggett's most recent
draft of the HTML 3.0 RFC (the plain text one, that is):
<draft-ietf-html-specv3-00.txt>
dated 28 March 1995. This is indeed a wonderful piece of work. My
contributions to this effort are comparatively minuscule (even
if this letter appears to be long). Mostly I talk about some
inconsistencies in the document, some suggestions for clarification
(well, I wasn't sure, so probably others might be confused too) and
finally some suggested changes in recommended usages. I hope you
find them useful, and not frivolous.
Sincerely,
Ian
--
Ian Graham .................................. igraham@utirc.utoronto.ca
Instructional and Research Computing
University of Toronto
-------------------------------------------------------------------
Attributes (Page 11)
a) string literals
The text states that string literals should replace characters that might be misinterpreted (e.g. ", ' or >) with HTML character references. This of course should *not* be done when the string literal is a URL -- in that case the string literal should contain URL encodings of questionable characters. I believe this should be mentioned here. As far as I know this applies only to HREF and SRC attributes. What about ID and NAME? Should fragment identifiers be URL encoded? I am guessing not, as they are NAME tokens, but I just don't know, and don't recall anything in the URL RFC about this (the draft I have is old....). Whichever it is, it might be a good idea to state which is the case here.
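To illustrate the distinction, here is a minimal sketch using Python's standard library. The helper names encode_text_attribute and encode_url_attribute are mine, purely for illustration, and the sample strings are made up; this is not anything the draft prescribes.

```python
from html import escape
from urllib.parse import quote


def encode_text_attribute(value: str) -> str:
    """Replace &, <, >, " and ' with HTML character references."""
    return escape(value, quote=True)


def encode_url_attribute(path: str) -> str:
    """Percent-encode questionable characters in a URL path, keeping '/'."""
    return quote(path, safe="/")


# The same kind of character (a double quote) ends up encoded differently
# depending on whether the attribute holds plain text or a URL.
text_value = encode_text_attribute('say "hi" > now')
url_value = encode_url_attribute('/docs/my file "draft".html')
print(text_value)
print(url_value)
```

The point is only that a quote becomes &quot; in a text attribute but %22 in a URL, so the two rules cannot be applied interchangeably.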
b) name tokens
Are name tokens case-sensitive? I always thought they were, but in practice browsers treat many attributes as case-insensitive (and let us not forget Mosaic, which does a bit of both....). This has always been confusing, and is not clearly spelled out in the RFC -- should this situation be clarified here?
Document Structure - the HEAD Element (Page 17)
The HEAD element can be safely omitted only if the document writer remembers to place the HEAD elements at the top of the document. I've certainly seen many examples where this was not done. IMO having HEAD and BODY tags helps to enforce proper placement of head and body elements, and for this reason I suggest that the RFC strongly recommend, or even require, the use of HEAD and BODY tags in valid HTML 3.0 documents.
BR Element (Page 36)
What is the recommended formatting for consecutive BR elements? For example, should <BR><BR><BR> be treated as three (line) breaks, or as a single break? I prefer three line breaks, as this seems to me more in keeping with the idea of a <BR>.
P Element (Page 33)
The recommendations state that consecutive empty paragraphs are discouraged (i.e. <P> <P> <P>). Perhaps the RFC should recommend browser behaviour for this case - I suggest recommending that browsers ignore empty paragraphs.
As an aside, I assume that "empty" means an element that contains only whitespace - how do entities like &nbsp; fit in this definition? Should the RFC formally define text-"empty" elements?
Horizontal Tab -- DP attribute (Page 39)
The text says that the designated decimal point character can be altered by the language context, as set by the lang attribute on enclosing elements. Is this the best choice? I would prefer that DP override the existing default. For example, when writing a piece of text in a particular language with a given DP separator, I may very well want to override this separator for tab-aligned information, for example:
* a period . for scientific numbers (overriding the language-specific separator)
* perhaps a special symbol for other data, for example an h (used to separate hours/min in sidereal notation -- e.g. 12h30), or " for seconds
* the dash for phone numbers....
Also -- what happens if you specify align=decimal, but there is no decimal in the text to be aligned? Perhaps you should be able to specify a default ordering in alignment, for example:
align=decimal,right
which would align to the decimal symbol, and in the absence of a decimal, align to the right.
This also applies to the DP attribute used in TABLE elements (see Pages 83, 86 and 89).
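As a rough illustration of the align=decimal,right idea suggested above, here is a small Python sketch. The function align_decimal and its dp parameter are hypothetical names of my own; the draft defines no such algorithm, and the choice to align on the first occurrence of the DP character is an assumption.

```python
def align_decimal(values, dp="."):
    """Align strings on the first occurrence of dp.

    Strings that do not contain dp fall back to right alignment
    (the hypothetical 'decimal,right' ordering).
    """
    # Split each value into (before dp, dp and after); None if dp is absent.
    parts = [
        (v[: v.index(dp)], v[v.index(dp):]) if dp in v else None
        for v in values
    ]
    left = max((len(p[0]) for p in parts if p), default=0)
    right = max((len(p[1]) for p in parts if p), default=0)
    width = max([left + right] + [len(v) for v in values])

    cells = []
    for value, split in zip(values, parts):
        if split:
            cell = split[0].rjust(left) + split[1].ljust(right)
        else:
            cell = value  # no dp character: right-align the whole string
        cells.append(cell.rjust(width))
    return cells


for line in align_decimal(["3.14", "42", "0.5"]):
    print("|" + line + "|")
```

Calling align_decimal(["12h30", "1h05"], dp="h") lines the two times up on the h, which is the kind of override of the language default that I have in mind.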
Hypertext Links (Page 40)
This reflects my ignorance of the formal definition of #PCDATA (part of %text's contents), but does this definition allow for anchor elements that contain no text, or only a string of whitespace? (Recall my confusion over the use of the word "empty" for this type of problem.) I don't think this should be allowed. Whatever the case, I think the RFC should explain it.
As far as I can tell there is nowhere in the RFC a definition of an "empty" text string. For example, does an empty string consist of any combination of whitespace ASCII characters? And how would this be generalized to other character sets? And what about &nbsp;?
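To make the question concrete, here is one possible definition of a text-"empty" string, sketched in Python. Everything here is an assumption for the sake of discussion: the name is_text_empty, the particular set of ASCII whitespace characters, and the nbsp_is_whitespace switch for deciding whether a non-breaking space (the &nbsp; entity) counts as whitespace.

```python
from html import unescape

# Assumed set of "whitespace ASCII characters": space, tab, CR, LF, form feed.
ASCII_WHITESPACE = " \t\r\n\f"


def is_text_empty(content: str, nbsp_is_whitespace: bool = False) -> bool:
    """Return True if content holds nothing but whitespace.

    Entities are decoded first, so '&nbsp;' becomes U+00A0. Whether that
    character counts as whitespace is left as an explicit choice.
    """
    text = unescape(content)
    whitespace = ASCII_WHITESPACE + ("\u00a0" if nbsp_is_whitespace else "")
    return text.strip(whitespace) == ""


print(is_text_empty("  \n\t"))                             # only ASCII whitespace
print(is_text_empty("&nbsp;"))                             # entity, strict reading
print(is_text_empty(" &nbsp; ", nbsp_is_whitespace=True))  # entity, lenient reading
```

The two readings give different answers for an anchor or paragraph containing only &nbsp;, which is why I think the RFC needs to pick one.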
Character Level Elements (Page 44)
The RFC states "implementations are not required to render these nested highlightings distinctly from non-nested elements". Why this recommendation?
At least for physical formatting tags the opposite seems more sensible -- things like <b><i>bla bla </i></b> so obviously suggest bold-face italics, and many people already write with that expectation. It seems to me reasonable, therefore, to encourage that usage (e.g. "implementations are encouraged to render ....").
I note that later on (page 48) this is the recommended behaviour for physical tags.
In this regard there appears to be a difference between informational and physical elements - informational tags do not always logically inherit the characteristics of surrounding informational tags. Yikes, what a mess. Should this point be discussed further in the RFC?
Another question is - what to do with possible on-the-fly inclusion of text documents, where a block of text containing character formatting tags may be inserted between other such tags -- should this case be handled differently? (I suppose not).
SAMP Element (Page 46)
What is SAMP for? The phrase "a sequence of literal characters" is only meaningful to those who've used texinfo. Perhaps a usage example for this element would be helpful.
IMG Element (Page 51)
Is there any interest in ALIGN=center (to align the image in the center of the page)? This would be useful for inserting images as page decorations, flowed-around trademarks, etc. This is distinct from the image attribute to the HR element, since the IMG tag does not imply a separator. This would require attributes to control text flow around the image - should text flow on the left only (the right is clear), on the right only (the left is clear), on both sides through the image, or on both sides as two columns on either side of the image? E.g.:
  ALIGN=center,leftonly     (text flow on left only)
  ALIGN=center,rightonly    (text flow on right only)
  ALIGN=center,noflow       (no text surrounding image)
  ALIGN=center,flowthrough  (see below)
  ALIGN=center,twocols      (see below)
Here are examples of the two latter cases:
  0000000000000000000000000
           ______
  1111111 |      | 22222222
  3333333 |      | 44444444     ALIGN=center,flowthrough
  5555555 |      | 66666666
           ------
  7777777777777777777777777
or
  0000000000000000000000000
           ______
  1111111 |      | 44444444
  2222222 |      | 55555555     ALIGN=center,twocols
  3333333 |      | 66666666
           ------
  7777777777777777777777777
How would this affect the CLEAR attribute? I think not much...
What about the Netscape HSPACE and VSPACE attributes? I hate to admit it, but they do help enormously when floating images with surrounding text...
UL/OL Element -- SKIP attribute (Page 58)
How does the SKIP attribute affect sequence numbers for unordered lists? Do unordered lists even have sequence numbers?
NEEDS Attribute (Page 71)
References to this obsoleted attribute appear on Pages 71 and 82.
FIG Element -- ALIGN attribute (Page 72)
This returns to the idea of centering with text flow around the figure. Should we not allow centered figures with surrounding text flow? For the IMG element I suggested the attributes:
  ALIGN=center,leftonly     (text flow on left only)
  ALIGN=center,rightonly    (text flow on right only)
  ALIGN=center,noflow       (no text surrounding image)
  ALIGN=center,flowthrough  (see the IMG examples above)
  ALIGN=center,twocols      (see the IMG examples above)
Some of the details of this should be left to the stylesheet. For example, centering need not be specifically the center of the page, and flow could be overridden by stylesheet preferences.
CAPTION Element (Page 75)
Should we allow for justification, centering, etc. of the caption within its specified placement? E.g. ALIGN=top,center, which would put the caption, centered, at the top of the figure. Perhaps this is better left to the style sheet...
TABLE Element -- ALIGN Attribute (Page 82)
This returns to the idea of centering with text flow around a FIG. Should we not allow centered tables with surrounding text flow?
  ALIGN=center,leftonly     (text flow on left only)
  ALIGN=center,rightonly    (text flow on right only)
  ALIGN=center,noflow       (no text surrounding table)
  ALIGN=center,flowthrough  (see the IMG examples above)
  ALIGN=center,twocols      (see the IMG examples above)
Some of the details of this should be left to the stylesheet. For example, centering need not be specifically the center of the page, and flow could be overridden by stylesheet preferences.
MATH
Looks fantastic, and much too much for me.
PRE Element (Page 113)
Is it really necessary to cater to obsolete usages, such as having <P>, <IMG> or <FIG> tags inside a PRE? Would it not be better to just state that these tags are not permitted inside a PRE? Again, I am thinking of the fact that many HTML authors use the RFC as a guide to writing, and will be better warned away from inappropriate usage by stronger cautions -- something like
  <P>, <IMG> and <FIG> are not permitted inside a PRE element.
would do it. Also, if we use the .htm3 suffix to denote HTML 3.0 documents as distinct from HTML 2.0 then we don't really need to preserve, in the HTML 3.0 RFC, all the legacy problems.
I'm not really suggesting that browsers should not support these bad structures, but rather that the RFC should more strongly urge good design over bad.
I vote for dumping the WIDTH attribute (Page 115). This is something a browser can easily decide for itself.
What about newline characters? I note that the generic algorithm described on pages 144-145 won't work for text files created on a Macintosh, since the Macintosh denotes newlines by CR alone. What to do? Perhaps treat it as follows:
a) If there are CRLF pairs, treat the pairs as col := 0, row := row+1, and treat individual instances of the characters as appropriate
b) If there are only LFs alone, treat them as col := 0, row := row+1
c) If there are only CRs alone, treat them as col := 0, row := row+1
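The three cases above can be sketched in a few lines of Python. This is only my reading of the suggestion, not the algorithm from the draft: the function name track_position is made up, and I have assumed that a stray CR or LF in a file that otherwise uses CRLF pairs is also treated as a newline.

```python
def track_position(text: str):
    """Return (row, col) after consuming text.

    A CRLF pair, a lone LF, or a lone CR each count as one newline:
    col := 0, row := row + 1.
    """
    row = col = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r":
            # Swallow the LF of a CRLF pair so the pair counts once.
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
            row, col = row + 1, 0
        elif ch == "\n":
            row, col = row + 1, 0
        else:
            col += 1
        i += 1
    return row, col


# DOS (CRLF), Unix (LF) and Macintosh (CR) files give the same result.
for sample in ("ab\r\ncd", "ab\ncd", "ab\rcd"):
    print(track_position(sample))
```

With this treatment a PRE or TEXTAREA section comes out the same regardless of which platform produced the file.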
FN (Footnotes) (Page 118)
The example uses footnotes that are part of the document from which they are referenced -- I assume this is *not* the intention, and that FN's can be in any document?
FORMs (Page 124)
Someone suggested to me that it might be nice if a single FORM could have multiple ACTIONs, so that the form data could be sent to different URLs depending on which submit button was pressed. This would be useful, for example, if you wanted the choice of submitting the same data directly to a server or indirectly by mail. This would also be useful if you wanted to give the user the choice of submitting data to a secure (e.g. RSA) or non-secure server. It seemed like a reasonable idea to me -- what do you think?
INPUT TYPE=scribble and INPUT TYPE=file (Page 129)
How will these non-character types of data be sent to the server? As far as I know there is nothing in the application/x-www-form-urlencoded MIME type that allows for multipart messages. How should the TYPE=scribble data be encoded for transmission?
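For what it is worth, here is a sketch of one conceivable answer: percent-encoding the raw bytes as an ordinary field value in the URL-encoded form body. This is an assumption of mine for discussion, not something the draft specifies, and the field names and the three sample bytes are invented.

```python
from urllib.parse import urlencode


def encode_form(fields) -> str:
    """Encode form fields as a URL-encoded body; bytes are percent-encoded."""
    return urlencode(fields)


# A made-up form with one text field and three bytes of "scribble" data.
body = encode_form({"name": "Ian", "scribble": b"\x00\xff\x10"})
print(body)
```

Every non-alphanumeric byte grows to three characters this way, which is one reason a proper multipart encoding would seem preferable for image-like data.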
INPUT -- ERROR Attribute (Page 131)
Currently the ERROR attribute would have to be a value returned from the server. Is this something that could be overridden by client-side scripts? If so, perhaps this possibility should be mentioned here.
Carriage Return/Line Feed (Page 144)
What should a browser do with documents created with text processors that do not use LF as part of the newline string (Macintosh, for example)? This could sure mess up creating TEXTAREA or PRE sections.
--------- and that's all folks! ----------