I gather that the general opinion is that HTML document
structure should look like:
p with emphasis in it
Unfortunately, the common way that this is coded is:
<TITLE>t</TITLE>
<H1>head</H1>
p with <em>emphasis</em> in it
<ul>
<li>item 1
<li>item 2
</ul>
The unfortunate part is that there's no DTD (well, none that I can find)
that will enable a conforming SGML parser to infer that structure from
that document. However, if folks are willing to put <P> tags at the
_beginning_ of every paragraph, it can be done.
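To illustrate why a <P> start tag at the beginning of each paragraph is enough, here is a minimal Python sketch. It is not an SGML parser: the helper names are hypothetical, it only recognizes <P> start tags, and it assumes a paragraph ends at the next <P> or at end of input.

```python
import re

# Hypothetical helper, not part of any SGML toolkit: it only understands
# <P> start tags and treats everything else as paragraph text.
P_START = re.compile(r"<p>", re.IGNORECASE)

def infer_paragraphs(body):
    """Return paragraph contents, closing each <P> at the next <P> or at end of input."""
    pieces = P_START.split(body)
    # pieces[0] is whatever precedes the first <P>; it belongs to no paragraph.
    return [piece.strip() for piece in pieces[1:]]

def with_end_tags(body):
    """Re-emit the body with the inferred </P> end tags written out."""
    return "\n".join("<P>%s</P>" % text for text in infer_paragraphs(body))

print(with_end_tags("<p>para 1\n<p>para 2\n"))
```

Text with no leading <P> gives the sketch no boundary to work from, which is the same reason the DTD needs the start tag.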
My current solution is:
(1) Docs lacking <P> start tags are supported in a
backwards-compatible mode of the DTD, ala:
<!DOCTYPE HTML [
<!ENTITY % HTML.pSeparator "INCLUDE">
<!ENTITY % html PUBLIC "-//connolly hal.com//DTD WWW HTML 1.8//EN">
%html;
]>
<title>backwards compatible mode</title>
<H1>header</H1>
para 1
<p>
para 2
In this mode, the text of the paragraphs is content of the BODY element,
and the P elements are empty, ala:
para2
(2) In the standard usage of the DTD, paragraphs are containers
and require explicit start tags, ala:
<!DOCTYPE HTML "-//connolly hal.com//DTD WWW HTML 1.8//EN">
<title>backwards compatible mode</title>
<H1>header</H1>
<p>para 1
<p>para 2
The parser infers:
para1
para2
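The two modes differ only in the value of HTML.pSeparator. In SGML, when an entity is declared more than once, the first declaration is the one that counts, so a declaration in the document's internal subset takes precedence over the default in the DTD. A rough Python sketch of that rule follows; the function name and the list-of-pairs representation are assumptions made for illustration, and only two of the switches are shown.

```python
def resolve_switches(declarations):
    """Keep the first value declared for each parameter entity name."""
    switches = {}
    for name, value in declarations:
        # setdefault leaves an existing entry alone, so later duplicates are ignored.
        switches.setdefault(name, value)
    return switches

# Defaults as declared in the DTD (a subset, for illustration).
DTD_DEFAULTS = [
    ("HTML.pSeparator", "IGNORE"),
    ("HTML.forms", "INCLUDE"),
]

# Mode (1): the internal subset declares the switch before the DTD is read.
compat = resolve_switches([("HTML.pSeparator", "INCLUDE")] + DTD_DEFAULTS)

# Mode (2): nothing is declared ahead of the DTD, so its defaults apply.
standard = resolve_switches(DTD_DEFAULTS)

print(compat["HTML.pSeparator"], standard["HTML.pSeparator"])
```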
Here are the current feature test macros:
<![ %HTML.Minimal [
<!ENTITY % HTML.linkRelationships "IGNORE">
<!ENTITY % HTML.linkMethods "IGNORE">
<!ENTITY % HTML.linkRedundantInfo "IGNORE">
<!ENTITY % HTML.forms "IGNORE">
<!-- @@ nested lists -->
<!-- @@ phrases -->
]]>
<![ %HTML.Obsolete [
<!ENTITY % HTML.PLAINTEXT "INCLUDE">
<!ENTITY % HTML.titleCDATA "INCLUDE">
<!ENTITY % HTML.litCDATA "INCLUDE">
<!ENTITY % HTML.NEXTID "INCLUDE">
<!ENTITY % HTML.font-phrase "INCLUDE">
<!ENTITY % HTML.anchorNameCDATA "INCLUDE">
<!ENTITY % HTML.pSeparator "INCLUDE">
]]>
<!ENTITY % HTML.pSeparator "IGNORE"
-- use P element as paragraph separator, rather than container.
This means not all paragraphs need to start with a <P> tag.
-->
<!ENTITY % HTML.linkRelationships "INCLUDE"
-- Adding markup to links to show the relationship between
ends of a link
see http://info.cern.ch/hypertext/WWW/MarkUp/Relationships.html
-->
<!ENTITY % HTML.linkMethods "INCLUDE"
-- Adding markup to links to show the methods supported
by the referent object
see http://info.cern.ch/hypertext/WWW/MarkUp/Elements/A.html
-->
<!ENTITY % HTML.linkRedundantInfo "INCLUDE"
-- Adding markup to links to give redundant information
like URN, content type, title...
-->
<!ENTITY % HTML.anchorNameCDATA "IGNORE"
-- Anchor names should be distinct. SGML parser can validate
this if the NAME attribute of the A element is declared as ID.
But that restricts the syntax of an anchor name to an SGML name,
i.e. a letter followed by letters, numbers, periods and dashes,
up to NAMELEN (34) characters long.
-->
<!ENTITY % HTML.PLAINTEXT "IGNORE"
-- Support for the <PLAINTEXT> tag as a sign of the
end of the HTML data stream and the beginning of a stream
of text/plain data
-->
<!ENTITY % HTML.titleCDATA "IGNORE"
-- Is the TITLE element #PCDATA, RCDATA, or CDATA content?
On Mosaic, it's #PCDATA, but in the linemode browser,
it's more like CDATA, but not quite.
-->
<!ENTITY % HTML.NEXTID "IGNORE"
-- Used by the NeXT implementation to keep track of the
next anchor id to use
-->
<!ENTITY % HTML.font-phrase "IGNORE"
-- allow B, I, TT, U outside PRE,
CITE, VAR, etc. inside PRE
-->
<!ENTITY % HTML.litCDATA "IGNORE"
-- treat XMP, LISTING as CDATA, as per linemodeWWW
-->
<!ENTITY % HTML.forms "INCLUDE"
-- Support for forms as per
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
-->
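The HTML.anchorNameCDATA comment above describes what declaring NAME as an ID would cost: every anchor name would have to be an SGML name. As a rough illustration, the Python check below applies the rule exactly as that comment states it (a letter, then letters, digits, periods, and hyphens, at most 34 characters in all). The function names are hypothetical, and the duplicate check compares names exactly, without modeling any case folding an SGML declaration might specify.

```python
import re
from collections import Counter

NAMELEN = 34  # limit quoted in the HTML.anchorNameCDATA comment

# A letter, then up to NAMELEN - 1 letters, digits, periods, or hyphens.
SGML_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]{0,%d}" % (NAMELEN - 1))

def is_sgml_name(text):
    """True if text fits the SGML name syntax described in the comment."""
    return SGML_NAME.fullmatch(text) is not None

def duplicate_names(names):
    """Return, in sorted order, the anchor names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)

for candidate in ["section1.2", "1st", "my anchor"]:
    print(candidate, is_sgml_name(candidate))
```

Names such as "my anchor" or "1st" are fine as CDATA but would be rejected as IDs, which is the trade-off the switch exists for.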
If you're interested, see
http://www.hal.com/%7Econnolly/drafts/html-design.html
for background etc., and
http://www.hal.com/%7Econnolly/html-test/html.dtd
http://www.hal.com/%7Econnolly/html-test/html.decl
http://www.hal.com/%7Econnolly/html-test/ISOlat1.sgml
for the DTD itself.
Daniel W. Connolly "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010
<connolly@hal.com> http://www.hal.com/%7Econnolly/index.html