In fact, there is. Well, not actually in SGML, but in the "application
conventions" that I have used to map SGML onto WWW.
All elements in HTML have either mixed content, RCDATA, or CDATA.
Mixed content is a mixture of <tags>, &entity; references,
and #PCDATA. RCDATA is just &entities; and data. CDATA is just data.
[SGML actually has a couple other content modes: ANY and
element content, but I didn't use those.]
CDATA is only used for the TITLE. RCDATA is used for XMP and LISTING
(entity references _are_ recognized in RCDATA sections, so you
can include the _full_ end tag like this: &lt;/XMP>. But the
string </ followed by a letter _ends_ the section, whether the
letter starts the XMP tag or not.)
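To make the RCDATA end rule concrete, here is a rough sketch in Python. It is not the code from my SGML routines; the names scan_rcdata and ENTITIES are made up for illustration, and the entity table is an assumed three-entry stand-in for whatever the DTD declares.

```python
import re

# Hypothetical entity table; a real parser would take these from the DTD.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&"}

# "</" followed by a letter ends an RCDATA section, whatever tag name follows.
END_OF_RCDATA = re.compile(r"</[A-Za-z]")
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def scan_rcdata(text, start=0):
    """Return (data, end) for the RCDATA section beginning at text[start].

    `end` is the index of the "</" that closes the section, or len(text)
    if no end delimiter is found.
    """
    match = END_OF_RCDATA.search(text, start)
    end = match.start() if match else len(text)
    raw = text[start:end]
    # Entity references are recognized in RCDATA; unknown names are left alone.
    data = ENTITY_REF.sub(lambda m: ENTITIES.get(m.group(1), m.group(0)), raw)
    return data, end
```

Given the body "a &lt;/XMP> b</XMP>", the scan stops at the literal </XMP> and the data comes back with the entity reference expanded. Given "x</b>y</XMP>", it stops right after the x, because </b is enough to end the section.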
The convention is that in PCDATA sections, newlines serve only
to delimit words, whereas in RCDATA, newlines are significant.
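A tiny Python sketch of that newline convention, with hypothetical function names. It only shows the difference in how the character data is split up, not a full parser.

```python
def pcdata_words(text):
    """PCDATA: newlines are just word delimiters, like any other whitespace."""
    return text.split()


def rcdata_lines(text):
    """RCDATA: newlines are significant, so each one ends a line."""
    return text.split("\n")
```

So the same string "one\n  two" gives two words in a PCDATA section, but two lines in an RCDATA section, the second keeping its leading spaces.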
We can't use RCDATA for the PRE or FIXED tag, cuz the <a> tag
won't be recognized in RCDATA. So I'd suggest you ignore
newlines inside the PRE element, and use <p> to delimit lines.
And since we're not using the exact semantics of PRE, I like
the idea of using the name FIXED instead. In SGML:
<!ELEMENT FIXED - - (#PCDATA|A|P)*>
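Here is roughly how a browser might turn FIXED content into display lines under that suggestion. This is a sketch in Python with a made-up function name, and it assumes that "ignore newlines" means dropping them outright; an implementation could just as well treat them as spaces. Anchor tags are left in place for a later stage to handle.

```python
import re

# <p> (in either case) is the only thing that breaks a line in FIXED content.
P_TAG = re.compile(r"<p>", re.IGNORECASE)


def fixed_lines(content):
    """Split FIXED element content into display lines.

    Assumption: newlines in the source are dropped, not turned into spaces.
    Runs of spaces are kept, since the text is meant for a fixed-width font.
    """
    without_newlines = content.replace("\r", "").replace("\n", "")
    return P_TAG.split(without_newlines)
```

For example, "col1  col2\n<p>a     b<P>\nc     d" comes out as three lines, with the column spacing intact and the source newlines gone.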
The fact that the MidasWWW browser can support the semantics
of PRE is due to its non-standard parsing, where it treats
illegal tags as data, rather than ignoring them. SGML says
they're not data, whatever they are, and the HTML doc in
the web says to ignore them.
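The difference between the two behaviours can be sketched like this in Python. It is not MidasWWW's code, which I have not reproduced here; the KNOWN set and the tokenize function are hypothetical, and the tag pattern is much cruder than real SGML tag recognition.

```python
import re

# Crude tag pattern: "<" or "</", a name, anything up to the next ">".
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")

# Hypothetical set of legal tag names, for illustration only.
KNOWN = {"a", "p", "fixed"}


def tokenize(text, unknown_as_data):
    """Split text into ("data", ...) and ("tag", ...) tokens.

    unknown_as_data=True passes illegal tags through as data (the
    non-standard behaviour); False drops them (what the HTML doc says).
    """
    tokens = []
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            tokens.append(("data", text[pos:m.start()]))
        if m.group(1).lower() in KNOWN:
            tokens.append(("tag", m.group(0)))
        elif unknown_as_data:
            tokens.append(("data", m.group(0)))
        # Otherwise the illegal tag is ignored.
        pos = m.end()
    if pos < len(text):
        tokens.append(("data", text[pos:]))
    return tokens
```

With unknown_as_data=True, something like "if (a <b> c)" survives intact as data, which is why PRE-style text appears to work. With False, the <b> disappears from the output.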
I'm integrating my low-level SGML reading routines into
MidasWWW now, and with the author's consent, the non-standard
behaviour will soon go away. [The MidasWWW 1.0 browser doesn't
do &lt; or &amp; either -- that too will change.]
I've got it running, but there are a couple integration bugs I haven't
yet tracked down.
I've also got something of a validation suite for HTML, so that
implementors can easily see if they've gotten it right. And the
suite goes from easy to hard, so they can see how much of it
they got right, and if they don't want to fix it, they can at least
document how much it's broken.
Dan