The HTML DTD is well specified; the only problem is that few people bother
to read it! I posted a solution months ago:
ALL browsers should tell you if the document you are reading doesn't conform
to the DTD. A good browser would then allow you to view a detailed report
showing where the problems occur and suggesting suitable fixes. This can
be implemented as a separate process which takes a document on stdin and
returns an HTML doc containing the "lint" report.
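Here is a minimal sketch of such a filter in Python. It assumes the standard library's html.parser is good enough to tokenize the input. It is not a DTD validator. The KNOWN element list and the EMPTY_OR_OPTIONAL set are hypothetical stand-ins for rules a real tool would derive from the DTD itself, and only three checks are made: unknown elements, unclosed elements and stray end tags. The names LintParser and lint_report are illustrative only.

```python
import html
import sys
from html.parser import HTMLParser

# Hypothetical subset of elements treated as "known" in this sketch.
# A real lint filter would derive these rules from the HTML DTD.
KNOWN = {"html", "head", "title", "body", "h1", "h2", "h3", "p", "a",
         "ul", "ol", "li", "dl", "dt", "dd", "pre", "em", "strong",
         "code", "br", "hr", "img", "address", "blockquote"}
# Hypothetical: elements assumed to be empty or to have optional end tags.
EMPTY_OR_OPTIONAL = {"br", "hr", "img", "p", "li", "dt", "dd"}


class LintParser(HTMLParser):
    """Collects (line, message, suggested fix) tuples for a few checks."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open elements that still need an end tag
        self.problems = []   # (line, message, suggested fix)

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        if tag not in KNOWN:
            self.problems.append((line, f"unknown element <{tag}>",
                                  "check the spelling or remove the tag"))
        if tag not in EMPTY_OR_OPTIONAL:
            self.stack.append((tag, line))

    def handle_endtag(self, tag):
        line, _ = self.getpos()
        if tag in EMPTY_OR_OPTIONAL:
            return
        if any(t == tag for t, _ in self.stack):
            # Pop back to the matching start tag; anything above it was unclosed.
            while self.stack:
                t, start = self.stack.pop()
                if t == tag:
                    break
                self.problems.append((start, f"<{t}> is never closed",
                                      f"insert </{t}> before line {line}"))
        else:
            self.problems.append((line, f"stray end tag </{tag}>",
                                  "remove it or add the matching start tag"))

    def close(self):
        super().close()
        # Anything still open at end of input was never closed.
        for t, start in self.stack:
            self.problems.append((start, f"<{t}> is never closed",
                                  f"add </{t}> at the end of the document"))
        self.stack.clear()


def lint_report(source: str) -> str:
    """Return an HTML document listing the problems found in `source`."""
    parser = LintParser()
    parser.feed(source)
    parser.close()
    rows = "\n".join(
        f"<li>line {n}: {html.escape(msg)} "
        f"(suggestion: {html.escape(fix)})</li>"
        for n, msg, fix in sorted(parser.problems))
    if parser.problems:
        verdict = f"{len(parser.problems)} problem(s) found."
    else:
        verdict = "No problems found."
    return ("<html><head><title>Lint report</title></head><body>\n"
            f"<h1>Lint report</h1>\n<p>{verdict}</p>\n<ul>\n{rows}\n</ul>\n"
            "</body></html>\n")


if __name__ == "__main__":
    # Filter usage: document on stdin, HTML lint report on stdout.
    sys.stdout.write(lint_report(sys.stdin.read()))
```

A browser could then run it as `python htmllint.py < page.html > report.html`, where the script name is hypothetical. Keeping the checker a stdin/stdout filter means any browser can call it without linking against it.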
Now who wants to write the HTML lint filter?
Dave