See http://dri.cornell.edu/pub/davis/html-parser.html for information about it. The tar file also includes code to convert from HTML to plain text, with or without running footers and page numbers.
Provided without warranty.