We already have some tools that build a list of document titles and
associated URLs, so you get something that looks like this (<tab> is the
ASCII tab character):

What's New With NCSA Mosaic<tab>http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/whats-new.html

We can search a list like this (just the titles, or titles and URLs) on
our local server, which has turned out to be quite simple and fast. Simply
extending this list to include <A HREF> items makes it extremely easy to
find items on your *local* server, or local links to documents on other
servers, as long as the link name isn't something like "here." For local
servers this can be extended to include <A NAME> tags. Document authors can
include <A NAME> tags to go along with H1-H6 headers to increase hits.
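
To make the idea concrete, here is a rough sketch in Python of how such a
list could be built from one document and then searched. It is not the tool
we actually run; the names LinkLister, build_lines and search are made up
for illustration, and writing an <A NAME> entry as the document URL plus
"#name" is my assumption about how those entries would look.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkLister(HTMLParser):
    """Collect (text, URL) pairs from <TITLE>, <A HREF> and <A NAME> tags."""

    def __init__(self, doc_url):
        super().__init__()
        self.doc_url = doc_url
        self.entries = []      # (text, url) tuples in document order
        self._target = None    # URL the text being collected belongs to
        self._text = []

    def _start(self, url):
        self._target = url
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)    # the parser lowercases tag and attribute names
        if tag == "title":
            self._start(self.doc_url)
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the document's own URL.
            self._start(urljoin(self.doc_url, attrs["href"]))
        elif tag == "a" and attrs.get("name"):
            # Assumption: a named anchor is listed as document URL + "#name".
            self._start(self.doc_url + "#" + attrs["name"])

    def handle_data(self, data):
        if self._target is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("title", "a") and self._target is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            if text:
                self.entries.append((text, self._target))
            self._target = None


def build_lines(doc_url, html):
    """Return one 'text<tab>URL' line per title, link, and named anchor."""
    parser = LinkLister(doc_url)
    parser.feed(html)
    parser.close()
    return [f"{text}\t{url}" for text, url in parser.entries]


def search(lines, word):
    """Case-insensitive substring search on the text part of each line."""
    word = word.lower()
    return [line for line in lines if word in line.split("\t")[0].lower()]
```

Note that a link whose text is "here" still ends up in the list; it just
never matches a useful search, which is the point made above.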
Following existing practices with FTP, Gopher, etc., these lists can be made
available at the root of a Web server under a special file name. A Web robot
then needs to do nothing more than pick up the one file.
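
A robot's side of this could be as small as the following Python sketch. No
file name has been agreed on, so "site-index.txt" is purely a placeholder,
and the function names are hypothetical. I am also assuming the file is
Latin-1 text with one "title<tab>URL" entry per line.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Placeholder only: no special file name has actually been chosen.
INDEX_NAME = "site-index.txt"


def index_url(server_root):
    """Return the URL of the index file at the root of the given server."""
    return urljoin(server_root, "/" + INDEX_NAME)


def parse_index(text):
    """Turn 'title<tab>URL' lines into (title, url) tuples, skipping junk."""
    entries = []
    for line in text.splitlines():
        # Split on the last tab, since a URL cannot contain one.
        title, sep, url = line.rpartition("\t")
        if sep and title.strip() and url.strip():
            entries.append((title.strip(), url.strip()))
    return entries


def fetch_index(server_root, timeout=10):
    """Fetch and parse the one index file; this is the robot's only request."""
    with urlopen(index_url(server_root), timeout=timeout) as response:
        # Assumption: the list is served as Latin-1 text.
        return parse_index(response.read().decode("latin-1"))
```

Calling fetch_index("http://www.ncsa.uiuc.edu/") would then make a single
request, provided the server publishes the file under that name.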
It would probably be useful to separate local links to local files and
local titles from local links to other servers, but a filter program could
do that, along with breaking out ftp, telnet, and gopher URLs. Meta-indexes
would, of course, have to merge these lists to reduce duplication.
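
Such a filter might look roughly like this in Python. The function names
and the group labels "local" and "remote" are my own, and treating only
exact (title, URL) repeats as duplicates is an assumption; a real
meta-index might want to fold entries together more aggressively.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def split_by_target(entries, local_host):
    """Group (title, url) entries as 'local', 'remote', or by URL scheme."""
    groups = defaultdict(list)
    for title, url in entries:
        parts = urlsplit(url)
        scheme = parts.scheme.lower() or "http"   # relative URLs count as http
        if scheme == "http":
            host = (parts.hostname or local_host).lower()
            key = "local" if host == local_host.lower() else "remote"
        else:
            key = scheme                          # e.g. ftp, telnet, gopher
        groups[key].append((title, url))
    return dict(groups)


def merge(*lists):
    """Concatenate several lists, dropping exact (title, url) repeats."""
    seen = set()
    merged = []
    for entries in lists:
        for entry in entries:
            if entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged
```

Keeping the first occurrence preserves the order of the input lists, so a
meta-index can list a server's own entries ahead of what others say about
it simply by passing that server's list first.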
I would like to see other meta-information included in these lists, such
as the document author (<LINK REV="made"
HREF="mailto:altis@ibeam.intel.com">), language, whether the document
requires authentication, etc. This additional meta-information would
require the file to include the same information returned by the HTTP
server itself.
Much of the above text may be obvious, but I haven't seen it said on this
list, so comment away.
ka