>However the indexable attributes are tagged in HTML documents, I
>believe it is important that there should be some way to closely
>associate index entries with the text they are about. In other words,
>the index entries should be immediately next to (before or after) the
>paragraph or list item.
I don't disagree; I think we're just mixing two issues here because of my
sloppy wording. The attributes I'm talking about for the HTML header are
*document* attributes. To the extent that an index entry is a document
attribute, I think it belongs in the header, so that a browser (robot or
human) doesn't have to scan the whole document to decide whether it might
be relevant.
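To make the point concrete, here is a rough sketch of how a robot could read the document attributes from the header alone. It assumes the attributes are written as META tags with NAME/CONTENT pairs inside the HEAD, and it stops reading as soon as the HEAD closes or the BODY starts. The class and function names (`HeadMetaReader`, `head_attributes`) are hypothetical and only for illustration:

```python
from html.parser import HTMLParser


class HeadMetaReader(HTMLParser):
    """Hypothetical reader: collects <META NAME=... CONTENT=...> pairs
    found inside the HEAD (an assumed convention) and marks itself done
    once the HEAD ends or the BODY begins."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.in_head = False
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            # No explicit </HEAD>: the BODY still marks the end of the header.
            self.done = True
        elif tag == "meta" and self.in_head:
            a = dict(attrs)
            name, content = a.get("name"), a.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
            self.done = True


def head_attributes(html_text):
    """Feed the document a line at a time and stop once the header is
    finished, so the body is never parsed."""
    reader = HeadMetaReader()
    for line in html_text.splitlines(keepends=True):
        reader.feed(line)
        if reader.done:
            break
    return reader.meta
```

A META tag that appears in the body is never seen, which is the whole point: the decision about relevance is made from the header alone.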
There's a fundamental problem in indexing (even more so in choosing
keywords): the terms worth indexing change over time. For example, I'm sure
a lot of news organizations wish there were sophisticated tags on every
reference to Haiti in their archives.
>Are meta tags only allowed in the HEAD of a document? If so, I don't
>believe they are sufficient for indexing. Probably some filter should
>be used to extract the embedded index tags to store them separately.
Maybe it's just a matter of how we define "meta," but that's pretty much
what I see as belonging in the HTML header. (I'm saying HTML to be sure to
differentiate it from the HTTP header.)
>While I am suspicious of any scheme involving manual replication,
>automatic replication (caching) will be required for scalability.
>Caching indexes is the easy part, but caching of services such as
>searching services is trickier.
I didn't mean to imply either manual or automatic replication in
particular. Each has its uses.
Nick