>Check out Harvest:
> http://rd.cs.colorado.edu/harvest/
Harvest has certainly caught our attention here. It has several
interesting aspects. But I was really asking about commercial,
general-purpose search engines. There are a number of research projects
that are working on new document models. I'm wondering if anything has
made the leap from research to commercial product.
By the way, one of our engineers has pointed out to me that our engine has
hooks that allow the definition of a document to include some structural
info. That's what the Lotus folks are using to make our engine more
intelligent about handling Notes compound documents. The engine is
thus able to search for words occurring in a named region of a Notes
document.
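
To make the idea of a named region concrete, here is a rough sketch in Python. It is only an illustration, not our engine's actual hooks: the RegionIndex class, its method names, and the sample Notes-style fields ("subject", "body") are all made up for the example. Postings are keyed by (region, word), so a query can be restricted to one region or run across all of them.

```python
from collections import defaultdict


class RegionIndex:
    """Toy inverted index keyed by (region, word).

    Hypothetical illustration only; not the API of any real engine.
    """

    def __init__(self):
        # (region name, lowercased word) -> set of document ids
        self._postings = defaultdict(set)

    def add(self, doc_id, regions):
        """Index a document given as a mapping of region name -> text."""
        for region, text in regions.items():
            for word in text.lower().split():
                self._postings[(region, word)].add(doc_id)

    def search(self, word, region=None):
        """Return ids of documents containing word, optionally in one region."""
        word = word.lower()
        if region is not None:
            return set(self._postings.get((region, word), set()))
        hits = set()
        for (_, indexed_word), docs in self._postings.items():
            if indexed_word == word:
                hits |= docs
        return hits


index = RegionIndex()
index.add("note-1", {"subject": "Budget review",
                     "body": "Please review the attached draft"})
index.add("note-2", {"subject": "Lunch",
                     "body": "Budget lunch options"})

subject_hits = index.search("budget", region="subject")  # restricted to one region
any_hits = index.search("budget")                        # any region
```

The point of keying on the region is that "budget in the subject" and "budget anywhere" become two different lookups over the same index.
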
This could obviously apply to HTML documents, at least for searching the
header vs. the body. But almost everything in the headers these days is
better suited to our index as attributes than to the full-word index. If
some longer text, such as an abstract, were often available, then it would
make sense for us to add that as a searchable region in the full-word index.
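
As a sketch of the header/body split for HTML, the following uses Python's standard html.parser module. The HeadBodySplitter class, the choice to treat TITLE and named META values as attributes, and the sample page are assumptions made for illustration; they do not describe how our engine indexes anything.

```python
from html.parser import HTMLParser


class HeadBodySplitter(HTMLParser):
    """Split an HTML page into header attributes and body words.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.attributes = {}   # short header values, stored as attributes
        self.body_words = []   # words destined for the full-word index
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta":
            meta = dict(attrs)
            # Keep only META elements that carry both a name and a content value.
            if meta.get("name") and meta.get("content") is not None:
                self.attributes[meta["name"].lower()] = meta["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_title:
            self.attributes["title"] = self.attributes.get("title", "") + data
        elif self._in_body:
            self.body_words.extend(data.lower().split())


# Made-up sample page.
page = ('<html><head><title>Sample page</title>'
        '<meta name="keywords" content="search, indexing"></head>'
        '<body><p>Search engines and document models</p></body></html>')

splitter = HeadBodySplitter()
splitter.feed(page)
splitter.close()
```

After parsing, splitter.attributes holds the short header values and splitter.body_words holds the text for the full-word index. A longer header field such as an abstract could be routed to the word list under its own region name instead of being stored as an attribute.
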
Nick