We have finally finished a first version of our indexing extension
to HTTP servers. Feedback is very welcome.
Here's some info on it:
Chris
--
/*
 * Christian Neuss % neuss@igd.fhg.de % ..in the humdrum
 */

================================ SNIP ==================================

Fraunhofer IGD proudly presents:

HTTP Index Server Extension
===========================
Many thanks to Ari Luotonen from CERN for his contribution of the extract-title command, and to Stefanie Hoefling for many hours
of debugging. -- Chris
What it is:
The HTTP Index Server Extension allows free-text queries on hierarchies of HTML files. The functionality is pretty close to a WAISINDEX interface, but the package is a lot smaller and more portable. What it basically does is have cron create an index file at regular intervals and consult this index file whenever a client issues an index query. The supported query syntax allows combining keywords with AND and OR, so a query could look like "server and script". As the result of this query, a list of all HTML files containing both words is created and sent back to the client. Each entry in the list carries a relevance rating and is a clickable hyperlink to the file itself.
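
To make the idea concrete, here is a minimal sketch in Python of a word index plus AND/OR lookup. It is not the code or the index file format from the package: the function names `build_index` and `search`, the in-memory dictionary, the rule that AND binds tighter than OR, and the use of word counts as the relevance rating are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

TAG = re.compile(r"<[^>]+>")     # crude HTML tag stripper (assumption)
WORD = re.compile(r"[a-z0-9]+")  # what counts as a word (assumption)


def build_index(docs):
    """Map each word to {path: occurrence count}; docs is {path: html_text}."""
    index = defaultdict(dict)
    for path, html in docs.items():
        text = TAG.sub(" ", html).lower()
        for word, count in Counter(WORD.findall(text)).items():
            index[word][path] = count
    return index


def search(index, query):
    """Evaluate a query such as 'server and script' or 'a and b or c'.

    OR separates groups; within a group every AND-ed word must occur.
    The rating of a file is the summed count of its matched words.
    """
    scores = Counter()
    for group in re.split(r"\s+or\s+", query.lower().strip()):
        words = [w.strip() for w in re.split(r"\s+and\s+", group)]
        postings = [index.get(w, {}) for w in words]
        hits = set(postings[0])
        for posting in postings[1:]:
            hits &= set(posting)  # AND: keep files that contain every word
        for path in hits:
            scores[path] += sum(posting[path] for posting in postings)
    # Highest rating first; ties are broken by path name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

In this sketch, `search(build_index(docs), "server and script")` returns (path, rating) pairs only for files containing both words. A periodic job would run `build_index` and store the result, so that the query step only has to read it.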
Another feature is the ability to use a thesaurus for conceptual searches: entering "{picture}" as a query will not only retrieve files containing the word "picture", but also files containing related concepts like "image" etc. The thesaurus format we support is the ANSI standard Thesaurus Image Format (TIF). Thesaurus information is
available from many sources, but the most important feature is probably the ability to create specific technical thesauri related to whatever is stored in your HTML text database.
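
The braces expansion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesaurus is a plain dictionary rather than a parsed TIF file, the helper name `concept_terms` is hypothetical, and the entry "graphic" is made up for the example ("image" is the one related term mentioned above).

```python
# Hypothetical in-memory thesaurus; a real one would be loaded from a file.
THESAURUS = {
    "picture": ["image", "graphic"],  # "graphic" is an invented example entry
}


def concept_terms(token, thesaurus):
    """Return the words to search for a single query token.

    A token written as {word} stands for the word plus its related
    concepts; any other token stands only for itself.
    """
    token = token.strip().lower()
    if token.startswith("{") and token.endswith("}"):
        word = token[1:-1].strip()
        return [word] + list(thesaurus.get(word, []))
    return [token]
```

With this table, `concept_terms("{picture}", THESAURUS)` yields "picture" followed by its related terms, while plain `picture` is left alone. A file would then match the concept if it contains any one of the returned words.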
How to get it:
Access it from
  ftp://ftp.igd.fhg.de/incoming/ICE-1.01a.tar.Z
  ftp://info.cern.ch/pub/www/src/ICE-1.0b.tar.Z
in order of preference. The version on the CERN server is slightly older, but I'll send them an update, and they will probably soon put up the 1.01a version.
Bugs: Probably too numerous to mention :-)
This is a very early version, and will be improved in the future. The index extension will probably become part of the CERN httpd server, and perhaps Rob McCool will also include it in the NCSA distribution. The version I make available is mostly for those
of you who need indexing badly, and don't want to wait for future server releases.
Please contact me if you have bug reports or suggestions:
Christian Neuss
c/o Fraunhofer IGD
Wilhelminenstr. 7
64283 Darmstadt
GERMANY
Fax: (+49)6151 155-199
email: neuss@igd.fhg.de
Have fun :-), -- Chris