>You can comment on the esoterics of indexing theory. I can say that Notes
>_today_ (based on Verity's engine -- Nick?? :^)) gives full-text retrieval,
>weighted hits, and internal highlights. Moreover, it takes about three
>mouse clicks to index a database. Finally, indices can be either
>server-side or local (important for the disconnected user).
Our engine works with Notes data just as it does with any other data.
Fundamentally, it's just a serial stream of words. The real issue is that general-purpose engines
use a flat document model, while Notes and structured documents don't.
Clearly, everyone in the search business is going to have to support
non-flat models.
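To make the flat vs. non-flat distinction concrete, here is a minimal sketch, assuming a toy in-memory index (the `FieldedIndex` class and its methods are hypothetical, not Verity's or Lotus's API). Each field is still tokenized as a serial stream of words, but every posting also remembers which field the word came from, so a query can be restricted to, say, the Subject field.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Posting = Tuple[str, str, int]  # (doc_id, field_name, word_position)

class FieldedIndex:
    """Hypothetical inverted index that keeps field structure in each posting."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[Posting]] = defaultdict(list)

    def add(self, doc_id: str, fields: Dict[str, str]) -> None:
        # Each field is tokenized as a plain stream of words, but the
        # field name travels with every recorded position.
        for field, text in fields.items():
            for pos, word in enumerate(text.lower().split()):
                self.postings[word].append((doc_id, field, pos))

    def search(self, word: str, field: Optional[str] = None) -> List[str]:
        # A flat index could only answer "which documents?"; this one can
        # also answer "which documents, in which field?".
        hits = self.postings.get(word.lower(), [])
        return sorted({d for d, f, _ in hits if field is None or f == field})
```

For example, after adding a memo with Subject "Budget review" and another whose Body mentions the budget, `search("budget")` returns both, while `search("budget", field="Subject")` returns only the first.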
>In another vein, by using structured fields and categories, customized
>views can be presented that organize content beforehand. Full-text
>indexing is wrong, IMHO. Cataloging and classifying lend themselves to
>browsing, and tools that simplify this are invaluable.
Relevancy ranking, which is a component of every serious full-text search
engine, is the *basis* for automation of "categorizing" (more commonly
called profiling). I don't think we can do away with automated profiling
by assuming that authors and publishers will be able to predict the
taxonomies into which readers will want to place their documents.
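The link between relevancy ranking and profiling can be sketched as follows, assuming a generic weighted-term score (sum of weight times log(1 + term frequency)); this is an illustrative stand-in, not Verity's actual ranking formula, and `score`, `categorize`, and the threshold are hypothetical. A reader-side profile is just a weighted query, and a document lands in every category whose profile ranks it above a cutoff, so the taxonomy belongs to the reader rather than the author.

```python
import math
from collections import Counter
from typing import Dict, List

def score(text: str, profile: Dict[str, float]) -> float:
    """Weighted-term relevance: sum of weight * log(1 + term frequency)."""
    tf = Counter(text.lower().split())
    return sum(w * math.log1p(tf[t]) for t, w in profile.items())

def categorize(text: str, profiles: Dict[str, Dict[str, float]],
               threshold: float) -> List[str]:
    """Return every category whose profile scores the text at or above threshold."""
    return sorted(name for name, prof in profiles.items()
                  if score(text, prof) >= threshold)
```

Because the profiles live outside the documents, a publisher can retune them for a new audience without retagging anything.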
News wires have used category tags for more than a decade, but they still
want profiling, because one way to add value to information is to match a
classification scheme to the audience. The very best results come from
manually building a weighted semantic network,
so the tools to manipulate the network wind up being the most important
maintenance tools.
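A weighted semantic network can be approximated, very roughly, as a tree of concepts whose edges carry weights. The sketch below assumes a simple max-propagation rule (a node scores 1.0 if its own evidence words appear, otherwise the best weighted score among its children); the `Concept` structure and the rule are illustrative assumptions, not a description of any shipping product. Note that retuning it means editing nodes and weights, which is why the editing tools end up mattering most.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Concept:
    name: str
    words: Set[str] = field(default_factory=set)            # evidence terms for this node
    children: List["Concept"] = field(default_factory=list)
    weights: List[float] = field(default_factory=list)       # one weight per child edge

def evidence(node: Concept, tokens: Set[str]) -> float:
    """Score in [0, 1]: 1.0 if any of the node's own words appear,
    otherwise the best weighted score among its children."""
    if node.words & tokens:
        return 1.0
    best = 0.0
    for child, w in zip(node.children, node.weights):
        best = max(best, w * evidence(child, tokens))
    return best
```

With a "networking" concept whose children are "routers" (weight 0.9) and "protocols" (weight 0.6), a text mentioning "gateway" scores 0.9, one mentioning only "tcp" scores 0.6, and an unrelated text scores 0.0.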
>1) Name one case of an HTTP server talking to another HTTP server
>
>2) Name one case of a WWW user editing a document, and sending it back to
>the server
Interesting that you mention these issues. When our engine returns a
document through our Web server (from the file system or another gateway it
understands), it adds markup. Specifically, it creates a "match list" at
the top of the document, linked by named anchors to the first matches. It
also boldfaces the matches. Obviously, we could do more.
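The markup described above might look roughly like this, as a minimal sketch assuming plain-text input and HTML output; `mark_up` and the `matchN` anchor names are hypothetical. It puts a match list at the top, each entry linking to a named anchor at the first occurrence of that term, and boldfaces every occurrence.

```python
import html
import re
from typing import Dict, Iterable, List

def mark_up(text: str, terms: Iterable[str]) -> str:
    """Return HTML: a match list linking to the first hit of each term,
    followed by the text with every hit boldfaced."""
    wanted = {t.lower() for t in terms}
    first_anchor: Dict[str, str] = {}   # term -> anchor name of its first hit
    out: List[str] = []
    # Split into word and non-word runs so spacing and punctuation survive.
    for piece in re.findall(r"\w+|\W+", text):
        key = piece.lower()
        if key in wanted:
            word = html.escape(piece)
            if key not in first_anchor:
                first_anchor[key] = f"match{len(first_anchor) + 1}"
                out.append(f'<a name="{first_anchor[key]}"><b>{word}</b></a>')
            else:
                out.append(f"<b>{word}</b>")
        else:
            out.append(html.escape(piece))
    items = "".join(f'<li><a href="#{a}">{html.escape(t)}</a></li>'
                    for t, a in first_anchor.items())
    return f"<ul>{items}</ul>\n<p>{''.join(out)}</p>"
```

For instance, `mark_up("Notes indexes notes.", ["notes"])` yields a one-item match list pointing at `#match1`, with both occurrences in bold and the anchor on the first.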
However, when we index remote documents using our spider, the documents
have to be returned by the HTTP server that "owns" them (our engine is
distributed; we don't keep local copies of documents). Right now, we just
create a results list with anchors back to the "owning" server. If we
wanted to add the match list, boldface and other markup, we would
have to do exactly what you describe -- our server would retrieve the
document from the "owning" server, mark it up, and pass it to the client.
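The retrieve, mark up, and forward model can be sketched as below, assuming the owning server speaks plain HTTP; `proxy_with_markup` and `fetch_http` are hypothetical names, and the fetcher is a parameter so the sketch can be exercised without a network. Nothing is stored locally, which keeps the index distributed at the cost of one extra round trip per document.

```python
import re
from typing import Callable, Iterable
from urllib.request import urlopen

def fetch_http(url: str) -> str:
    """Retrieve a document from its owning server with a plain HTTP GET."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def proxy_with_markup(url: str, terms: Iterable[str],
                      fetch: Callable[[str], str] = fetch_http) -> str:
    """Fetch the remote document, boldface the query terms, and return the
    result for the client. The document is never cached locally."""
    body = fetch(url)
    terms = [t for t in terms if t]
    if not terms:
        return body
    # Naive: this can also match text inside tags or attributes; a real
    # version would parse the HTML and rewrite only text nodes.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", body)
```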
I think we'll support both models. The latter one adds latency, but in
some applications, it adds great value. Imagine, for example, if the
server marked up the document with concept-driven "explain" links to a
glossary/FAQ list. A publisher could even sell this linking capability as
a service.
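A crude version of "explain" links can be sketched with a glossary lookup, assuming a simple term-to-URL mapping; literal term matching stands in here for real concept detection, and `add_explain_links` and the glossary URLs are hypothetical. Only the first occurrence of each term is linked, so the document is not buried in anchors.

```python
import re
from typing import Dict

def add_explain_links(html_text: str, glossary: Dict[str, str]) -> str:
    """Link the first occurrence of each glossary term to its explanation.
    Literal term matching stands in for real concept detection."""
    for term, url in glossary.items():
        # Naive: a later term could match inside an href inserted earlier.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        html_text = pattern.sub(lambda m: f'<a href="{url}">{m.group(0)}</a>',
                                html_text, count=1)
    return html_text
```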
We'll probably try this out sometime in the next couple of months with
Web-related information on the upcoming www.verity.com server (the T-1 is
on, the WebMaster is hired, it's only a matter of time now!).
>3) Name one case of an active object going to the client, capable of
>conditionally executing internal or external code, interacting with the
>reader, and sending a result to the server
We have customers who want that capability now. Imagine a technical
support CD-ROM that the client uses as a local cache for the publisher's Web
server. When the user runs a query, the client first searches the CD-ROM
for local information, then checks with the server for recent updates.
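The CD-ROM-as-cache pattern might be sketched like this, assuming the server can be asked for matching documents changed since a given date; `Doc`, `hybrid_search`, and the `ask_server` callback are hypothetical. The local search costs no network traffic, and only documents newer than the CD's pressing date come over the wire, replacing stale local copies.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List

@dataclass
class Doc:
    doc_id: str
    text: str
    updated: date

def hybrid_search(query: str, cd_docs: List[Doc], cd_pressed: date,
                  ask_server: Callable[[str, date], List[Doc]]) -> List[Doc]:
    """Search the CD-ROM first, then fetch only newer matches from the server."""
    q = query.lower()
    results: Dict[str, Doc] = {d.doc_id: d for d in cd_docs if q in d.text.lower()}
    # Ask the publisher's server only for documents changed since pressing.
    for d in ask_server(query, cd_pressed):
        results[d.doc_id] = d   # the server copy supersedes the stale CD copy
    return sorted(results.values(), key=lambda d: d.doc_id)
```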
>4) Explain how WWW addresses nomadic users, when all of the programmatic
>intelligence is on the server (CGI) and there is no model for
>client-server or server-server replication
Virtual sessions, with log-in and log-out.
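One reading of "virtual sessions" is server-side state keyed to the user rather than to the client machine, so a nomadic user can log out on one machine and pick up the same work on another. The sketch below assumes that reading; `SessionStore` is hypothetical, and authentication (passwords, RSA, and so on) is deliberately omitted.

```python
import secrets
from typing import Dict, Optional

class SessionStore:
    """Hypothetical server-side session state, keyed by user, not by machine."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}   # session token -> user
        self._state: Dict[str, dict] = {}   # user -> saved work, kept across log-outs

    def log_in(self, user: str) -> str:
        # Authentication is omitted; a real server would verify credentials here.
        token = secrets.token_hex(16)
        self._tokens[token] = user
        self._state.setdefault(user, {})
        return token

    def state(self, token: str) -> Optional[dict]:
        user = self._tokens.get(token)
        return None if user is None else self._state[user]

    def log_out(self, token: str) -> None:
        # The token dies, but the user's state survives for the next log-in,
        # possibly from a different machine.
        self._tokens.pop(token, None)
```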
>5) *Today*, Notes has corporate ACL (very robust, point-and-click) with RSA
>security, and the tools to administer it
This is a clear advantage for Notes, and one that I think the Web will be
slow to match.
>6) Name one client or server that has implemented the Version attribute in
>the HTTP spec, thus allowing revision control
I think customers will drive demand for this, too, very quickly.
>7) Notes can be bought from a Fortune 500 company with a support staff,
>maintenance agreements, a third-party catalog, and contracts with the
>government (I disagree about the true importance of any of that, but others
>believe it)
You don't have to be a huge company to offer those things. For example,
our customer list includes huge, demanding Fortune 500 businesses and great
big three-letter intelligence agencies, all of whom use our tools in
mission-critical applications. We'll apply our level of support to our Web
products, as well. I'm sure that others building commercial Web products
will, too.
The bottom line for us is that we're not pursuing an either-or strategy for
any information technology, from Notes to HTTP to Z39.50 to SQL to Acrobat.
We're working on technologies that give customers access to their
information no matter how many ways they store it.
Nick