> Hypermail is great. Mhonarc is even better. But I've got a lot of
> ideas for improvements:
>
> Requirements:
>
> 0. Support MIME ala mhonarc.
Hmmm....Let me think about it.
>
> 1. Base the published URLs on the global message-ids, not on local
> sequence numbers. So instead of:
>
> http://www.foo.com/archive/mlist/00345.html
>
> I want to see:
>
> http://www.foo.com/archive/mlist?message-id=234234223@bar.net
I already support this - it just isn't the main interface right now:
<URL:http://www.netimages.com/ni-cgi-bin/fetch?newsgroup=alt.devilbunnies&messageid=4bo3ol$3s2@xmission.xmission.com>
I have to fundamentally change the way I index the files to improve the
speed of message-id searches. Right now I use heuristics based on
locality, and on the idea that people are usually looking for _recent_
messages, to make a usually-fast search of a flat database.
[...]
> 2. Support format negotiation. Make the original message/rfc822 data
> available as well as the enhanced-with-links html format -- at the
> same address. This _should_ allow clients to treat the message as a
> message, i.e. reply to it, etc. by specifying:
>
> Accept: message/rfc822
Pretty easy to do. We just need agreement from browser authors to
request it. It actually saves me processing time, since I don't have to
mark up the message, just decompress it and send it out.
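To sketch the negotiation (illustrative Python only; this simplified
parse ignores q-values and wildcards in the Accept header):

```python
def choose_representation(accept_header):
    """Pick message/rfc822 when the client explicitly accepts it,
    otherwise fall back to the marked-up HTML view.  Simplified:
    q-values and wildcards in the Accept header are ignored."""
    types = [t.split(";")[0].strip() for t in accept_header.split(",")]
    return "message/rfc822" if "message/rfc822" in types else "text/html"
```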
> 3. Keep the index pages to a reasonable size. Don't list 40000
> messages by default. The cover page should show the last 50 or so
> messages, plus a query form where folks can select articles...
I already chop it into small chunks. alt.devilbunnies volume overwhelmed
people pretty early, so I break it down by day, plus a search button.
What is really needed is flexible specification of the display.
> 4. Allow relational queries: by date, author, subject, message-id,
> keywords, or any combination. Essentially, treat the archive as a
> relational database table with fields message-id, from, date, subject,
> keywords, and body.
Got Subject and From already, with Perl regex matching, AND/OR, and
month-level date restriction. The rest will be part of my full-body
text search rewrite. RSN, I hope.
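In Python rather than Perl, the Subject/From matching with AND/OR
combination amounts to something like this (function and header names
are hypothetical, not the real code):

```python
import re

def match_message(headers, subject_pat=None, from_pat=None, mode="AND"):
    """Apply optional regex restrictions to the Subject and From
    headers, combined with AND or OR.  A month-level date
    restriction would slot in as one more test."""
    tests = []
    if subject_pat is not None:
        tests.append(bool(re.search(subject_pat, headers.get("Subject", ""))))
    if from_pat is not None:
        tests.append(bool(re.search(from_pat, headers.get("From", ""))))
    if not tests:
        return True  # no restrictions: everything matches
    return all(tests) if mode == "AND" else any(tests)
```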
> In fact, consider this table to consist of all the mail messages
> and news articles ever posted (past, present, and future). Any
> given archive has partial knowledge of the table. Let's call
> this global service the message-archive service. So rather than:
>
> http://www.foo.com/archive/www-html?message-id=234234223@bar.net
>
> I want to see:
>
> http://www.foo.com/message-archive?to=www-html@w3.org;message-id=234234223@bar.net
Hmmm...That is pretty much what I do now. I am going to have to change
my use of '&' as a separator, though, since SGML parsers choke on it.
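A query parser that takes ';' as well as '&' is nearly a one-liner in
Python (illustrative sketch; no %-decoding here):

```python
def parse_query(qs):
    """Split a query string on ';' or '&', so archive URLs can be
    embedded in SGML/HTML without bare ampersands.  %-decoding is
    omitted to keep the sketch short."""
    pairs = qs.replace(";", "&").split("&")
    return dict(p.partition("=")[::2] for p in pairs if p)
```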
> Goals:
>
> 5. Generate HTML on the fly, not in batch. Cache the most recent pages
> of course (in memory?), but don't waste all that disk space.
Already do that. I decided batching was too restrictive for the upgrade
path. Worse, rebuilding even the indexes on a nightly basis was loading
the machine down badly.
> (support if-modified-since in the on-the-fly generator, by the way)
RSN.
> Update the index in real-time, as messages arrive, not in batch.
Hmmm...that requires adding better file locking and a spool-watching
program, but it isn't really a problem otherwise, since I already do the
updates incrementally from a cron job as often as you want. OTOH: is
there any real reason to force actual real-time updating? Archives are
not meant to replace newsreaders.
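The If-Modified-Since part, when it happens, boils down to roughly this
(Python sketch using stdlib HTTP-date parsing; not production code):

```python
from email.utils import parsedate

def respond(last_modified, if_modified_since=None):
    """Return (status, body): 304 with no body when the client's
    cached copy is still current, otherwise 200.  Both arguments
    are HTTP-date strings; page generation itself is elided."""
    if if_modified_since is not None:
        have = parsedate(if_modified_since)
        ours = parsedate(last_modified)
        if have is not None and ours is not None and ours <= have:
            return 304, None
    return 200, "<generated page>"
```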
> 6. Allow batch query results. Offer to return the raw message/rfc822
> data (optionally compressed) for, e.g. "all messages from july 7 to
> dec 1 with fred in the from field".
Hmmm..
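For what it's worth, batching raw messages with optional compression is
cheap to sketch (illustrative Python; query selection and mbox framing
are omitted):

```python
import gzip

def batch_messages(messages, compress=False):
    """Concatenate raw message/rfc822 texts into one blob and
    optionally gzip it for bulk retrieval.  Selecting which
    messages match the query happens elsewhere."""
    blob = "\n".join(messages).encode("utf-8")
    return gzip.compress(blob) if compress else blob
```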
> 7. Export a harvest gatherer interface, so that collections of mail
> archives can be combined into harvest broker search services where
> folks can do similar relational and full-text queries.
:( I haven't had much luck with Harvest combined with Linux.
> 8. Allow annotations (using PICS ratings???) for "yeah, that
> was a really good post!" or "hey: if you liked that, you
> should take a look at ..."
Hmmm...
> 9. Make it a long-running process exporting an ILU interface, rather
> than a fork-per-invocation CGI script. Provide a CGI-to-ILU hack for
> interoperability with pre-ILU web servers.
What he said. :) ILU?
You left out CD-ROM support. One of the other admins around here is
always pestering me to make my software more CD-ROM friendly by
separating the index tree from the article storage tree so he can move
the articles off to CD-ROM.
-- Benjamin Franz, Usenet-Web author <URL:http://www.netimages.com/~snowhare/utilities/usenet-web/>