Announcing wwwstat-0.3 -- an access log summary program

Roy T. Fielding (fielding@simplon.ICS.UCI.EDU)
Thu, 10 Mar 1994 10:10:44 --100


This message is to announce the availability of wwwstat Version 0.3 --
a program for analyzing NCSA httpd_1.1 or 1.0 server access logs and printing
an HTML-formatted summary report. The program is written in Perl
and, once customized for your site, should work on any UNIX-based system
with Perl 4.019 or better. The program is in the public domain (i.e. FREE).

As an example of what wwwstat can do for you, look
<A HREF="http://www.ics.uci.edu/Admin/wwwstats.html"> here </A>
to see UC Irvine's Department of Information and Computer Science
WWW server statistics.

For more information and access to the wwwstat-0.3 distribution,
point your World-Wide Web client at

<A HREF="http://www.ics.uci.edu/WebSoft/wwwstat/"> wwwstat-0.3 </A>.

For those of you without offsite http access but with ftp access, wwwstat is
also available via anonymous ftp at:

<ftp://liege.ics.uci.edu/pub/arcadia/wwwstat/wwwstat-0.3.tar.Z>

One of the nicest things about wwwstat is that it does not make any changes
to or write any files in the server directories. Thus, this program can be
safely run by any user with read access to the httpd server's access_log and
srm.conf files. This allows people to do specialized summaries of just the
things they are interested in.

Version 0.3 now provides a plethora of options for creating customized
reports and for making it easier for webmasters to maintain their servers.
The options are described below. This version does not yet support
the proposed new log format -- that will be version 1.0.

What's new in this version:

wwwstat [-helLoOuUvxz] [-f logfile] [-s srmfile] [-i pathname] [-d date]
[-a IP_address] [-n archive_name] [-t time]

Version 0.3 March 9, 1994
  Added links for last server summary, table-of-contents,
    and a reference to the standard distribution site because
    similar things looked good in Kevin Hughes' getstats output.
  Automatically determines URL of previous month's summary.
  Now allows extra spaces on Alias directive lines in srm.conf.
  Now recognizes Redirect directives and estimates size of message.
  No longer counts automatically redirected directory names twice --
    it estimates size of redirect message and counts that instead.
  Now uses normal printf's instead of perl forms.
  Reversed order of printed fields to allow for long names and the
    ability to read its own output (see the -i option below).
  Updated the country-codes file to reflect latest standards/spelling.
  Added the following options (phew!):
    Display Options:
      -h  Help -- just display the usage message and quit.
      -e  Display all invalid log entries on STDERR;
          -- this is great for finding trashed log entries for cleaning.
      -l  Do display full IP address of clients in my domain.
      -L  Don't display full IP address of clients in my domain.
      -o  Do display full IP address of clients from other domains.
      -O  Don't display full IP address of clients from other domains.
      -u  Do display IP address from unresolved domain names.
      -U  Don't display IP address from unresolved domain names.
      -v  Verbose display (to STDERR) of each log entry processed;
          -- useful, but not recommended for long logs.
      -x  Display all requests for nonexistent files on STDERR;
          -- this is great for finding misadvertised or moved URLs.
    Input Options:
      -f  Read from the following access_log file instead of the default;
          -- allows you to read archived (or test) logfiles.
      -z  Use zcat to uncompress the log file while reading [requires -f];
          -- allows you to read compressed logfile archives;
             use "gzip -9" to get a factor of 10 reduction in file sizes.
      -s  Get the server directives from the following srm.conf file;
          -- allows you to archive the configuration along with the log.
      -i  Include the following file (assumed to be a prior wwwstat output);
          -- incredibly great, allows you to keep partial summary
             periods in wwwstat output files and purge the logfile.
             Inventive admins can find many uses for this, such as
             scripts that provide fast, up-to-the-minute stats.
    Search Options (include in summary only those log entries):
      -a  Containing the following "substring" in the IP address.
      -d  Containing the following "substring" in the date.
      -t  Containing the following "substring" in the time.
      -n  Containing the following "substring" in the archive (URL) name.
          -- allows you to restrict logfile summaries to an area
             of particular interest; great for custom author summaries.
      Search strings are matched as substrings, as prefixes (if the string
      starts with a caret "^"), or as suffixes (if the string ends with "$").
      Note that strings containing $ must be enclosed in single
      quotes for most shell command lines.
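
As a rough illustration of the search-string rules just described, here is
a small sketch in Python. It is not wwwstat's own Perl code. The function
name `matches` is made up for this example, the search string is treated as
literal text rather than as a Perl pattern, and treating a string with both
a leading "^" and a trailing "$" as an exact match is an assumption.

```python
def matches(search: str, field: str) -> bool:
    """Return True if `field` satisfies a wwwstat-style search string.

    Hypothetical helper: plain strings match anywhere, a leading "^"
    anchors the match at the start, a trailing "$" anchors it at the end.
    """
    anchored_start = search.startswith("^")
    anchored_end = search.endswith("$")

    # Strip the anchor characters to get the literal text to look for.
    core = search[1:] if anchored_start else search
    if anchored_end:
        core = core[:-1]

    if anchored_start and anchored_end:
        return field == core            # assumption: both anchors = exact match
    if anchored_start:
        return field.startswith(core)   # prefix match
    if anchored_end:
        return field.endswith(core)     # suffix match
    return core in field                # plain substring match


if __name__ == "__main__":
    # Example values only; the URL paths and host name are illustrative.
    print(matches("^/Admin", "/Admin/wwwstats.html"))
    print(matches(".html$", "/Admin/wwwstats.html"))
    print(matches("uci.edu$", "simplon.ics.uci.edu"))
```

Under these assumptions, a search string such as "^/Admin" given to -n would
select only entries whose archive name begins with /Admin, and 'uci.edu$'
given to -a would select only clients whose address ends in uci.edu.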

Obviously, versions of this program would also be nice for the Plexus
and CERN servers. However, wwwstat has been carefully designed to accurately
estimate the files and bytes transmitted per request by following as closely
as possible the logic used by the NCSA server in handling requests. As a
result, I found that much of the logic for finding file names was just too
specific to the NCSA server to justify all the other work of making this
general. Feel free to do so yourself.

This work has been sponsored in part by the Defense Advanced Research Projects
Agency under Grant Number MDA972-91-J-1010. This software does not
necessarily reflect the position or policy of the U.S. Government and no
official endorsement should be inferred. Their support is appreciated.

If you have any suggestions, bug reports, fixes, or enhancements,
send them to me at <fielding@ics.uci.edu>. Also, I would like to ask anyone
who uses wwwstat on a regular basis to please send me an e-mail message that
indicates how and where it is being used (e.g. to publish stats, perform
research, assist in server maintenance, and/or just allow HTML authors to
see how much their work is appreciated) and also, if it is public information,
a URL to your site. This is, of course, entirely voluntary and I don't want
anyone to divulge private information, but please understand that such
information allows free-software authors like me to justify the time and
effort needed to build quality tools.

Have fun,

...Roy Fielding ICS Grad Student, University of California, Irvine USA
(fielding@ics.uci.edu)