> How about having a fully configurable log files, something
> that would understand escapes like this: ...
Although on the surface this seems like a good idea, I want to determine
exactly why it is desirable and make it clear where the pitfalls lie.
I can think of three reasons why it is desirable:
1) Make the largest set of known access data available for logging;
2) Allow the individual service provider to choose the exact subset which
should NOT be logged, thus saving local disk space.
3) Make the log file format consistent across multiple servers/services
(assuming those servers follow the same log conventions).
Note, however, that reasons 1 and 3 could also be accomplished by
establishing a fixed format which contains all the information.
Now, as for the pitfalls:
1) There is no compelling need for log flexibility other than saving
disk space. Regardless of how the data is organized, people will still
want to use some log analyzer to view the data and thus the format should
be designed primarily for machine readability and for occasional human
reading or grep search. The log analyzer will have its own (possibly
configurable) output format for human consumption.
2) The content of the data is only known at the time the log entry is
made. Any information that was not written at that time is lost to
any later analysis. Thus, it is usually preferable to log everything
and let the log analysis program choose what should be ignored.
3) Every time the configuration changes, the old log file must be deleted.
This is because any log analyzer will only be able to understand one
log format at a time.
4) Special formatting conventions (like the square brackets surrounding
the date field in NCSA httpd logs) make it much easier for analyzers
to parse the data and identify mangled entries -- a condition which
occurs quite often with NCSA httpd.
5) It makes it slightly harder for people like me to write and test a
simple log analyzer program.
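Pitfalls 3 and 4 can be made concrete with a small sketch. It assumes a line layout of host, ident, authuser, bracketed date, quoted request, status and byte count (the layout commonly called the Common Log Format). A single regular expression uses the brackets and quotes to pull out the fields that may contain spaces, and it rejects any line that does not fit, such as one truncated mid-write. Because the pattern hard-codes one layout, an analyzer built this way understands exactly one format. The function names and the sample lines are hypothetical, not taken from any existing analyzer.

```python
import re

# Assumed layout of one entry (Common Log Format style):
#   host ident authuser [date] "request" status bytes
# The brackets and quotes delimit the two fields that may contain spaces.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\S+) (?P<bytes>\S+)$'
)


def parse_entry(line):
    """Return a dict of named fields, or None if the line is mangled."""
    match = CLF_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.groupdict()


def split_log(lines):
    """Separate well-formed entries from lines that fail to parse."""
    good, mangled = [], []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            mangled.append(line)
        else:
            good.append(entry)
    return good, mangled
```

A line cut off inside the date, such as 'simplon.ics.uci.edu - - [01/Mar/1994:10:00:0', never finds its closing bracket, so it lands in the mangled list instead of being misread.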
Having said all that, I still think that it may be a good idea, provided
that the above concerns are addressed (i.e., I have faith that the server
authors will go out of their way to make my life easier, provided that
I let them know what will make my life easier). Although I personally
would prefer a fixed format, I am willing to go with the flow.
In that spirit, let me propose that some generic indicator (such as "-")
be used for any field which is desired by the configuration string (or by
the fixed format) but is unknown or empty for a particular log entry.
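Thus, if the configuration string indicates that REMOTE_IDENT should be logged
between FULL YEAR and CLIENT HOST ADDRESS (as in "%Y %I %C"), and
REMOTE_IDENT is empty, then the output should be like:
"1994 - simplon.ics.uci.edu"
rather than
"1994 simplon.ics.uci.edu"
for reasons that should be obvious to most hackers: with space-separated
fields, an omitted value shifts every later field one position to the left,
so a parser can no longer tell which field is missing.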
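A minimal sketch of the proposed rule follows. It assumes a hypothetical escape table in which only the three letters from the example above are defined; the field names year, remote_ident and client_host are illustrative, not taken from any server. Every requested field that is unknown or empty is written as "-", so each entry produced from the same format string splits into the same number of fields.

```python
import re

# Hypothetical mapping from escape letters to request fields; only the
# letters in the example "%Y %I %C" are defined here.
ESCAPES = {
    "Y": "year",          # FULL YEAR
    "I": "remote_ident",  # REMOTE_IDENT
    "C": "client_host",   # CLIENT HOST ADDRESS
}

PLACEHOLDER = "-"


def format_entry(fmt, fields):
    """Expand %X escapes in fmt, writing PLACEHOLDER for unknown or empty values."""
    def expand(match):
        letter = match.group(1)
        if letter == "%":
            return "%"  # "%%" yields a literal percent sign
        name = ESCAPES.get(letter)
        value = fields.get(name) if name else None
        if value is None or str(value) == "":
            return PLACEHOLDER  # field requested but unknown or empty
        return str(value)

    return re.sub(r"%(.)", expand, fmt)
```

With fields {"year": 1994, "remote_ident": "", "client_host": "simplon.ics.uci.edu"}, format_entry("%Y %I %C", fields) returns "1994 - simplon.ics.uci.edu", which still splits into three fields. Values that themselves contain spaces would need quoting or bracketing as well, which this sketch does not attempt.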
....Roy Fielding ICS Grad Student, University of California, Irvine USA
(fielding@ics.uci.edu)
<A HREF="http://www.ics.uci.edu/dir/grad/Software/fielding">About Roy</A>