> Roy said:
>>
>> Now, as for the pitfalls:
>>
>> 1) There is no compelling need for log flexibility other than saving
>> disk space. Regardless of how the data is organized, people will still
>> want to use some log analyzer to view the data and thus the format should
>> be designed primarily for machine readability and for occasional human
>> reading or grep search. The log analyzer will have its own (possibly
>> configurable) output format for human consumption.
>
> which is not completely true. I never use a log analyzer. I look at the
> file, and when I want some particular information I use grep. That works
> great for me, and I don't need to learn how to use a log analyzer just to
> generate the info Ari wants to put in the logfile. But I'm only
> a small user. On the other hand, large sites may very well be interested
> in saving disk space (except for the very, very large sites :-))
Isn't that exactly what I said? Please note that these are pitfalls,
not barriers. Also, some analyzers (wwwstat in particular) are easier to
use than grep.
>> 2) The content of the data is only known at the time the log entry is
>> made. Any information that was not written at that time is lost to
>> any later analysis. Thus, it is usually preferable to log everything
>> and let the log analysis program choose what should be ignored.
>
> Why? If I choose to write only a limited set into the log, then I know
> I cannot fetch the other information, but it's my choice and my
> responsibility.
Exactly. However, you are assuming that the person setting up the
configuration is the same as the one reading the log. My point was
that you can't decide what you need after the fact, and that this
should be made clear to everyone.
>> 3) Every time the configuration changes, the old log file must be deleted.
>> This is because any log analyzer will only be able to understand one
>> log format at a time.
>>
>> 4) Special formatting conventions (like the square brackets surrounding
>> the date field in NCSA httpd logs) make it much easier for analyzers
>> to parse the data and identify mangled entries -- a condition which
>> occurs quite often with NCSA httpd.
>>
>> 5) It makes it slightly harder for people like me to write and test a
>> simple log analyzer program.
>>
> 3) 4) 5) are irrelevant to a guy like me who simply wants to browse the
> logfile and has only a very small local disk.
They are design concerns regardless of whether they are specifically
relevant to you.
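
To illustrate pitfall 4, here is a minimal sketch in Python of how the
bracketed date helps an analyzer reject mangled entries. The field layout,
the regular expression, and the sample lines are my own assumptions for
illustration, not taken from any particular analyzer.

```python
import re

# Assumed layout: host ident user [date] "request" status bytes
# The date may not contain '[' or ']', so two entries run together
# (a typical mangling) fail to match instead of being misread.
ENTRY = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\[\]]+)\] "([^"]*)" (\d{3}|-) (\d+|-)$'
)

def parse_entry(line):
    """Return a tuple of the seven fields, or None if the line is mangled."""
    match = ENTRY.match(line.rstrip("\n"))
    return match.groups() if match else None

# Hypothetical sample lines: one well-formed, one where a second entry
# was written into the middle of the first.
good = ('host.example.com - - [12/Oct/1994:14:05:31 -0700] '
        '"GET /index.html HTTP/1.0" 200 1043')
bad = ('host.example.com - - [12/Oct/1994:14:05host2.example.com - - '
       '[12/Oct/1994:14:05:32 -0700] "GET / HTTP/1.0" 200 512')

for line in (good, bad):
    fields = parse_entry(line)
    print("mangled" if fields is None else fields[3])
```

Without the closing bracket as a delimiter, the analyzer would have to
guess where the date ends, and the second line could easily be miscounted.
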
> 3) can be solved by writing down the configuration change in the logfile,
> which is also handy for 5) so you will know at all times what the
> format is and which fields are written in the logfile.
> i.e. the same string that does the formatting in Ari's program should
> be present in the logfile; or if you are willing to forget about 3)
> should be made available for the log analyzer. ...
Unless, of course, your log analyzer happens to be a simple spreadsheet
or database program that cannot handle dynamically reconfigurable record
structures. And beyond that, how is the analyzer supposed to summarize
data that changes content in mid-stride? This kind of feature is what
I call a living hell.
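
As a rough sketch of the problem, assume a hypothetical "#fields:" line
is written into the log whenever the configuration changes (the directive
name and the sample data are invented for this example). Even an analyzer
that follows the directive can only summarize a field for the stretches of
the log where it was actually written:

```python
from collections import Counter

def count_by_field(lines, field):
    """Count values of `field`, following hypothetical '#fields:' lines.

    Returns (counts, skipped), where skipped is the number of entries
    that could not contribute because the field was not being logged
    or the entry did not match the current field list.
    """
    fields = []
    counts = Counter()
    skipped = 0
    for line in lines:
        line = line.strip()
        if line.startswith("#fields:"):
            # The record structure changes from this point on.
            fields = line[len("#fields:"):].split()
            continue
        if not line:
            continue
        values = line.split()
        if field not in fields or len(values) != len(fields):
            skipped += 1
            continue
        counts[values[fields.index(field)]] += 1
    return counts, skipped

log = [
    "#fields: host status bytes",
    "a.example.com 200 1043",
    "b.example.com 404 -",
    "#fields: host bytes",      # status dropped in mid-stride
    "a.example.com 512",
]
counts, skipped = count_by_field(log, "status")
print(dict(counts), "skipped:", skipped)
```

The summary of status codes silently covers only part of the log, and a
fixed-column tool has no way to follow the directive at all.
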
> ... That makes your '-' suggested below unnecessary,
Nope. It has no effect on the question of what should be logged when a
particular requested field is empty. My point was that it MUST NOT
be logged as an empty string because that will throw off the field
count which is important for simple analyzers and garbage recovery.
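
A minimal sketch of the difference, assuming space-separated fields and
an analyzer that splits on whitespace the way awk does (the function name
and sample values are made up for this example):

```python
def format_entry(values):
    """Join fields with spaces, writing '-' for any empty or missing value."""
    return " ".join(v if v else "-" for v in values)

# Hypothetical entry in which two of the five requested fields are empty.
values = ["a.example.com", "", "", "200", "1043"]

with_placeholder = format_entry(values)
with_empty_strings = " ".join(values)

# A simple analyzer splits on runs of whitespace and counts the fields.
print(len(with_placeholder.split()))    # all five fields are still there
print(len(with_empty_strings.split()))  # the two empty fields have vanished
```

With the placeholder, an entry whose field count is wrong can be treated
as garbage; with empty strings, a short count is ambiguous.
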
....Roy Fielding ICS Grad Student, University of California, Irvine USA
(fielding@ics.uci.edu)
<A HREF="http://www.ics.uci.edu/dir/grad/Software/fielding">About Roy</A>