We had been experiencing considerable delay on our server connections.
Discovered that the server was doing an initgroups call for
each fork'd process. As we use yp, and our group file keeps growing,
AND yp is single threaded, all of the accesses were getting queued
up waiting for ypserv.
We tweaked the code to allow us to only do the initgroups
call once, and use that information the remaining times. As the
uid/gid is always the same, this is sufficient.
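
For illustration only, here is a minimal sketch of one way to do the lookup once and reuse it. It is not the patch described above: the function names and the MAX_CACHED_GROUPS limit are hypothetical, and it assumes the parent is still root when it calls initgroups(), so that each child can later install the saved list with setgroups().

```c
#define _DEFAULT_SOURCE        /* glibc: expose initgroups() and setgroups() */
#include <sys/types.h>
#include <grp.h>
#include <unistd.h>

#define MAX_CACHED_GROUPS 64   /* hypothetical limit, not from the server */

static gid_t cached_groups[MAX_CACHED_GROUPS];
static int cached_ngroups = -1;    /* -1 means nothing cached yet */

/* Call once in the parent, while still root. This is the only place
 * that goes to the group database (and so to ypserv). Returns the
 * number of groups saved, or -1 on failure. */
int cache_user_groups(const char *user, gid_t gid)
{
    if (initgroups(user, gid) == -1)
        return -1;
    cached_ngroups = getgroups(MAX_CACHED_GROUPS, cached_groups);
    return cached_ngroups;
}

/* Call in each forked child instead of initgroups(). It installs the
 * saved list without any lookup. Returns 0 on success, -1 on failure
 * or if nothing has been cached. */
int apply_cached_groups(void)
{
    if (cached_ngroups < 0)
        return -1;
    return setgroups(cached_ngroups, cached_groups);
}
```

Calling cache_user_groups() again whenever the config files are reread would keep the saved list in step with a changed User setting.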
FWIW, there's more in the way of speed increases where that came from,
and some of them are fairly easy to arrange for. On each connection,
the NCSA server:
*) Talks to the nameserver --- yet another opportunity for YP to
serialize you. I'm not sure how to fix this *portably*, but
caching the hostnames of recently seen clients in shared memory
eases things somewhat. Compiling with -DMINIMAL_DNS keeps the
server from talking to the nameserver at all, and is a simpler and
better option for those who can live with it.
*) Tries to open a whole lot of .htaccess files which aren't there.
People running close to the edge can get around this by turning off
the .htaccess checks entirely with an AllowOverride None at the
right spot in access.conf. This may be a substantial win for those
afflicted with AFS. (The checks for symlinks at every directory
are also a potential source of overhead, though in that case,
things may be better simply because the directories in question
actually exist, and so kernel machinery like the namei cache works).
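
As a rough illustration of the hostname-caching idea in the first item, here is a small direct-mapped table keyed on the client's IPv4 address. Everything in it is an assumption made for the sketch: the names, the sizes, and the hash are hypothetical, and locking is left out entirely. The table lives in whatever memory the caller supplies, so it could be placed in a shared segment that survives across forked children.

```c
#include <stdint.h>
#include <string.h>

#define HOST_CACHE_SLOTS 64    /* hypothetical table size */
#define CACHED_NAME_LEN  64    /* hypothetical; longer names are truncated */

struct host_entry {
    uint32_t addr;                 /* client IPv4 address */
    char name[CACHED_NAME_LEN];    /* empty string marks an unused slot */
};

struct host_cache {
    struct host_entry slot[HOST_CACHE_SLOTS];
};

/* Map an address to a slot. A new entry simply overwrites the old one. */
static unsigned slot_for(uint32_t addr)
{
    return (addr ^ (addr >> 16)) % HOST_CACHE_SLOTS;
}

/* Return the cached name for addr, or NULL if it is not in the table. */
const char *host_cache_get(const struct host_cache *c, uint32_t addr)
{
    const struct host_entry *e = &c->slot[slot_for(addr)];
    if (e->name[0] != '\0' && e->addr == addr)
        return e->name;
    return NULL;
}

/* Remember name for addr, replacing whatever occupied the slot. */
void host_cache_put(struct host_cache *c, uint32_t addr, const char *name)
{
    struct host_entry *e = &c->slot[slot_for(addr)];
    e->addr = addr;
    strncpy(e->name, name, CACHED_NAME_LEN - 1);
    e->name[CACHED_NAME_LEN - 1] = '\0';
}
```

A zero-filled table is a valid empty cache. On a miss, the server would do the real lookup and then call host_cache_put(). A table shared between processes would need some form of locking around both calls.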
There are a few more obvious speed improvements which are harder to
arrange for (you have to change some of the server code), but the
payoffs, at least for the first item below, are substantial. On each
connection, the server also:
*) Reads the request and MIME header from the client character by
character, taking a context-switch into and out of the kernel on
each. This is a MAJOR performance hit, and easy to kludge around,
but you do have to change the server code. It's only mildly hard
to fix right [1].
*) Opens the locale database to find out the names of the months, and
opens some other file to find the time zone. (Actually, the C
library does this behind httpd's back, but the effect is the same).
I got rid of this overhead by doing a few dummy time conversions
before starting to listen on the socket --- this initializes the C
library time-conversion code in the parent process, and so the
children don't have to do it themselves after the fork().
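
The dummy time conversions in the second item might look roughly like the sketch below. The function name and the format string are hypothetical. The idea is just that the parent calls it once before it starts accepting connections, so that the C library has already loaded its time zone and locale data by the time the children are forked.

```c
#include <stddef.h>
#include <time.h>

/* Run a few throwaway conversions so the C library initializes its
 * time-conversion state in the parent. Returns the number of
 * characters strftime() produced for the local-time conversion
 * (0 if the conversion failed). The output itself is discarded. */
size_t warm_up_time_conversions(void)
{
    char buf[64];
    size_t n = 0;
    time_t now = time(NULL);
    struct tm *t;

    t = localtime(&now);           /* forces the time zone data to load */
    if (t != NULL)
        n = strftime(buf, sizeof buf, "%a, %d %b %Y %H:%M:%S", t);

    t = gmtime(&now);              /* exercise the GMT path as well */
    if (t != NULL)
        (void)strftime(buf, sizeof buf, "%A, %d-%b-%y %H:%M:%S", t);

    return n;
}
```

Whether a fork inherits all of this state depends on the C library, so it is worth checking that the file opens really have disappeared from the children.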
I've fixed most of the above in the server I'm running (all except
.htaccess files, which require some code cleanup to get right), and
that gets you close to the end of the line --- much improvement beyond
that will probably come only by eliminating the fork on every
transaction. (The overhead of fork() is difficult to measure directly,
but it shows up indirectly in some of my other measurements, and it
seems to be large).
That's only after some cleanups, though --- the standard NCSA server
spends most of its time figuring out what groups "nobody" is in, over
and over and over...
rst
PS --- your patch has a *very* minor bug --- if the server rereads the
config files, it doesn't change the group info, even though
User might have changed in httpd.conf, and the appropriate
groups might have changed in any event. This is unlikely ever
to come up in practice, but I'm a little compulsive about these
things.
[1] If you really want to do this right, at least on SunOS, you can
recv the header with MSG_PEEK instead of reading it, and then only
read those bytes out of the kernel buffers which actually contain
the header, leaving the rest for a CGI script. This handles POST
right, as well as GET. David Robinson came up with this idea and
has actually coded it up. Or, you could do as the CERN server does
--- buffer the client socket as usual, and then pipe the contents
of the buffer to any script that wants to see them, but that's more
work starting from the existing NCSA code.
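
A minimal sketch of the MSG_PEEK approach follows. This is not David Robinson's code. The function names are hypothetical, and it assumes the header ends with a CRLF CRLF pair and has fully arrived by the time of the peek. A real server would have to wait for more data when it has not, and would also have to cope with clients that send bare LFs.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Length of the header including its blank-line terminator, or 0 if
 * the terminator is not in the first n bytes of buf. */
static size_t header_length(const char *buf, size_t n)
{
    size_t i;
    for (i = 0; i + 4 <= n; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    return 0;
}

/* Peek at the pending data, find where the header ends, then consume
 * only that many bytes. Anything after the header stays in the kernel
 * buffer for a CGI script to read from the same socket. Returns the
 * header length, 0 if the header is not complete yet, -1 on error. */
ssize_t read_header_only(int fd, char *buf, size_t cap)
{
    size_t hlen, got = 0;
    ssize_t peeked = recv(fd, buf, cap, MSG_PEEK);

    if (peeked <= 0)
        return -1;
    hlen = header_length(buf, (size_t)peeked);
    if (hlen == 0)
        return 0;

    while (got < hlen) {           /* now really read, header bytes only */
        ssize_t r = recv(fd, buf + got, hlen - got, 0);
        if (r <= 0)
            return -1;
        got += (size_t)r;
    }
    return (ssize_t)hlen;
}
```

This costs two system calls per request rather than one per character, and a POST body is left untouched on the socket.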