Because the queue is used both for connections that are ready to be
accepted and for half-negotiated connections. The latter can fill the
queue, starving out any new connections before they can be negotiated.
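
(For reference, here's a rough sketch of the listener setup; the
backlog argument to listen() is what caps that queue. make_listener()
is a made-up name for illustration, and the SO_REUSEADDR bit is just
standard practice, not necessarily what the actual server does.)

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listener; the backlog passed to listen() bounds the
 * queue that holds both fully-established and half-negotiated
 * connections. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Allow a quick re-bind to the port after a close. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}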
* One other piece of puzzling evidence --- intense bursts of
* connections don't always provoke the bug. I try to keep track of
* peak load here by logging a histogram of transactions/sec
* vs. number-of-seconds. We routinely log bursts of >10
* transactions/sec a few times a day even on weekends, when this sort
* of "freeze-up" behavior doesn't seem to have been a problem.
We've always been able to track it down to a line being down. When the
watchdogs report a server not responding (both of them invariably do it
at the same time, BTW, even though they're on different outbound
lines), my first step is to look for a down route. Out of a list of
10-20 hosts, I ping each one, and usually by the second or third one I
encounter a failure. Traceroute can then generally find the down
route.
* Incidentally, killing off the server process and restarting it
* always gets things moving again (at least it does here), so that
* action seems to clear whatever inside the kernel is causing the
* bottleneck.
Yes, because the socket listening on port 80 is closed and then
re-opened with a fresh queue.
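
(In other words, something along these lines; reset_listener() is a
made-up name, and this is only a sketch of the effect, not the actual
restart path.)

#include <unistd.h>

/* Closing the listening socket throws away its queue, stuck
 * half-negotiated connections and all; the replacement socket starts
 * with an empty queue.  make_listener() is the helper sketched above. */
int reset_listener(int old_fd, unsigned short port, int backlog)
{
    close(old_fd);
    return make_listener(port, backlog);
}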
* That hack seems to have helped matters, but I'm not sure that it's
* gotten rid of the freeze-ups entirely --- I spotted something which
* looked an awful lot like the same old freeze on Friday, although
* this time the process was waiting in select(). If the bug keeps on
* showing up at an annoying rate, the next thing I'll try is closing
* and reopening the socket if no connection requests have come in for
* ten seconds or so, but that seems a little drastic.
*/
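
(For concreteness, that close-and-reopen-on-idle idea might look
roughly like the sketch below. It reuses the made-up helpers from
above, and handle_request() is hypothetical; this isn't the actual
server code.)

#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

int make_listener(unsigned short port, int backlog);   /* sketched above */
int reset_listener(int old_fd, unsigned short port, int backlog);
void handle_request(int conn);                         /* hypothetical */

void serve_with_idle_reset(unsigned short port, int backlog)
{
    int listen_fd = make_listener(port, backlog);

    for (;;) {
        fd_set readable;
        struct timeval timeout;
        int n;

        FD_ZERO(&readable);
        FD_SET(listen_fd, &readable);
        timeout.tv_sec = 10;   /* ten seconds with no connection requests */
        timeout.tv_usec = 0;

        n = select(listen_fd + 1, &readable, NULL, NULL, &timeout);
        if (n > 0) {
            int conn = accept(listen_fd, NULL, NULL);
            if (conn >= 0)
                handle_request(conn);
        } else if (n == 0) {
            /* Idle for ten seconds: throw away the listener and its
             * queue.  Note the window here where the port isn't bound
             * and incoming connections get refused. */
            listen_fd = reset_listener(listen_fd, port, backlog);
        }
    }
}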
You have to be careful to prevent race conditions there. There's a
chance people could get connection refused if they hit your server at
just the right time.
--Rob