Re: mystery NCSA httpd problems on gnn.com

Robert S. Thau (rst@ai.mit.edu)
Tue, 31 Jan 1995 23:09:27 +0100


Date: Tue, 31 Jan 1995 00:49:25 +0100
From: Rob McCool <robm@neon.mcom.com>

/* ... I wrote:

* ... If the bug keeps on
* showing up at an annoying rate, the next thing I'll try is closing
* and reopening the socket if no connection requests have come in for
* ten seconds or so, but that seems a little drastic.
*/

You have to be careful to prevent race conditions there. There's a
chance people could get connection refused if they hit your server at
just the right time.

--Rob

I'm actually not worried about this during prime time --- if we
haven't got a connection for twenty seconds, we can be pretty sure
there is a problem, and those people would get their connection
refused anyway. Even if the problem is just our building's line to
the rest of campus going briefly offline, that in itself is likely to
result in a bunch of incomplete handshakes, which can and do persist
long after the line goes back up.

Off-hours are another story, however --- it is entirely possible that
in the off-hours, there's simply no one who wants to connect for
twenty seconds. So if a connection attempt does come in at *precisely*
the moment the twenty-second timer fires and the socket is closed,
that person will get screwed. This means that if I throw this kludge
into my server, I'm going to have to make it somehow sensitive to the
time of day as well.
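
For concreteness, here is a rough sketch of the kludge, in Python rather
than the server's own C. It is not code from NCSA httpd. Every name in
it (open_listener, should_reopen, serve, IDLE_LIMIT, PRIME_HOURS) is
made up for illustration, and the 8:00-20:00 "prime time" window is an
assumption. The listening socket is closed and reopened only if nothing
has arrived for twenty seconds and the clock says it is prime time.

```python
import select
import socket
import time

IDLE_LIMIT = 20.0           # seconds of silence before suspecting a wedged socket
PRIME_HOURS = range(8, 20)  # assumed prime-time window, local hours 08..19


def open_listener(port, backlog=5):
    """Create a fresh listening socket on the given port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow an immediate rebind after we close the old listener.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    s.listen(backlog)
    return s


def should_reopen(idle_seconds, hour):
    """Reopen only when the silence is suspicious, i.e. during prime time."""
    return idle_seconds >= IDLE_LIMIT and hour in PRIME_HOURS


def serve(port, handle):
    """Accept loop that recycles the listener after a suspicious idle period."""
    listener = open_listener(port)
    last_conn = time.monotonic()
    while True:
        # Wake up once a second even if no connection request arrives.
        readable, _, _ = select.select([listener], [], [], 1.0)
        now = time.monotonic()
        if readable:
            conn, _addr = listener.accept()
            last_conn = now
            try:
                handle(conn)
            finally:
                conn.close()
        elif should_reopen(now - last_conn, time.localtime().tm_hour):
            # Race window: a client connecting between close() and the
            # new listen() will see "connection refused".
            listener.close()
            listener = open_listener(port)
            last_conn = now
```

Gating on the hour shrinks the window Rob describes, but it does not
remove it: anyone who connects between the close and the new listen
still gets refused.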

Phase-of-the-moon bugs, anyone? Sigh...

rst

PS --- another shot-in-the-dark question:

Does the TCP spec actually mandate that accept(2), or the
equivalent, not return until the entire three-way handshake is
complete? Are there TCP stacks with different behavior?