Newsgroups: comp.mail.mime,comp.protocols.misc,comp.infosystems.www.misc
Subject: Content-Transfer-Encoding: packets for HTTP
Distribution: world
In developing enhancements to HTTP, which uses many MIME techniques,
we keep coming across the problem of "how do you signal the end of a
big block of binary data?"
The original method was to use the closing of the connection as
EOF. It's simple and it works for lots of applications.
A recent development is that many (certainly not all) HTTP servers add
a Content-Length: header in the response. This allows the client user
interface to give the user an idea of how much longer the transmission
will take, but it's not reliable for any uses beyond that.
Besides, it's often the case that the HTTP server is receiving the
data from a source that doesn't allow it to determine the length of
the data before sending it (a Unix pipe, for example).
The idea of sending multiple objects in one HTTP transaction or
conducting multiple HTTP transactions over one TCP connection has
been raised again and again on the www-talk mailing list.
Most of the proposals involve using the MIME multipart/* syntax
and the boundary mechanism. This has terrible performance implications.
An HTTP server would have to scan the entire data stream to come up
with a suitable boundary before sending even the first body part.
Besides, the boundary mechanism is designed to address problems that
simply do not arise over a reliable 8-bit data stream.
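To make the scanning cost concrete, here is a rough sketch (not taken from any of the proposals) of the check a server would have to run before committing to a boundary. The name boundary_is_safe is made up for illustration, and the sketch ignores the leading "--" and start-of-line rules of real multipart boundaries. The point is only that every byte of the body gets examined before the first body part can go out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `boundary` occurs nowhere in
 * data[0..len), 0 if it collides with the data or is empty. */
int boundary_is_safe(const unsigned char *data, size_t len,
                     const char *boundary)
{
    size_t blen;
    size_t i;

    assert(data != NULL && boundary != NULL);
    blen = strlen(boundary);
    if (blen == 0)
        return 0;               /* an empty boundary is never usable */
    if (blen > len)
        return 1;               /* too long to occur in the data */
    for (i = 0; i + blen <= len; i++) {
        if (memcmp(data + i, boundary, blen) == 0)
            return 0;           /* collision: pick another boundary */
    }
    return 1;
}
```

With data arriving from a pipe, the server cannot run this check without first buffering the whole body.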
However, one recent proposal struck me as a very good idea: the strategy
is to deploy a new content-transfer-encoding. Let's call it "packet."
The bytes of a body are written into a body part in packets: each
packet consists of a length (written out in ASCII characters, to avoid
byte order and word size issues), a CRLF, and a corresponding number
of bytes. For example, server code might look like:
#include <stdio.h>

void write_response(FILE *body, const char *contentType)
{
    enum { LF=10, CR=13 };
    char buffer[4000]; /* chosen to match system considerations */
    size_t bytes;

    printf("0200 document follows%c%c", CR, LF);
    printf("Content-Type: %s%c%c", contentType, CR, LF);
    printf("Content-Transfer-Encoding: packet%c%c", CR, LF);
    printf("%c%c", CR, LF); /* blank line ends the headers */

    /* each packet: length in ASCII, CRLF, then that many bytes */
    while((bytes = fread(buffer, 1, sizeof(buffer), body)) > 0){
        printf("%lu%c%c", (unsigned long)bytes, CR, LF);
        fwrite(buffer, 1, bytes, stdout);
    }

    /* @@ Hmmm... what happens if I get an error reading from body?
     * perhaps negative packet lengths could be used to indicate
     * errors?
     */

    printf("0%c%c", CR, LF); /* zero-length packet marks the end */
}
The returned data might look like:
0200 document follows
Content-Type: application/octet-stream
Content-Transfer-Encoding: packet

4000
...4000 bytes of stuff...
1745
...1745 bytes of stuff...
0
Then the connection would be available for another transaction.
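The receiving side can be sketched along the same lines. This is only an illustration under some assumptions: the function name read_packet_body is made up, the response headers are taken to be consumed already, and anything other than a non-negative decimal length followed by CRLF is treated as an error (the encoding as described does not say what a receiver should do with bad input).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reader: copies one packet-encoded body from `in` to
 * `out`. Returns the number of body bytes, or -1 on malformed or
 * truncated input. */
long read_packet_body(FILE *in, FILE *out)
{
    char line[32];
    char buffer[4000];
    long total = 0;

    assert(in != NULL && out != NULL);
    for (;;) {
        char *end;
        long remaining;

        if (fgets(line, sizeof(line), in) == NULL)
            return -1;              /* EOF before the final 0 packet */
        remaining = strtol(line, &end, 10);
        if (end == line || remaining < 0 ||
            end[0] != '\r' || end[1] != '\n')
            return -1;              /* not "<digits> CR LF" */
        if (remaining == 0)
            return total;           /* zero-length packet: body done */
        while (remaining > 0) {
            size_t want = sizeof(buffer);
            size_t got;

            if (remaining < (long)want)
                want = (size_t)remaining;
            got = fread(buffer, 1, want, in);
            if (got == 0)
                return -1;          /* stream ended inside a packet */
            fwrite(buffer, 1, got, out);
            remaining -= (long)got;
            total += (long)got;
        }
    }
}
```

When the function returns, the stream is positioned just past the terminating "0" line, which is where the next transaction would begin. Both streams should be opened in binary mode so that the byte counts are not disturbed by newline translation.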
If several objects were to be wrapped up into one and returned using
this content-transfer-encoding, it wouldn't make sense to use a
multipart/* content type, unless it seems reasonable to use a name
like multipart/packet that does not share syntax with the other
multipart types. I suggest aggregate/mixed, aggregate/parallel, etc.
There's still the issue of "how do you know when you've seen the last
body part of an aggregate/*?" How about this: when you're ready
to read the headers of the next body part, if you see instead a "."
on a line by itself, you're done.
The default content-transfer-encoding inside an aggregate/* is packet. So for
example:
0200 document follows
Content-Type: aggregate/mixed
Content-Transfer-Encoding: binary
Content-Type: text/html
4000
...4000 bytes of stuff...
1745
...1745 bytes of stuff...
0
Content-Type: image/gif
4000
...4000 bytes of stuff...
1745
...1745 bytes of stuff...
0
.
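A server producing a response like the one above might be sketched as follows. The names struct part, write_packets and write_aggregate are made up for illustration. The blank line after each group of headers is an assumption carried over from write_response; the aggregate syntax as described does not spell out how a body part's headers end.

```c
#include <assert.h>
#include <stdio.h>

#define CRLF "\r\n"

/* Hypothetical: one body part, a content type plus an open stream. */
struct part {
    const char *content_type;
    FILE *body;
};

/* Writes `body` as packets, ending with the zero-length packet. */
static void write_packets(FILE *out, FILE *body)
{
    char buffer[4000];
    size_t bytes;

    while ((bytes = fread(buffer, 1, sizeof(buffer), body)) > 0) {
        fprintf(out, "%lu" CRLF, (unsigned long)bytes);
        fwrite(buffer, 1, bytes, out);
    }
    fprintf(out, "0" CRLF);
}

/* Writes an aggregate/mixed response: each part is packet-encoded by
 * default, and a "." on a line by itself follows the last part. */
void write_aggregate(FILE *out, const struct part *parts, size_t count)
{
    size_t i;

    assert(out != NULL && (parts != NULL || count == 0));
    fprintf(out, "0200 document follows" CRLF);
    fprintf(out, "Content-Type: aggregate/mixed" CRLF);
    fprintf(out, "Content-Transfer-Encoding: binary" CRLF);
    fprintf(out, CRLF);             /* assumption: blank line here */
    for (i = 0; i < count; i++) {
        fprintf(out, "Content-Type: %s" CRLF, parts[i].content_type);
        fprintf(out, CRLF);         /* assumption: blank line here */
        write_packets(out, parts[i].body);
    }
    fprintf(out, "." CRLF);         /* end of the aggregate */
}
```

Note that the server never needs to know the length of any part in advance, and never has to look inside the data.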
Does this seem like a reasonable proposal?
Dan