Dan's reply:
>A terminology note here: there is no one "MIME data format." There's
>the ubiquitous message/rfc822 format that you can stick anything
>inside using MIME techniques. But the basic unit of information
>in the MIME spec is an _entity_ -- just an arbitrary stream of bytes.
OK, when I said MIME data format I meant MIME message format, and was
referring to the outer level only (and note that MIME *implies*
RFC822). I certainly did not refer to a particular content-type, not
even to message/rfc822. The only thing that isn't well-specified when
one talks about "a file in MIME format" is whether line breaks are
given as CRLF or as LF (or as something else).
>The question is, when you're sending an entity from one
>place to another, how do you know where the end is?
This is a matter for the transport agent, not for MIME -- by the time
you call in the MIME agent to handle the data you must *already* know
where the end is. For entities contained in other entities (e.g. the
content-type family multipart/*) there is a way defined in MIME to
find the end of the inner entities, but this is not true for the
outermost entity.
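
As a rough illustration of the inner-entity case, here is a minimal
Python sketch that splits a multipart body on its boundary. It assumes
the caller already has the complete outer body in hand (the transport
found its end) and that lines are CRLF-terminated; the function name
split_multipart is made up for this example and is not part of any
library.

```python
from typing import List


def split_multipart(body: bytes, boundary: bytes) -> List[bytes]:
    """Return the inner entities of a multipart body whose end is already known."""
    delimiter = b"--" + boundary      # starts each inner entity
    closing = delimiter + b"--"       # ends the last inner entity
    parts: List[bytes] = []
    current = None                    # None while we are still in the preamble
    for line in body.split(b"\r\n"):
        stripped = line.rstrip()      # tolerate trailing whitespace on delimiters
        if stripped == closing:
            if current is not None:
                parts.append(b"\r\n".join(current))
            return parts              # anything after this is epilogue
        if stripped == delimiter:
            if current is not None:
                parts.append(b"\r\n".join(current))
            current = []
        elif current is not None:
            current.append(line)
    raise ValueError("closing delimiter not found")
```

Note that nothing here tells us where `body` itself ends; that has to
come from outside, which is the point made above.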
>From the MIME point of view, an NNTP client and server have
>an implicit agreement that the entity going across the
>wire has a content-transfer-encoding of 7bit.
>
>This allows them to use the dot-on-a-line-by-itself technique to
>terminate the entity.
MIME and NNTP should never need to talk to each other. MIME is a UA
level format, NNTP is a message transfer agent protocol. NNTP can use
the dot-on-a-line-by-itself convention not because it is a 7-bit
protocol (which it isn't -- although other message transfer protocols
like SMTP are) but because it is a line-based protocol. MIME is also
mostly a line-based format, even if the content-transfer-encoding is
8bit -- it is only in binary mode that we get in trouble (since
conversion from one kind of line terminator to another is dangerous
for binary data).
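
To make the line-based argument concrete, here is a small Python
sketch of dot-on-a-line-by-itself framing with dot-stuffing. The
function names are invented for this example, and the streams are
assumed to be binary file-like objects. The framing only looks at line
structure, so 8-bit bytes pass through untouched; what it cannot carry
safely is data that contains its own line terminators.

```python
import io


def write_dot_terminated(lines, out):
    """Write lines (bytes without terminators), ending with a lone dot."""
    for line in lines:
        if line.startswith(b"."):
            line = b"." + line        # dot-stuffing: escape a leading dot
        out.write(line + b"\r\n")
    out.write(b".\r\n")               # terminator: a dot on a line by itself


def read_dot_terminated(stream):
    """Read lines up to the terminating dot and undo the dot-stuffing."""
    lines = []
    for raw in stream:                # binary streams iterate line by line
        if not raw.endswith(b"\r\n"):
            raise EOFError("stream ended in the middle of a line")
        line = raw[:-2]
        if line == b".":
            return lines
        if line.startswith(b"."):
            line = line[1:]           # remove the stuffed dot
        lines.append(line)
    raise EOFError("stream ended before the terminating dot")
```

For example, writing the lines `[b"hello", b".", b"caf\xe9"]` into an
`io.BytesIO` and reading them back returns the same list, including
the 8-bit byte.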
>They also share assumptions about the content-type as
>a separate issue. The client assumes the response to an
>ARTICLE command is a message/rfc822 entity, while the
>response to a BODY command is text/plain.
That's a nice way of putting it.
>[Long description of why you want to put the byte count in the MIME
>headers omitted]
>
>It is somewhat intertwingled, but I still kinda like it.
And I still don't. I have the feeling that it would be much easier to
adapt HTTP to other (non-TCP) transport protocols if the size of an
entity is given separately rather than computed from the entity itself
(after all, this nonsense is only necessary because TCP doesn't have a
way to distinguish EOF from a broken connection). As I understand it
your main objection is that under my proposal you will have to
construct the necessary headers in a buffer first. I don't believe
that this is much of a hassle on today's computers -- the headers
shouldn't take more than a couple of kilobytes even in extreme cases,
which is peanuts even for a standard PC.
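
A minimal Python sketch of what buffering the headers looks like in
practice. The "SIZE n" prefix line and both function names are purely
hypothetical, chosen for illustration; they are not HTTP syntax or
part of any proposal text. The idea is only that the size travels
outside the entity, so the sender must have the complete header block
in memory before it can announce the total.

```python
import io


def frame_entity(headers, body):
    """Buffer the headers, then announce the total entity size up front."""
    header_block = "".join(
        "%s: %s\r\n" % (name, value) for name, value in headers
    ) + "\r\n"
    buffered = header_block.encode("ascii")   # the buffer in question
    entity = buffered + body
    # Hypothetical size prefix; the entity itself is left untouched.
    return b"SIZE %d\r\n" % len(entity) + entity


def read_entity(stream):
    """Read one size-prefixed entity from a binary file-like stream."""
    size_line = stream.readline()
    if not (size_line.startswith(b"SIZE ") and size_line.endswith(b"\r\n")):
        raise ValueError("missing size prefix")
    size = int(size_line[5:-2])
    # A real socket would need a loop here; BytesIO returns all it has.
    entity = stream.read(size)
    if len(entity) != size:
        raise EOFError("connection broke before the whole entity arrived")
    return entity
```

With the size known in advance, a short read can be reported as a
broken connection rather than mistaken for a normal end of data.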
An issue on which I don't have a strong opinion is whether we should
represent line separators as CRLF in the header -- anyone else?
Cheers,
--Guido van Rossum, CWI, Amsterdam <guido@cwi.nl>
"The lawnmower. Surely such a gadget could not have been generated
independently in two separate areas."