Non-persistent Cookie proposal

Koen Holtman (koen@win.tue.nl)
Fri, 11 Aug 1995 14:19:05 +0200 (MET DST)


As promised on www-talk, here is a session state header proposal that,
among other things, tries to address some of the potential caching
problems in the Session-ID (State-Info) proposal by Dave Kristol.

These caching problems are in the area of graceful degradation when
stateful dialogs (for which Session-ID is a support mechanism) are
used with caches that do not conform to the Expires header definition
in the (draft) http spec. Some people, particularly people who want
to use stateful dialogs in applications where real money is involved,
are very concerned about such non-conforming caches: the fear is that
customers will blame them if the service does something inappropriate
because of a broken cache, while it is really the fault of the cache
administrator.

For the record: I am not in the tele-shopping business myself. I
don't know if the level of paranoia about non-conforming caches found
there is justified, but I fear that it just might be. In my opinion,
more information about current practice, and informed speculation
about future practice, is needed to decide on this issue.

The material in the appendix about the Request-ID header at the end of
this message not new: I have sent it to www-talk (but not http-wg)
some weeks ago. I have decided to include it here again, as there has
been some discussion on merging facilities for stateful dialogs and
for clicktrail statistics recently, just as I thought that this topic
was dead on www-talk (the consensus being that this merging would be a
bad idea because it would generate too much confusion).

Koen.

----snip----

Non-persistent Cookie proposal
==============================

Koen Holtman, koen@win.tue.nl

This document can be seen as a personal summary of some parts of the
the session-id discussion on www-talk in July 1995. [2] contains a
summary of privacy issues discussed.

I do not want to argue that the proposal below is the only way to go
when putting support for stateful dialogs in http.

Simpler mechanisms are possible, for example [3]. The `solution
space' for stateful dialogs has a number of dimensions:

- simplicity of implementation
- time of general availability when standardized
- downward compatibility
- simplicity of use
- reliability
- amount of privacy protection
- maximum complexity of stateful dialogs supported
- amount of cache control possible
- risks when used with non-conforming caches
- amount of confusion on www-talk generated

The proposal below occupies one point in this solution space.

The main goal of the text below is to make all dimensions visible, not
to give a personal opinion on what the best point in the space is. In
fact, my personal opinion on this has changed a lot over the last
month, and I have not reached a final conclusion yet.

** Background

This document assumes that you have read [1] and [2], and that you are
familiar with [4].

[1] the NetScape Cookie proposal
<URL:http://home.mcom.com/newsref/std/cookie_spec.html>.

[2] the proposals for of gathering consumer demographics
<URL:http://www.w3.org/hypertext/WWW/Protocols/demographics.html> by
Daniel W. Connolly.

[3] Session-ID (State-Info) proposal by Dave Kristol
<URL:http://www.research.att.com/~dmk/session.html>

[4] The draft HTTP/1.0 specification
<URL:http://www.ics.uci.edu/pub/ietf/http/>

** Terminology

Most terminology in this document is as in [4]. New terminology:

Browser: user agent (used for interactive web sessions).

Stateful dialog: An information exchange between a web user and a web
server that that extends beyond the submission of one form. In a
stateful dialog, the server changes its behavior as a result of
previous actions by the user. Conceptually, the state is a property
of the dialog (or session), not of the browser or server. On the
implementation side, either the browser, the server, or both hold the
state.

Cookie: a string to be sent by a browser to a server in requests,
representing the state in a stateful dialog. The string is kept at
the browser side, but supplied and updated by the server using
set-cookie response headers. Only the server need to be able do
decode the cookie. See [1] for an example cookie definition. A
cookie can represent the entire dialog state directly, or be a key in
a server-side dialog state database.

Request-id: a unique value sent by a browser to a server in requests
to allow the server to generate better clicktrail statistics.

Session-id: a unique identifier sent by a browser to a server in
requests to give the server a way of keeping it apart from other
browsers for the purpose of engaging in stateful dialogs. Unlike a
cookie, the value of a session-id can never change during a stateful
dialog.

Persistent information: information that is remembered by the browser
for future sessions

Non-persistent information: information that is lost when the browser
exits

Proposal: Non-persistent cookies
=================================

** 1. The set-cookie response header

A server may choose to send a set-cookie response header in a response
to a browser. The header looks like

Set-Cookie: <string>

where <string> does not contain whitespace. (This allows for future
extensions). Examples:

Set-Cookie: color_screen;no_jpgs;small_graphics
Set-Cookie: Joe%20User;joe@foo.com
Set-Cookie: id324254
Set-Cookie: joe:4234982343

If the browser receives a Set-Cookie response header, it can either
honor it or ignore it.

** 2. Honoring a Set-Cookie header

To honor a set-cookie header received from a server, a browser will
start including a header

Cookie: <string>

in _direct_ requests to that server, and to that server only. For the
purpose of this definition, individual servers are identified by the
hostname+portnumber pair in the request URL. Directness of requests
will be defined later.

The cookie <string> is taken from the Set-cookie header, and may be
changed by the server by sending a new Set-cookie header with a new
string. A Set-cookie header with an empty string can be taken as a
request to stop sending Cookie headers.

** 3. The decision to honor a Set-Cookie header

If a Set-Cookie response header is honored, this means that a server
can get more accurate statistics about the behavior of the browser
user if the browser user is behind a firewall or proxy cache. Thus,
the user should have the option of deciding not to honor Set-cookie
headers for privacy reasons.

It is suggested that browsers provide something like the following
preferences box:

+-----------------------------------------------------------------+
Honor set-cookie requests:
( ) Always honor request
( ) Start honoring requests if one is done in a response to
a form submission (POST request).
(*) Ask once for every site, use reply in later sessions
( ) Never honor requests
+-----------------------------------------------------------------+

where the (*) is the default setting. In real browsers, the
terminology used in this box should probably be adapted to make more
sense to the average user.

As a matter of etiquette, services should only send Set-Cookie headers
if honoring them would bring some direct advantage to the user, like
being able to use a stateful dialog. Sending Set-Cookie headers only
to get better clicktrail statistics should be considered a breach of
etiquette. To get clicktrail statistics, the Request-id header (see
the appendix below or [2]) should be used.

If a browser decides to honor one Set-Cookie header from a server, the
service author can expect that Set-Cookie headers sent in the near
future will also be honored.

A browser can contain a timeout mechanism to stop honoring Set-cookie
headers when, say, 30 minutes have past since the last contact with
the server.

** 4. Direct and Indirect Requests

A request is indirect if it
1) fetches the contents of an inline picture or other inlined object
2) resolves a 3xx (redirection) response

All other requests are direct.

If a browser is to honor a Set-Cookie header, Cookie headers must be
sent in _direct_ requests to the server. It is preferred that Cookie
headers are never sent in indirect requests to a server.

The distinction between direct and indirect requests is made for two
reasons:

a) this allows for better caching of stateful dialogs, as discussed
below
b) this allows for better privacy: it makes impossible some
`stealthy' cookie matching strategies that could be adopted by
cooperating web service providers to allow matching clicktrails.

** 5. Cookie headers and caching: the default

By default, a cache, no matter whether it is in a browser or in a
proxy, must never cache responses to requests with Cookie headers in
them.

Responses which contain Set-Cookie headers must also never be cached,
no matter whether the request itself contained a Cookie header.

There are three reasons for this default

1) Responses in stateful dialogs are dynamic by nature. No big
payoff can be expected from caching them. In fact, in a scheme
where they can be cached, there is a danger of cache memories
dropping useful `static' responses (like inline pictures from
normal sites) to store relatively useless dynamic responses.

2) It is vital that responses in stateful dialogs are never cached,
else services using stateful dialogs would become unreliable.

An Expires: <yesterday> header in a response can be used to
disable caching (in both browsers and proxy caches), but some
system operators may be tempted to `tune' proxy or browser caches
under their control to keep expired responses around for, say, 5
minutes, even though this makes the cache non-conformant to the
http spec.

Such `tuning' is relatively harmless for normal, non-interactive
services, but disastrous for stateful dialog reliability. As
stateful dialogs are still relatively uncommon, it is a valid
assumption that many system operators are not aware of the
stateful dialog risks involved with `Expires tuning', so
non-conforming caches may remain with us for some time to come.

By making responses in stateful dialogs using the Cookie headers
a special case in the cache algorithm, independent of `Expires
tuning', this particular non-conformance risk to stateful dialogs
is eliminated.

(Note: the Session-ID (State-Info) proposal by Dave Kristol [3]
assumes that system operators will never do such `tuning': this
allows proposal [3] to be very much simpler than this cookie
proposal.)

3) The developer of a stateful dialog service will usually not have a
cache between his browser and the CGI scripts under development.
If caching were enabled by default on requests with Cookie
headers, the author would have to remember to put Expires:
<yesterday> headers in the responses generated by the scripts; if
a forgetful or badly informed author of a stateful dialog service
would not do this, the resulting unreliability would not show up
on tests within the local environment.

This default behavior of _not_ caching responses to requests with
Cookie headers is the main reason why indirect requests, which do not
get Cookie headers, were introduced. An inline picture request is
indirect, so inline pictures on stateful dialog pages will get cached
by default.

Of course, if caching of an inline picture is not desirable, the
service author can always put an Expires header in the inline picture
response.

If a page contains pictures that depend on the dialog state, the
service author can implement these state dependent pictures by making
the generation of the URLs in the <IMG SRC=...> tags depend on the
state. Similar state-dependent URL generation can be done for
redirection (3xx) codes.

** 6. Cookie headers and caching: overriding the default

If the response to a request on URL U with a Cookie header contains an

Expires: <date>

header, and no Set-Cookie header, a cache can interpret this to mean
two things:

1) the entity included may be cached, but not beyond the
<date> given,
2) the entity in the response does *not* depend on the dialog state.
Thus, if the entity is cached and some browser does a request on
URL U, it is OK to serve the cached entity, no matter what the
cookie header value in the request is, even no matter whether a
cookie header is present at all in the request.

Note that it makes no sense for servers to send both a Set-Cookie
header and an Expires header in the same response. Servers may
however choose to do so to get backwards compatibility with old
proxy caches.

Also note that it makes no sense to put Expires: <yesterday> headers
in responses to requests which contain Cookie headers. Servers may
however choose to do so to get backwards compatibility with old proxy
caches.

This way of defining Expires: semantics ensures that caches need never
consider the cookie header when accessing their cache memory. The
Cookie header is never part of the cache key, the header is only
important when making the decision whether to cache or not.

==========================

APPENDIX: Proposal: The Request-ID: header field.
---------------------------------------------------

To write the text below, I took proposal I. from [2]:
<URL:http://www.w3.org/hypertext/WWW/Protocols/demographics.html>, and
changed things according to issues (mainly connected to privacy and
caching) discussed in the Session-id threads on www-talk.

The text below can be seen as a personal summary of the parts of
these threads that pertain to Request-IDs and privacy.

The Request-ID: header field.

Adapted from the proposal in
<URL:http://www.w3.org/hypertext/WWW/Protocols/demographics.html>.

Am HTTP request may include a header field of the form:

Request-ID: $session $request++

e.g.

Request-ID: 342%33a4d443 12

The HTTP client chooses a random string as a "session identifier",
and each request in that session is identified by a number that
increases monotonically with time.

It is suggested that clients use a different random $session string
for each server they talk to. This will make it more difficult for
cooperating web service providers to match clicktrails in their
logfiles, thereby getting user profiling information that is much
more accurate than the user would want to give them without some
form of compensation. Note that it is illegal to match logfiles
under the privacy laws in some countries. The suggestion to use
different $session strings can be seen as supporting these laws by
making the crime of matching logfiles pay off less.

A "session" is not formally defined (other than "a set of requests
with the same $session id"), though I suggest that browsers begin a
session when they are invoked and when they have been idle for 30
minutes or more, and allow some user interface to say "start a new
session" (i.e. "choose a new random session ID").

Each user agent must provide a mechanism to turn the generation of
Request-Ids off, especially for site security administrators that
prohibit its use.

If no Request-ID headers are present, this should be interpreted by
web service providers as a statement that the user does not wish to
reveal his or her exact clicktrail for privacy reasons. An attempt
by service providers to silently obtain the clicktrail by some
other means (for example by using a session-id, cookie, or
anonymous authentication mechanism that could be part of future
versions of HTTP), should be considered to violate the privacy
wishes of the user.

Whether HTTP clients use a global $request counter, or one counter
for each server talked to, is up to the clients. HTTP clients
which are not traditional user agents (e.g. multi-threaded robots)
may use several sessions in parallel.

A proxy must pass the Request-ID: header through unmodified. One might
consider some sort of Proxy-Request-ID, though I doubt it would be
valuable.

An HTTP cache can assume that the response to an HTTP request does
_not_ vary as a function of the Request-ID. That is, an HTTP proxy
need not include the Request-ID in its "cache key." If the
response to a request can vary, an Expires header should be used in
the response to reflect this dynamism.

It is preferred that the request-ID header is _not_ used to
implement stateful dialogs, in which the content of pages is
different for different sessions. For stateful dialog support,
other mechanisms (for example a session-id, cookie, or anonymous
authentication mechanism that could be part of future versions of
HTTP) should be used.

Alternative proposal:

Instead of introducing a new Request-ID: header, include the

$session $request++

information in the From: header. Examples:

From: (#342%33a4d443 12)

From: "Roy T. Fielding" <fielding@beach.w3.org> (#342%33a4d443 12)

======================