I still disagree. It is possible to specify a canonical form for URLs
independent of scheme. The quoting scheme described by Tim and me
(and implemented in HTParse.c and tested in my test suite...) does
just this.
> This is true for the default port (e.g., that
>http://host:80/ is the same as http://host/ but gopher: has 70 as a
>default port, etc.)
Given the definition of equality I proposed, http://host:80/ is
different from http://host/. The fact that they resolve to the same
thing is not part of the URL spec.
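To make that notion of equality concrete, here is a minimal sketch in Python. It rests on an assumption: two URLs are equal when their canonical spellings match, where canonicalizing only tidies up %xx escapes. The names canonical and same_url, and the choice to unescape only ASCII letters and digits, are hypothetical and are not taken from HTParse.c.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def canonical(url: str) -> str:
    """Return a scheme-independent canonical spelling (assumed rule set)."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        # Assumption: only ASCII letters and digits are safe to unescape.
        if ch.isascii() and ch.isalnum():
            return ch
        # Every other escape is kept, with its hex digits uppercased.
        return "%" + match.group(1).upper()
    return _ESCAPE.sub(fix, url)

def same_url(a: str, b: str) -> bool:
    """Compare canonical spellings; no scheme knowledge (ports, DNS) is used."""
    return canonical(a) == canonical(b)

print(same_url("http://host:80/", "http://host/"))      # False
print(same_url("http://host/%41", "http://host/A"))     # True
print(same_url("http://host/a%2Fb", "http://host/a/b")) # False
```

Nothing in the sketch knows that 80 is the default port for http, which is the point: the comparison works the same way for every scheme.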
> that the same host might have multiple DNS names,
>or that some FTP servers allow case insensitive file names, any number
>of actual equivalences, symbolic links, etc.
None of these things should be part of the URL spec. But things
that are used in practice today, i.e. the significance of ?, /,
and %xx, should be.
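A short Python sketch of what that significance means in practice: split on ? and / first, and only then decode %xx, so that an escaped slash stays inside its segment. The function name split_local is hypothetical, and leaving the search part undecoded is an assumption made to keep the example small.

```python
from urllib.parse import unquote

def split_local(rest: str):
    """Split the text after 'scheme:' into (path segments, search or None)."""
    # '?' separates the path from the search part.
    path, sep, search = rest.partition("?")
    # '/' separates segments; %xx is decoded only after splitting.
    segments = [unquote(seg) for seg in path.split("/")]
    return segments, (search if sep else None)

print(split_local("/a%2Fb/c"))  # (['', 'a/b', 'c'], None)
print(split_local("/a/b/c"))    # (['', 'a', 'b', 'c'], None)
```

The two inputs yield different segment lists, which is why "/" and "%2F" cannot be treated as interchangeable.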
>In the grand scheme of things, if you treat "/" and "%2F" as
>different, then at most you'll treat a few things as 'different' that
>are really the 'same', but in fact, this will be an insignificant
>amount compared to the other kinds of duplications.
In the grand scheme of things, the question is whether there's any
common structure to the "parameter package" of a URL. It sounds like
the decision is that there is not, even though this contradicts current
practice.
So the grammar for URLs is just:
    URL : IALPHA ':' CHARS
        ;
with terminals:
    IALPHA =~ /[a-zA-Z][a-zA-Z0-9_-]*/;
    CHARS  =~ /[^ <>]*/;
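For illustration, the whole of that grammar fits in a few lines of Python. This is only a sketch: the name parse_url is hypothetical, and the first character class is read as [a-zA-Z].

```python
import re

IALPHA = r"[a-zA-Z][a-zA-Z0-9_-]*"   # scheme name
CHARS = r"[^ <>]*"                   # anything except space, '<' and '>'
URL = re.compile(f"({IALPHA}):({CHARS})")

def parse_url(text: str):
    """Return (scheme, rest) if text matches the grammar, else None."""
    m = URL.fullmatch(text)
    return (m.group(1), m.group(2)) if m else None

print(parse_url("http://host:80/"))   # ('http', '//host:80/')
print(parse_url("http://host/a b"))   # None
```

Everything after the first colon comes back as one opaque string; the grammar gives a parser nothing further to work with.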
I'm interested to know whether the most widely deployed URL implementations
(www, Mosaic, ...) are going to change to conform to this.
Dan