From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Thu Oct 07 1999 - 06:17:10 CDT
In <9910031636.3m8a@babylon.swb.de> rbabel@babylon.pfm-mainz.de (Ralph Babel) writes:
>Let's assume you meant "HTTP headers". With the exception
>of "TEXT" being ISO-8859-1 by default instead of UTF-8,
>how does a "User-Agent" header that conforms to the
>syntax of RFC 2616 conflict with "our" proposed standard?
>| User-Agent = "User-Agent" ":" 1*( product | comment )
>| product = token ["/" product-version]
>| token = 1*<any CHAR except CTLs or separators>
>| CHAR = <any US-ASCII character (octets 0 - 127)>
>| CTL = <any US-ASCII control character
>| (octets 0 - 31) and DEL (127)>
>| separators = "(" | ")" | "<" | ">" | "@"
>| | "," | ";" | ":" | "\" | <">
>| | "/" | "[" | "]" | "?" | "="
>| | "{" | "}" | SP | HT
>| SP = <US-ASCII SP, space (32)>
>| HT = <US-ASCII HT, horizontal-tab (9)>
>| product-version = token
>| comment = "(" *( ctext | quoted-pair | comment ) ")"
>| ctext = <any TEXT excluding "(" and ")">
>| TEXT = <any OCTET except CTLs, but including LWS>
>| OCTET = <any 8-bit sequence of data>
>| LWS = [CRLF] 1*( SP | HT )
>| quoted-pair = "\" CHAR
That is the syntax taken directly from RFC 2616. Surely you are not
seriously suggesting that we should take the exact syntax from an RFC
describing a totally different medium and graft it onto news/mail (recall
that our User-Agent header, though designed for news, has intentionally
been made suitable for use in mail, should that be required later).
Within news/mail, there are accepted conventions, which parsers will
recognise, for where comments or folding may be introduced. Your HTTP
syntax does not even show the LWS, but there are complex rules in RFC 2616
for "implied LWS" which are similar, but surely different in detail.
Why do you want the User-Agent header to differ from the accepted
conventions for all other news/mail headers?
Within news/mail, there is an accepted syntax for <token>s (see any of the
Mime standards) which allows, in particular, for the characters "{" and
"}" to be present.
Why do you want the User-Agent header to differ from the accepted token
syntax of all other news/mail headers?
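To make the token difference concrete, here is a minimal sketch. It assumes the MIME token definition of RFC 2045 (any US-ASCII character except SPACE, CTLs and tspecials) and the HTTP definition quoted above. The function names are illustrative only and appear in neither standard.

```python
# Illustrative comparison of the two token syntaxes; all names are hypothetical.

# CTLs: octets 0-31 and DEL (127), as in the grammar quoted above.
CTLS = {chr(c) for c in range(32)} | {chr(127)}

# MIME tspecials (RFC 2045): characters that cannot appear in a token.
MIME_TSPECIALS = set('()<>@,;:\\"/[]?=')

# HTTP separators (RFC 2616): the tspecials plus "{", "}", SP and HT.
HTTP_SEPARATORS = MIME_TSPECIALS | set("{} \t")


def is_mime_token(s: str) -> bool:
    """True if s is a non-empty MIME token."""
    return bool(s) and all(
        ord(ch) < 128 and ch != " " and ch not in CTLS and ch not in MIME_TSPECIALS
        for ch in s
    )


def is_http_token(s: str) -> bool:
    """True if s is a non-empty RFC 2616 token."""
    return bool(s) and all(
        ord(ch) < 128 and ch not in CTLS and ch not in HTTP_SEPARATORS
        for ch in s
    )


if __name__ == "__main__":
    for candidate in ("tin", "My{News}Reader", "a/b"):
        print(candidate, is_mime_token(candidate), is_http_token(candidate))
```

A name such as My{News}Reader is a valid token under the MIME rules but not under the HTTP rules, which is exactly difference #4 in the list below.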
Within news/mail, the syntax of comments permits CTL characters to be
included, except for NULL and HTAB. Not so in HTTP.
Why do you want the comments in the User-Agent header to differ from the
comments in all other news/mail headers?
Within news/mail, it is not allowed to quote (i.e. precede by "\") the
characters NULL, CR and LF. In HTTP it is.
Why do you want quoted characters in the User-Agent header to differ from
those in all other news/mail headers?
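The quoted-pair difference can be sketched in the same way. This takes the news/mail rule exactly as stated above (NULL, CR and LF may not be quoted) and the HTTP production quoted-pair = "\" CHAR; the helper names are hypothetical.

```python
# Hypothetical helpers contrasting which characters may follow "\" in a quoted-pair.


def http_can_quote(ch: str) -> bool:
    """RFC 2616: quoted-pair = "\\" CHAR, where CHAR is any octet 0-127."""
    return len(ch) == 1 and ord(ch) < 128


# Assumption taken from the text above: news/mail forbids quoting NUL, CR and LF.
NEWS_UNQUOTABLE = {"\x00", "\r", "\n"}


def news_can_quote(ch: str) -> bool:
    """As http_can_quote, minus the three characters news/mail excludes."""
    return http_can_quote(ch) and ch not in NEWS_UNQUOTABLE
```

Under these assumptions the two predicates disagree only on the three characters named above.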
The complete list of differences between the news and HTTP headers, as
listed in our present draft, is as follows:
1. product-token is required and MUST be first,
2. use of arbitrary text in a product-token may require the use of a
   quoted-string,
3. comment allows quoted-pair,
4. "{" and "}" are allowed in a value (product-token and
   product-version) in Netnews,
5. octets from character sets other than US-ASCII are allowed, but only
   within a quoted-string,
6. UTF-8 replaces ISO-8859-1 as charset assumption.
Of these, #3 seems no longer to apply (presumably it did in RFC 1945) so I
have removed it (I have also now changed all references to RFC 1945 to RFC
2616).
I see now that #5 is merely duplicating #2, so I have combined them.
#4 I have already explained above, and #6 you agree to.
So that leaves:
#1 which was part of Greg Berigan's original proposal. I am happy to
listen to reasons why it should be changed.
#2 arises because I changed the syntax from 'token' to 'value'. This is in
general agreement with Mime usage, insofar as 'token's are used in
contexts where they need to be recognised and acted on by parsers, and
'value's are used (usually in the context "token=value") to contain
arbitrary data. The difference is that if any characters not allowable in
tokens are needed, then the whole value has to be quoted. It seemed to me
that the names of products should allow this possibility, since they are
arbitrary data in the sense that they do not have to be recognised by
parsers, and they may be written in languages which require more than
ASCII. Again, I am happy to listen to reasons why this should be changed.
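As an illustration of the token-versus-value rule, the following sketch emits a product name unchanged when it is a MIME token and otherwise wraps the whole of it in a quoted-string. It assumes the RFC 2045 token definition and the usual backslash escaping inside quoted-strings. The names is_token and format_value are hypothetical, and handling of CR, LF and folding is deliberately left out.

```python
# Sketch of "token or quoted-string" serialisation; all names are hypothetical.

CTLS = {chr(c) for c in range(32)} | {chr(127)}
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(s: str) -> bool:
    """True if s is a non-empty MIME token (RFC 2045)."""
    return bool(s) and all(
        ord(ch) < 128 and ch != " " and ch not in CTLS and ch not in TSPECIALS
        for ch in s
    )


def format_value(s: str) -> str:
    """Return s unchanged if it is a token, otherwise as a quoted-string."""
    if is_token(s):
        return s
    # Inside a quoted-string, backslash and double quote must be escaped.
    escaped = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


if __name__ == "__main__":
    for name in ("tin", "News{Reader}", "Müller News"):
        print(format_value(name))
```

A single non-token character, such as the space or the "ü" in the last example, forces the entire value into quotes; there is no way to quote just part of it.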
I also share concerns that this whole section is too long, and I have
already invited suggestions for shortening it, consistent with leaving its
technical content unchanged (if we want to change technical content, then
we do that separately).
-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Email: chl@clw.cs.man.ac.uk  Web: http://www.cs.man.ac.uk/~chl
Voice/Fax: +44 161 437 4506  Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9  Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5