Re: News and nntp URI schemes
In <41C5AC54.6E20@xxxxxxxxxxxxxxxxx> Frank Ellermann <nobody@xxxxxxxxxxxxxxxxx> writes:
>Charles Lindsey wrote:
>> If you want to comment, please do so to the URI list
>I'll do that separately, here I'm only interested in a
>technical detail relevant for Usefor:
>
>>| <id-left> and <id-right> are defined in Section 3.6.4 of
>>| RFC 2822 [RFC2822]. They MUST be in a canonical form in
>>| which no <quoted-string> or <quoted-pair> is used in a
>>| context where the same semantic meaning could have been
>>| rendered without such quoting; moreover, no whitespace or
>>| ">" may be included, whether %-encoded or not and/or quoted
>>| or not.
>
>>| For example, neither
>>| news:"abcd"@example.com
>>| nor
>>| news:"ab\cd"@example.com
>>| is in canonical form, because the form
>>| news:abcd@xxxxxxxxxxx
>>| is available.
>This text assumes that draft-usefor-02 is already on standards
>track, but that is not the case. The "official" definition is
>still <unique@full_domain_name>, an "unofficial" definition is
>"<" local-part "@" domain ">"

The trouble is that there are just too many definitions of message-id
around the place. There is going to be strong pressure to keep to RFC 2822
or at least to a subset of it (which the RFC 1036 definition is not). I
don't think we should be writing news standards based on the RFC 1036
definition at this stage.

OTOH, if the URL definition is based on RFC 2822, then we have to warn
that certain "non-canonical" ones may cause problems.

The alternative is to give a very loose definition, on the grounds that
URLs are supposed to contain only whatever has been used in some existing
news article (and the agent that generated that article can worry about
which standard it conformed to). So any string of characters with an '@'
in the middle is good enough.

That is more or less what the new NNTP draft says:

Each article MUST have a unique message-id; two articles offered by
an NNTP server MUST NOT have the same message-id. For the purposes
of this specification, message-ids are opaque strings that MUST meet
the following requirements:
o A message-id MUST begin with "<" and end with ">", and MUST NOT
contain the latter except at the end.
o A message-id MUST be between 3 and 250 octets in length.
o A message-id MUST NOT contain octets other than printable US-ASCII
characters.
Two message-ids are the same if and only if they consist of the same
sequence of octets.
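
As a rough illustration of the three requirements quoted above, here is a
minimal Python sketch. The function names are hypothetical, and it assumes
that "printable US-ASCII" means octets 0x21 through 0x7E (so space is
excluded); the excerpt itself does not spell out that range.

```python
def is_valid_nntp_message_id(msgid: bytes) -> bool:
    """Check the three requirements quoted from the NNTP draft."""
    # Between 3 and 250 octets in length.
    if not (3 <= len(msgid) <= 250):
        return False
    # Must begin with "<" and end with ">".
    if not (msgid.startswith(b"<") and msgid.endswith(b">")):
        return False
    # ">" must not appear anywhere except at the end.
    if b">" in msgid[:-1]:
        return False
    # Assumption: "printable US-ASCII" is taken as 0x21..0x7E.
    return all(0x21 <= octet <= 0x7E for octet in msgid)


def same_message_id(a: bytes, b: bytes) -> bool:
    """Opaque comparison: equal only if the octet sequences are equal."""
    return a == b
```

Under this opaque view, <"abcd"@example.com> is a perfectly valid
message-id, and it is simply a different one from the unquoted form.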
.............
This specification states that message-ids are the same if and only
if they consist of the same sequence of octets. Other specifications
may define two different sequences as being equal because they are
putting an interpretation on particular characters. RFC 2822
[RFC2822] has a concept of "quoted" and "escaped" characters. It
therefore considers the three message-ids:
<abcd@xxxxxxxxxxx>
<"abcd"@example.com>
<"ab\cd"@example.com>
as being identical. Therefore an NNTP implementation handling email
articles must ensure that only one of these three appears in the
protocol and the other two are converted to it as and when necessary,
such as when a client checks the results of a NEWNEWS command against
an internal database of message-ids. Note that RFC 1036 [RFC1036]
never treats two different strings as being identical. Its draft
successor restricts the syntax of message-ids so that, whenever RFC
2822 would treat two strings as equivalent, only one of them is valid
(in the above example only the first string is valid).
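
The conversion the draft asks for (mapping the quoted variants onto the one
unquoted form) might look roughly like the Python sketch below. The function
name is hypothetical, and this is a simplification under stated assumptions:
it only handles a left-hand side that is a single quoted-string, it leaves
the right-hand side untouched, and it does not attempt the full RFC 2822
grammar (no folding whitespace, comments or domain literals).

```python
import re

# atext and dot-atom-text as in RFC 2822: runs of atext separated by single dots.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
_DOT_ATOM = re.compile(rf"{_ATEXT}+(?:\.{_ATEXT}+)*\Z")


def canonical_message_id(msgid: str) -> str:
    """Drop quoting on the left-hand side wherever it is not needed."""
    if not (msgid.startswith("<") and msgid.endswith(">")):
        raise ValueError("message-id must be enclosed in <>")
    body = msgid[1:-1]
    if not body.startswith('"'):
        return msgid  # left-hand side is not a quoted-string; leave as-is

    # Scan the quoted-string, resolving each quoted-pair to its character.
    chars = []
    i = 1
    while i < len(body) and body[i] != '"':
        if body[i] == "\\":
            i += 1
            if i >= len(body):
                raise ValueError("dangling backslash")
        chars.append(body[i])
        i += 1
    if i >= len(body) or body[i + 1 : i + 2] != "@":
        raise ValueError("unterminated quoted-string or missing '@'")

    left = "".join(chars)
    right = body[i + 2 :]
    if _DOT_ATOM.match(left):
        # The same meaning can be rendered without quoting.
        return f"<{left}@{right}>"
    # Quoting is still needed; keep only the quoted-pairs that are required.
    escaped = "".join("\\" + c if c in '"\\' else c for c in left)
    return f'<"{escaped}"@{right}>'
```

With this sketch, all three message-ids in the draft's example reduce to the
first, unquoted one, so an octet-by-octet comparison after conversion agrees
with the RFC 2822 view for those cases.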
(And yes, I just noticed that NNTP will not accept Non-WS-Controls, but I
will deal with that elsewhere).

So yes, it might be better for the URL spec. to give a rather vague
definition of message-id, basing it on whatever the news article concerned
had actually used.

Anyway, the uri@xxxxxxx list is the proper place to continue this
discussion.

--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl@xxxxxxxxxxxxxxxx Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5