Re: msg-id



On Wed 15 Dec 2004 at 08:23:00AM -0500, in <200412150823.01136.blilly@xxxxxxxxx>,
Bruce Lilly <blilly@xxxxxxxxx> wrote:
> On Mon December 13 2004 19:09, Thorfinn wrote:
> > We mean MUST, and we also accept that MUST applies in the universe of
> > practicalities, where nothing is actually of zero probability, so "close
> > enough" is good enough.
> RFC 2119 keywords, especially MUST/MUST NOT, are applicable
> in specific types of circumstances as detailed in 2119 (q.v.).

Yes.  They are.

> Disappointment of a purchaser of second-hand goods is not one
> of the criteria.

Are you talking about the second hand domain name case?  Anyway,
more response further down, after the RFC quoting.

> > You don't accept that, that's fine.  Everyone else, AFAICT, does accept
> > that.  If anyone else doesn't, speak up now.
> I object to such an ultimatum from other than the chair.
> There is merit to some of what John has to say,

Indeed there sometimes is, but often it's buried amongst so much
extraneous random stuff that it's hard to find the point.  If you have
the time to wade through that to find the gems, then please, feel free.
I, for one, no longer do.

> and there are serious problems with what some others have claimed. Any
> ultimata should come from the chair, not individual WG members, and
> should carry a reasonable time period for response, not demanding an
> instant (i.e. "now") response.

Fair enough.  I won't issue such a comment again.  It was more of a
rhetorical device than an actual ultimatum, but it's *also* true that if
I feel the need to use rhetorical devices in what I'm saying, then I
shouldn't bother replying.

> I haven't the time to go into detail now, but I strongly
> suggest that reading RFC 2822 section 3.6.4, RFC 822
> section 4.1 and section 6, and draft-crocker-email-arch-01
> sections 3.2 and 3.3 may help to clear up some of the
> contradictory remarks made on this topic in this WG.

Apart from all the quoted stuff below, I note that message-id in email
appears to be *optional*, which it certainly shouldn't be for us.

===== begin quoted stuff =====

RFC2119:

1. MUST   This word, or the terms "REQUIRED" or "SHALL", mean that the
   definition is an absolute requirement of the specification.

3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

RFC2822:
--------

3.6.4 (part of)

The message identifier (msg-id) itself MUST be a globally unique
identifier for a message.  The generator of the message identifier
MUST guarantee that the msg-id is unique.

RFC822:
-------

 4.1.  SYNTAX

     msg-id      =  "<" addr-spec ">"            ; Unique message id

4.6.  REFERENCE FIELDS

     4.6.1.  MESSAGE-ID / RESENT-MESSAGE-ID

             This field contains a unique identifier  (the  local-part
        address  unit)  which  refers to THIS version of THIS message.
        The uniqueness of the message identifier is guaranteed by  the
        host  which  generates  it.  This identifier is intended to be
        machine readable and not necessarily meaningful to humans.   A
        message  identifier pertains to exactly one instantiation of a
        particular message; subsequent revisions to the message should
        each receive new message identifiers.

6.  ADDRESS SPECIFICATION

     6.1.  SYNTAX
     addr-spec   =  local-part "@" domain        ; global address

     local-part  =  word *("." word)             ; uninterpreted
                                                 ; case-preserved

... and a lot more words in section 6, all superseded by 2822 anyway.

draft-crocker-email-arch-01:
----------------------------

3.2  Domain Names

   A domain name is a global reference to an Internet resource, such as
   a host, a service or a network.  A name usually maps to one or more
   IP Addresses.  A domain name can be administered to refer to
   individual users, but this is not common practice.  The name is
   structure as a hierarchical sequence of sub-names, separated by dots
   (".").

   When not part of a mailbox address, a domain name is used in Internet
   mail to refer to a node that took action upon the message, such as
   providing the administrative scope for a message identifier, or
   performing transfer processing.

3.3  Message Identifers

   Message identifiers have two distinct parts, divided by an at-sign
   ("@").  The right-hand side contains a globally interpreted name for
   the administrative domain assigning the identifier.  The left-hand
   side of the at-sign contains a string that is globally opaque and
   serves to uniquely identify the message within the domain referenced
   on the right-hand side.  The duration of uniqueness for the message
   identifier is undefined.

   The identifier may be assigned by the user or by any component of the
   system along the path.  Although Internet mail standards provide for
   a single identifier, more than one is sometimes assigned.

===== end quoted stuff =====

I note one major difference between news and email.  Email messages
rarely get collected into large collections, so it's not as
important for them to have a unique message id.  They aren't even
required to have one.  It's optional.

Usenet ones *do*... that's the whole point, after all.  So everything
*does* have a message-id.  Now, if a message-id *is* generated, the
"MUST be unique" requirement is sitting there (and has been sitting
there for some time, and isn't about to go away).  That's fine and
dandy, and as far as I can tell, most people are entirely happy with
that being a MUST.
Please correct me (in an appropriately timely fashion) if I'm wrong on
that.

Back to the second hand domain case... Firstly, it may not even be
possible for someone obtaining a domain to know whether the domain has
been used previously, as domain names often go back to a registrar
first, and some registrars are renowned for being rather poor at giving
out information at times.  At least in the .au space, domains are
*required* to go back to the registrar, with no on-selling allowed.

That being the case, are you actually suggesting that even with that
problem, a domain name owner must find all the past algorithms that have
been used to generate IDs for that domain, and then design a new
algorithm to avoid colliding with *any* of them, and make sure that any
software installed uses that algorithm?

And even setting that circumstance aside, such collisions could easily happen
even *without* a past owner, since there are several places (and thus
several different algorithms in use) where a message ID could be
generated on the *same* host.  Mail-transfer-agents make them, any one
of a zillion different mail-user-agents will make them on their own, and
existing news servers and news clients generate their own too.

And then we have the problem that a *lot* of user-agents (both mail and
news ones) are operating behind firewalls, and do *not* have actual
valid Internet domain names to use as the RHS, nor do they even have
valid unique Internet IP addresses, due to them generally being in the
"private" address spaces allocated.  And they can't even use ethernet
MAC addresses, because a lot of them are dialup users, and don't *have*
a MAC address...

Anyway, it is *entirely* feasible for those several different
algorithms, even when used on a host with a proper FQDN, to generate
colliding message-ids.  I think that's actually got a significantly
higher likelihood than the algorithm "use SHA256 of message body and
existing known headers as message-id-lhs" (which incidentally also has
the property that if the message is the same, the message-id-lhs will be
the same).
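
To make the "use SHA256" idea concrete, here is a minimal Python sketch.
It is only an illustration under stated assumptions: the function name
hashed_msg_id, the choice to hash headers in sorted order, and the
example.invalid domain are all hypothetical, and none of this is taken
from any draft or existing implementation.

```python
import hashlib


def hashed_msg_id(headers, body, domain):
    """Build a msg-id whose left-hand side is a SHA-256 digest.

    headers: dict of header name -> value (hypothetical input shape)
    body:    message body as a string
    domain:  right-hand side of the msg-id
    """
    h = hashlib.sha256()
    # Feed headers in a fixed (sorted) order so that the same message
    # always produces the same digest, whatever order they arrive in.
    for name, value in sorted(headers.items()):
        h.update(f"{name.lower()}:{value}\n".encode("utf-8"))
    h.update(b"\n")
    h.update(body.encode("utf-8"))
    return f"<{h.hexdigest()}@{domain}>"


hdrs = {"From": "someone@example.invalid", "Subject": "msg-id"}
print(hashed_msg_id(hdrs, "Hello, world.\n", "example.invalid"))
```

Identical input gives an identical message-id-lhs, which is the
property noted above; any change to the body or to the hashed headers
gives a different one.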

Are you suggesting that all those algorithms (including the "use SHA256"
one, which has a theoretical collision probability immensely smaller
than the "once per thousand years" being bandied about) do not satisfy
the MUST that is in RFC2822, and that people should not use them?
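
For a sense of scale, the usual birthday-bound approximation can be
worked through in a few lines of Python.  This is a back-of-envelope
sketch, not a measurement: it assumes the digests behave like uniformly
random 256-bit values, and the figure of 10**12 messages is a made-up
number chosen only for illustration.

```python
def collision_probability(num_messages, hash_bits=256):
    """Approximate chance of any two digests colliding.

    Uses the birthday bound p ~= n*(n-1)/2 / 2**bits, which is a good
    approximation while p is very small.
    """
    pairs = num_messages * (num_messages - 1) // 2
    return pairs / 2**hash_bits


# Hypothetical archive size: a trillion messages.
p = collision_probability(10**12)
print(f"{p:.1e}")
```

Under those assumptions the result is on the order of 1e-54, which is
the kind of "sufficiently low probability" being argued about here.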

The point is, some of us *do* accept that even a "MUST" *does* bow to
practicalities.  Yes, it *is and should be* an absolute requirement of
the specification that message-id generators make "unique" message-ids.
But no, we're not blind about it, and we make judgements about
probabilities, and certain sufficiently low probability events, to us,
are not worth worrying about.

This is all moot anyway... Our job is to specify, and to leave it up to
the implementors to implement.  Sure, we have to make sure that what we
specify is feasible and reasonable.

So, I guess we come to the meat of it:

I think that a unique message id is a *theoretically impossible* thing
to generate.  I don't care *what* algorithm you use, someone can design
a counter algorithm, which also generates "unique" ids, which collides
with the previous algorithm.  And multiple algorithms *are* in use out
there.
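
As a small illustration of how two independently reasonable schemes can
collide, consider the following Python sketch.  Both generators are
invented for this example (the names mta_msg_id and newsreader_msg_id,
and their formats, are hypothetical and not taken from any real
software); each one is "unique" on its own terms.

```python
def mta_msg_id(unix_time, pid, host):
    """Hypothetical MTA scheme: <time>.<process id>@<host>."""
    return f"<{unix_time}.{pid}@{host}>"


def newsreader_msg_id(unix_time, seq, host):
    """Hypothetical news client scheme: <time>.<sequence number>@<host>."""
    return f"<{unix_time}.{seq}@{host}>"


host = "news.example.invalid"
a = mta_msg_id(1103116980, 4711, host)         # MTA process 4711
b = newsreader_msg_id(1103116980, 4711, host)  # client's 4711th post
print(a == b)
```

Neither generator ever repeats one of its own IDs within a given
second, yet the two produce the same string whenever the process id
happens to equal the sequence number in that second.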

However, I also think that a unique message id is a *theoretically
desirable* thing to have.

Now, whilst we *can* override the mail standards to say that the
uniqueness is now a "SHOULD" rather than a "MUST"... I think that's
something which would convey *entirely* the wrong message to
implementors.

After all, it's *even more important* for us to have a unique message-id
for every message, since we collect who knows how many millions of
usenet messages in the one place all the time, and they're *all*
supposed to have unique message-ids.

Later,

  Thorf

-- 
thorfinn@xxxxxxxxxxxxxx  http://tertius.net.au/~thorfinn/
"Everyone knows that bloodstained walls improve productivity."
    -- Thorfinn(tertius.net.au