Re: Uniqueness in client-generated Message IDs (was: Re: Author-ID, Thread-Participants, etc (was Re: posted-mailed update))

New Message Reply About this list Date view Thread view Subject view Author view

From: Russ Allbery (rra@stanford.edu)
Date: Thu Jul 02 1998 - 01:06:04 CDT


Greg Berigan <gberigan@cse.unl.edu> writes:

> Let's for the moment assume that the probability is 1 bit in 32 million
> gets flipped by a cosmic ray. To have a number at that level of
> probability you'd need to represent it by 32 million bits.

Um, no. If any bit anywhere in the critical code path for generating the
message ID gets flipped, you may as well throw away the results, as no
algorithm is going to work. Not to mention that we can *significantly*
reduce the collision space by using a fully qualified hostname and a
timestamp, and the only requirement is that the chance of a collision be
lower than the chance of a cosmic ray interfering with your computer's
processing.

So we don't need anywhere near 32 million bits in order to achieve this.

(I really shouldn't do back of the envelope calculations on the fly,
because I'll get them wrong, but what the hell. At least I'll get them
wrong and get someone else to correct me. The best way to get an answer
on Usenet isn't to ask a question, but is to post an incorrect answer....)

Let's suppose that a given ISP has one million customers who will all have
the same hostname on their messages. Now, let's suppose that each one of
those customers generates one thousand messages per day. Assume a
timestamp as part of the message ID that includes the time to the nearest
second. There are 86,400 seconds in a day, making the probability that a
given customer will be sending a message at a given second about 0.012.
There will be, on average, 12,000 customers generating a message ID at the
same time to the limit of accuracy of the clock.

A truly random 32 bit number added to the message ID means that the
probability of a collision each second is about 0.0000027, or a
probability of 0.01 that there will be a collision within an hour's time.

A truly random 64 bit number, according to a quick calculation and
assuming I didn't screw something up (which I probably did -- I'm *so* not
a stats person) reduces that to a 1% chance there will be a collision
sometime in the next 480,000 years. (I'm doing additive probabilities
here, which is a major mistake, and hoping the fudge factor isn't too
large. I'm sure someone will rightfully hit me for this.)

That's for one site. On one hand, there are going to be far more than one
site; on the other hand, any site significantly smaller than one million
people sharing the same RHS is going to be background noise.

Again assuming that these calculations bear any resemblence to reality,
I'd be quite comfortable with 64 bit random values. I'm not sure about 32
bit random values.

Jamie, did you do better calculations when you were writing your draft?

> The problem with some probabilities is that they come back and bite you
> when you try to get too clever with them.

Um, yeah. I'm not particularly fond of yours, for example. :)

> Again, a good random number generator has no qualms whatsoever with
> generating the same number twice in a microsecond, or even the same
> number 10,000 times in a row.

This is a meaningless statement. A good random universe generator has no
qualms whatsoever with destroying the world with a comet twice in a
microsecond. Unless you're going to attach some probabilities to that, it
doesn't help any.

> And there is still no need whatsoever to rely on such things anyway! We
> can achieve uniqueness by only applying hard data provided a shared
> authority (which can have a multitude of forms, not all of which involve
> a human nor constant consultation).

Yes, but that's more complexity. Let's please not add complexity unless
it's needed. You're not convincing me there's a need.

> Random numbers cannot be justified,

If you're going to argue religion, we may as well just quit now. Computer
science has been using random numbers for any number of reasons for many
years.

> After all, what is the DNS if not a global authority? What's good for
> the RHS should be good for the LHS.

Er, no. The whole point of *having* an RHS is so that we can be less
restrictive about the LHS.

-- 
Russ Allbery (rra@stanford.edu)         <URL:http://www.eyrie.org/~eagle/>


New Message Reply About this list Date view Thread view Subject view Author view


This archive was generated by hypermail 2b29.