From: J.B. Moreno (planb@newsreaders.com)
Date: Fri Feb 21 2003 - 23:14:33 CST
On 2/21/03 5:11 PM, Bruce Lilly at <blilly@erols.com> wrote:
> Charles Lindsey wrote:
>
-snip-
>> Now it appears that, out of every 152,000 (or thereabouts) cases where it
>> ought to report "no", it actually reports "yes" about 18 times. In other
>> words, when it is _supposed_ to report "no", it falsely reports "yes"
>> 0.012% of the time. Those cases are called the "false positives".
>
> No, out of the cases which match possible utf-8 sequences, the
> assumption that the charset *is* utf-8 is wrong approximately
> half of the time. Those are the false positives. The ones which
> are known a priori not to be utf-8 (both 7-bit-only and sequences
> which cannot be utf-8) are irrelevant.
Those 152,000 messages actually exist, and are moved about Usenet every
couple of days. SMTP and IMAP currently take those 152,000 messages and
turn them into garbage, despite the fact that if they hadn't, there's an
excellent chance that they would have been perfectly understandable to the
recipients.
We are suggesting a test that makes an exception for just 43 of those
152,000 messages, and you think it's significant that some of those 43
messages *may* end up being treated as garbage instead of *being* garbage
(because SMTP and IMAP destroyed them)?
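To make the two ways of counting concrete, here is a rough Python sketch. The classify() function and the constant names are my own illustration, not anything from the draft, and it assumes (the thread doesn't say so outright) that the 18 false positives all fall among the 43 messages that look like utf-8. Strict decoding stands in for "could be a utf-8 sequence".

```python
def classify(raw: bytes) -> str:
    """Sort a raw message body into one of three buckets.

    'ascii'          : 7-bit only, so the utf-8 guess never matters
    'not-utf8'       : 8-bit, but cannot be utf-8
    'utf8-candidate' : 8-bit and decodes cleanly as utf-8
    """
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")  # strict mode rejects malformed sequences
    except UnicodeDecodeError:
        return "not-utf8"
    return "utf8-candidate"


# Figures quoted in this thread.
TOTAL = 152_000        # messages where the answer ought to be "no"
CANDIDATES = 43        # of those, the ones that match utf-8 sequences
FALSE_POSITIVES = 18   # assumed to be a subset of the 43 candidates

# Charles's rate: wrong guesses over every message that should say "no".
rate_overall = FALSE_POSITIVES / TOTAL

# Bruce's rate: wrong guesses over only the messages that pass the test.
rate_among_candidates = FALSE_POSITIVES / CANDIDATES

print(f"overall: {rate_overall:.3%}")
print(f"among candidates: {rate_among_candidates:.1%}")
```

Both rates describe the same 18 messages; they differ only in the denominator, which is what the disagreement above comes down to.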
-- J.B. Moreno