Re: Transformation of Non-ASCII headers


From: Andrew Gierth (andrew@erlenstar.demon.co.uk)
Date: Thu Feb 20 2003 - 07:31:26 CST


>>>>> "Mark" == Mark Crispin <mrc@CAC.Washington.EDU> writes:

> On Sun, 16 Feb 2003, Bruce Lilly wrote:
>> No, no, no; the rate of false positives (again assuming
>> that one knows the real charset) is the ratio of the
>> false matches to the total matching the utf-8 rule, or
>> 17 / 26 which is greater than 65%.

 Mark> That's correct,

On the contrary, it is statistical nonsense.

The sample contained 151,991 strings of non-UTF-8 text containing
8-bit characters, and when these were fed to the "is this UTF-8"
algorithm it incorrectly answered "yes" in 17 cases. (This takes the
original figures at face value for the time being; in fact the real
error rate was lower.)
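
A minimal sketch of the arithmetic, using only the figures quoted in
this thread (151,991 non-UTF-8 strings, 17 incorrect "yes" answers, 26
"yes" answers in total). The variable names are illustrative and do not
come from the original test.

```python
# Figures quoted in this thread; the names are illustrative only.
non_utf8_strings = 151_991   # non-UTF-8 strings fed to the check
false_yes = 17               # of those, wrongly answered "yes"
total_yes = 26               # all "yes" answers, right or wrong

# Error rate of the check on non-UTF-8 input.
false_positive_rate = false_yes / non_utf8_strings

# The 17 / 26 figure: share of "yes" answers that were wrong.
share_of_yes_wrong = false_yes / total_yes

print(f"false positive rate: {false_positive_rate:.4%}")
print(f"share of 'yes' answers that were wrong: {share_of_yes_wrong:.1%}")
```

The two quantities have different denominators, which is why they
differ by several orders of magnitude.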

The number of _correct_ "yes" answers depends only on the
composition of the sample and not on the error rate of the algorithm.
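
To illustrate, here is a small sketch that holds the 17 incorrect "yes"
answers fixed and varies only the number of genuine UTF-8 strings in
the sample. It assumes every genuine UTF-8 string gets a correct "yes".
The value 9 is 26 - 17 from the quoted figures; the other sample sizes
and the function name are hypothetical.

```python
def share_of_yes_wrong(false_yes: int, genuine_utf8: int) -> float:
    """Share of "yes" answers that are wrong.

    Assumes every genuine UTF-8 string is answered "yes" correctly,
    so the total number of "yes" answers is false_yes + genuine_utf8.
    """
    return false_yes / (false_yes + genuine_utf8)


false_yes = 17  # unchanged: the algorithm's errors on non-UTF-8 input

# 9 genuine UTF-8 strings matches the quoted 17 / 26; the larger
# counts are hypothetical samples with more real UTF-8 in them.
for genuine_utf8 in (9, 900, 90_000):
    share = share_of_yes_wrong(false_yes, genuine_utf8)
    print(f"{genuine_utf8:>6} genuine UTF-8 strings: {share:.4f}")
```

The algorithm makes the same 17 mistakes in every row; only the mix of
the sample changes, and the ratio moves with it.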

-- 
Andrew.



