Re: Subject header statistics


From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Wed Jun 26 2002 - 05:21:37 CDT


In <200206260211.g5Q2Bn3f029182@lima.epix.net> "Forrest J. Cavalier III" <mibsoft@epix.net> writes:

>> I don't know how to be clearer than this: if all 49 matches had been false
>> positives, then that fact *still* wouldn't be relevant. The ratio of false
>> positives to non-utf8 8 bits is relevant, the ratio of false positives to
>> correct positives is not.
>>

>Lost me there. I believe the question was "If user agents start
>using Subject headers as UTF-8 if they contain valid UTF-8 encoding
>sequences, how likely is it that they are wrong?"

No, the question is "If user agents start assuming that Subject headers
which fail the UTF-8 test are in some "local"/"guessed" character set, how
many times will they treat some header as UTF-8 when it isn't?"

The recent measurement showed 4 in 90,000.

>The recent measurement was 4 in 49.

Nope.
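
To put the two figures side by side, here is a small Python sketch of the
arithmetic. The variable names are illustrative only, and it assumes that
90,000 is the number of headers the test was run against and 49 is the number
that passed as UTF-8, which is how the figures are used in this thread.

```python
# Figures quoted in this thread; the variable names are hypothetical.
false_positives = 4       # headers taken as UTF-8 that were not
headers_tested = 90_000   # assumed: all headers the test was applied to
utf8_matches = 49         # headers that passed the UTF-8 test

# Chance that a given header is wrongly treated as UTF-8.
rate_per_header = false_positives / headers_tested

# Share of the UTF-8 matches that were wrong (the "4 in 49" figure).
share_of_matches = false_positives / utf8_matches

print(f"{rate_per_header:.4%}")   # 0.0044%
print(f"{share_of_matches:.1%}")  # 8.2%
```

The first figure answers "how often does an agent mistreat a header?"; the
second only says how the few matches divide up.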

Note also that the answer to the question "How many times will they fail to
treat some header as UTF-8 when it is?" is "never".
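
A minimal sketch of the test under discussion, in Python: try a strict UTF-8
decode first, and only fall back to a guessed character set if that fails.
The function name decode_subject and the ISO-8859-1 default are assumptions
made for illustration, not anything a particular agent is known to do.

```python
from typing import Tuple


def decode_subject(raw: bytes, fallback: str = "iso-8859-1") -> Tuple[str, str]:
    """Return (text, charset_used) for the raw bytes of a Subject header.

    Hypothetical helper: anything that passes the strict UTF-8 test is
    treated as UTF-8; everything else is decoded with the guessed charset.
    """
    try:
        # Strict decoding rejects malformed and overlong sequences.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # The header failed the UTF-8 test, so use the local guess.
        return raw.decode(fallback, errors="replace"), fallback


# Genuine UTF-8 always passes the test, so it is never missed.
print(decode_subject("Café".encode("utf-8")))

# A lone Latin-1 byte 0xE9 is not valid UTF-8, so the guess is used.
print(decode_subject("Café".encode("iso-8859-1")))

# False positive: the Latin-1 text "Ã©" is the byte pair C3 A9, which is
# also a valid UTF-8 encoding of "é", so it is taken as UTF-8.
print(decode_subject("Ã©".encode("iso-8859-1")))
```

The last case is the only kind of error this test can make: 8-bit text in
another character set that happens to form valid UTF-8 sequences.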

There is also the small matter of how agents do their "guessing". If they
regularly read only groups in which some particular character set abounds,
then that will be their guess. If not, then they are stuck.

>Did I miss the question?

Yes.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5



