From: J.B. Moreno (planb@newsreaders.com)
Date: Sat Feb 22 2003 - 08:58:48 CST
On 2/22/03 1:49 AM, Bruce Lilly at <blilly@erols.com> wrote:
> Andrew Gierth wrote:
>
>> Failing to pay attention when your statistical error is explained to
>> you is a good sign that you're not really interested in the truth, and
>> you only want to extract figures that support your preselected
>> position.
>
> I have in fact paid quite close attention; I simply disagree.
>
>> Bruce> the ratio of false positives is due to the fact that coded
>> Bruce> utf-8 generates octet sequences which are not markedly
>> Bruce> different from other 8-bit charsets, especially on short
>> Bruce> texts.
>
> Detailed in another message recently posted; in summary, one expects
> about a 50% error rate with iso-8859-x in the mix, and a bit
> more with some other charsets as well.
Everything in use on Usenet (which definitely includes iso-8859-x) was part
of "the mix", and as you've been told before, the test *was* done on data
with short texts.
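Neither post says exactly what test produced the disputed figures. As a minimal sketch, assuming the test is a strict UTF-8 validity check applied only to text containing 8-bit octets, the Python below shows both outcomes under discussion. An ISO-8859-1 word fails validation. A contrived two-character ISO-8859-1 string ("Ã©", octets C3 A9) happens to be valid UTF-8 and would therefore count as a false positive. The function name `looks_like_utf8` is hypothetical.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical heuristic (an assumption, not the test from the thread):
    True if data has 8-bit octets and decodes as strict UTF-8."""
    if all(b < 0x80 for b in data):
        return False  # pure ASCII: no evidence for or against UTF-8
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False  # malformed UTF-8 sequence, so not UTF-8
    return True

latin1_word = "café".encode("iso-8859-1")  # b'caf\xe9': E9 lead byte with no continuation
utf8_word = "café".encode("utf-8")         # b'caf\xc3\xa9'
ambiguous = "Ã©".encode("iso-8859-1")      # b'\xc3\xa9': also valid UTF-8 for "é"

print(looks_like_utf8(latin1_word))  # False
print(looks_like_utf8(utf8_word))    # True
print(looks_like_utf8(ambiguous))    # True
```

Only the octets above 0x7F carry any evidence here, so a short text with few such octets gives the check little to work with. That is one reason short texts are the contested case.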
-- J.B. Moreno