From: Bruce Lilly (blilly@erols.com)
Date: Fri Feb 21 2003 - 20:08:29 CST
J.B. Moreno wrote:
> On 2/18/03 2:45 AM, Mark Crispin at <mrc@cac.washington.edu> wrote:
>
>
>>That's correct, and that pretty much shoots down the use of a test to
>>determine if something is UTF-8. The test can prove that text is not
>>UTF-8 (assuming that the UTF-8 wasn't somehow damaged in transit), but it
>>does not reliably prove that text is UTF-8.
>
>
> No, it doesn't.
So we're all agreed; the test doesn't reliably prove that text is utf-8.
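To make the asymmetry concrete, here is a minimal sketch in Python. The
helper name passes_utf8_test and the sample byte strings are my own
illustrations, not taken from any draft or implementation. It assumes "the
test" simply means "the octets are well-formed UTF-8":

```python
def passes_utf8_test(data: bytes) -> bool:
    """Return True if the octets are well-formed UTF-8 (strict decoding succeeds)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# "cafe" with e-acute, encoded as ISO-8859-1: the lone octet 0xE9 starts a
# multi-octet UTF-8 sequence that is never completed, so the test rejects it.
latin1_text = b"caf\xe9"

# The ISO-8859-1 characters A-tilde and copyright sign are octets C3 A9,
# which are also the well-formed UTF-8 encoding of e-acute. The test
# accepts them, although the sender may have meant ISO-8859-1.
ambiguous_text = b"\xc3\xa9"

print(passes_utf8_test(latin1_text))     # rejected: provably not UTF-8
print(passes_utf8_test(ambiguous_text))  # accepted: proves nothing about intent
```

A failure is conclusive (barring damage in transit); a pass only says that
the octets are consistent with UTF-8, not that the sender used it.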
<cooked numbers snipped>
> That's the situation as it stands today. But we hope to change A2 by
> encouraging people to switch over from the "local" charset to UTF8, *if*
> that happens then it *will* help because there will actually *be* articles
> identified as UTF8.
There are some now.
> Since the percentage of "incorrectly identified as
> UTF8" will NOT go up, but instead go down
No, about half of those identified as "utf-8" will in fact not be
utf-8, over a wide range.
> If what we do does NOT encourage people to switch (or let us say doesn't
> encourage even 1% of the people to switch), then we're no worse off than we
> are today -- we have a standard that says do X and it is ignored.
Not true; today we do *not* have a standard that says send raw utf-8 --
in fact the standard currently prohibits that.
> Basically, only today does it not make sense to do the test, any shift
> towards it actually being used results in it being a good test.
Not "any shift", only a complete 180-degree about-face (i.e. from very
little utf-8 to exclusively utf-8) -- and if that were to happen, a test
would be unnecessary.