Re: Transformation of Non-ASCII headers

From: Bruce Lilly (blilly@erols.com)
Date: Fri Feb 21 2003 - 20:08:29 CST


J.B. Moreno wrote:
> On 2/18/03 2:45 AM, Mark Crispin at <mrc@cac.washington.edu> wrote:
>
>
>>That's correct, and that pretty much shoots down the use of a test to
>>determine if something is UTF-8. The test can prove that text is not
>>UTF-8 (assuming that the UTF-8 wasn't somehow damaged in transit), but it
>>does not reliably prove that text is UTF-8.
>
>
> No, it doesn't.

So we're all agreed; the test doesn't reliably prove that text is utf-8.
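
To make the asymmetry concrete, here is a minimal sketch in Python. It is not from the original thread: the function name could_be_utf8 and the sample strings are illustrative assumptions. It shows that a strict decode can rule UTF-8 out, but a successful decode only means the bytes are well-formed, not that the sender meant UTF-8.

```python
def could_be_utf8(raw: bytes) -> bool:
    """Return False if raw is definitely not UTF-8.

    True only means "not ruled out": the bytes are well-formed UTF-8,
    which text in another charset can be by coincidence.
    """
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical samples. A lone 0xE9 (ISO-8859-1 e-acute) followed by a
# space is malformed UTF-8, so this one is correctly rejected.
latin1_rejected = "café au lait".encode("iso-8859-1")

# 0xC3 0xA9 is ISO-8859-1 for the two characters "Ã©", yet it is also
# well-formed UTF-8 for "é". The test cannot tell which was intended.
ambiguous = "Ã©".encode("iso-8859-1")

print(could_be_utf8(latin1_rejected))  # False
print(could_be_utf8(ambiguous))        # True
```

Pure US-ASCII text also passes, since it is valid UTF-8 by construction, so a pass says nothing about which charset the sender actually used.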

<cooked numbers snipped>

> That's the situation as it stands today. But we hope to change A2 by
> encouraging people to switch over from the "local" charset to UTF8, *if*
> that happens then it *will* help because there will actually *be* articles
> identified as UTF8.

There are some now.

> Since the percentage of "incorrectly identified as
> UTF8" will NOT go up, but instead go down

No, about half of those identified as "utf-8" will in fact not be
utf-8, over a wide range.

> If what we do does NOT encourage people to switch (or let us say doesn't
> encourage even 1% of the people to switch), then we're no worse off than we
> are today -- we have a standard that says do X and it is ignored.

Not true; today we do *not* have a standard that says send raw utf-8 --
in fact the standard currently prohibits that.

> Basically, only today does it not make sense to do the test, any shift
> towards it actually being used results in it being a good test.

Not "any shift", only a complete 180 degree about-face (i.e. from very
little utf-8 to exclusively utf-8) -- and if that were to happen a test
would be unnecessary.

