Re: Transformation of Non-ASCII headers


From: J.B. Moreno (planb@newsreaders.com)
Date: Sun Feb 16 2003 - 15:22:08 CST


On 2/15/03 12:35 AM, Forrest J. Cavalier III at <mibsoft@epix.net> wrote:

> "J.B. Moreno" <planb@newsreaders.com> wrote:

>> If you think that's changed since then, you could ask him to rerun his test,
>> but I, for one, am satisfied that UTF8 can be reliably inferred.

> USEFOR already had this discussion. The problem with quoting
> those numbers is that they are not random failure rates.

Of course not -- how could they be? For instance, no US-ASCII Subjects were
falsely detected as containing 8-bit UTF-8 characters.
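Here is a minimal sketch of the kind of inference being discussed. It assumes the raw header bytes are available undecoded. The function name `classify_header` and its three labels are hypothetical and are not taken from the test cited above. The sketch shows why a pure US-ASCII header can never be reported as 8-bit UTF-8: the check only applies when at least one byte is 0x80 or above.

```python
def classify_header(raw: bytes) -> str:
    """Classify a raw header value (hypothetical helper, for illustration only)."""
    # No byte >= 0x80: plain US-ASCII, so there is nothing to infer.
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        # Strict decoding rejects any byte sequence that is not well-formed UTF-8.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # 8-bit data that is not valid UTF-8: some other, untagged charset.
        return "unknown-8bit"
    return "utf-8"

print(classify_header(b"Re: Transformation of Non-ASCII headers"))  # ascii
print(classify_header("Re: café".encode("utf-8")))                  # utf-8
print(classify_header("Re: café".encode("latin-1")))                # unknown-8bit
```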

> For certain messages between senders and recipients, the failure rate
> would be 100%.

Sure: send a message that is a false positive, and it will be a false
positive 100% of the time.
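Such failures are deterministic because whether a byte string is valid UTF-8 is a fixed property of its bytes. The following example uses a made-up subject, not one from the cited test. An ISO-8859-1 subject containing the two characters "Ã©" encodes to 0xC3 0xA9. That pair is also a well-formed UTF-8 sequence for "é", so a strict UTF-8 check will misread this particular subject every time.

```python
# Hypothetical ISO-8859-1 (Latin-1) subject, made up for illustration.
subject = "Re: Ã©"
raw = subject.encode("latin-1")        # "Ã" -> 0xC3, "©" -> 0xA9

# 0xC3 0xA9 is also a valid two-byte UTF-8 sequence (U+00E9, "é"),
# so strict UTF-8 decoding succeeds and silently changes the text.
misread = raw.decode("utf-8", errors="strict")
print(misread)                         # Re: é
```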

But even on a per-hierarchy basis the failure rate was low:

pl  3/6682  = .044%  (he wasn't sure these were actually false positives)
wi  5/3441  = .145%  (all replies to the same message, apparently spam)
tw  2/58844 = .003%
cn  5/5661  = .088%

It never even reached .2 percent, and that's without discounting the worst
case, which was apparently malformed spam or a mis-posted binary.
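The per-hierarchy rates can be rechecked from the raw counts quoted above. The denominators are taken as the number of items checked in each hierarchy, which is an assumption because the post does not say exactly what was counted. Rounding 3/6682 gives 0.045% rather than the truncated .044%, and neither figure changes the conclusion.

```python
# (false positives, items checked) per hierarchy, as quoted in the post.
counts = {
    "pl": (3, 6682),
    "wi": (5, 3441),
    "tw": (2, 58844),
    "cn": (5, 5661),
}

# Failure rate in percent for each hierarchy.
rates = {h: 100 * fp / total for h, (fp, total) in counts.items()}
for h, pct in rates.items():
    print(f"{h}: {pct:.3f}%")

# Highest per-hierarchy rate, checked against the .2 percent figure.
worst = max(rates, key=rates.get)
print(worst, rates[worst] < 0.2)       # wi True
```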

(BTW -- I checked out tw, and Big5 seems to be the charset. Unless the
groups I looked at were unrepresentative, tagging the charset in either
the headers or the body is effectively optional; i.e. the majority of
posts don't do it.)
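Because most tw posts carry no charset label, a reader has to guess the charset. The sketch below tries strict UTF-8 first and falls back to a reader-chosen default only when that fails. It assumes a per-hierarchy default charset. The function `decode_header`, its `fallback` parameter, and the choice of Big5 for tw are illustrative assumptions and are not part of any standard. As one concrete case, "中文" in Big5 begins with byte 0xA4. That is a UTF-8 continuation byte, so it cannot start a valid UTF-8 sequence, and the text falls through to the Big5 default.

```python
from typing import Tuple

def decode_header(raw: bytes, fallback: str = "big5") -> Tuple[str, str]:
    """Decode an untagged header: strict UTF-8 first, then a fallback charset.

    Hypothetical helper; the default `fallback` stands in for whatever
    charset a newsreader might assume for a hierarchy such as tw.*.
    """
    try:
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the reader's configured default charset.
        return raw.decode(fallback, errors="replace"), fallback

big5_subject = "中文".encode("big5")          # first byte 0xA4 is a UTF-8 continuation byte
print(decode_header(big5_subject))            # ('中文', 'big5')
print(decode_header("中文".encode("utf-8")))  # ('中文', 'utf-8')
```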

> The standards must provide a way for those recipients
> to figure out which piece is not standards compliant, and fix it.

Bull. It is users and implementors that detect and fix failures, not
standards -- standards simply define what /is/ a failure.

-- 
J.B. Moreno




This archive was generated by hypermail 2b29.