From: J.B. Moreno (planb@newsreaders.com)
Date: Sun Feb 16 2003 - 15:22:08 CST
On 2/15/03 12:35 AM, Forrest J. Cavalier III at <mibsoft@epix.net> wrote:
> "J.B. Moreno" <planb@newsreaders.com> wrote:
>> If you think that's changed since then, you could ask him to rerun his test,
>> but I, for one, am satisfied that UTF8 can be reliably inferred.
> USEFOR already had this discussion. The problem with quoting
> those numbers is that they are not random failure rates.
Of course not -- how could they be? For instance, no US-ASCII Subjects were
falsely detected to contain 8 bit UTF-8 chars.
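
A minimal sketch of why that has to be so. It assumes the detector flags a
Subject only when it holds at least one byte above 0x7F and the whole string
is well-formed UTF-8; the method actually used in the test isn't described
here, and the function name is made up for illustration.

```python
def looks_like_8bit_utf8(raw: bytes) -> bool:
    """Return True if raw contains 8-bit bytes and is well-formed UTF-8.

    Hypothetical helper: an assumed stand-in for the detector, not the
    code that produced the numbers in this thread.
    """
    # Pure US-ASCII has no byte >= 0x80, so it can never be flagged.
    if all(b < 0x80 for b in raw):
        return False
    try:
        # Strict decoding rejects stray continuation bytes, truncated
        # sequences, and overlong forms.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ascii_subject = b"Re: charset tagging"
utf8_subject = "Za\u017c\u00f3\u0142\u0107".encode("utf-8")
latin2_subject = "Za\u017c\u00f3\u0142\u0107".encode("iso-8859-2")
```

A false positive under this sketch is a non-UTF-8 string whose 8-bit bytes
happen to form valid sequences, e.g. the two ISO-8859-1 bytes 0xC3 0xA9.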
> For certain messages between senders and recipients, the failure rate
> would be 100%.
Sure, send a message that is a false positive, and it'll be 100% likely to
be a false positive.
But even on a per-hierarchy basis the failure rate was low:
pl  3/6682  = .044% (he wasn't sure these were actually false)
wi  5/3441  = .145% (all replies to the same message, apparently spam)
tw  2/58844 = .003%
cn  5/5661  = .088%
The rate never even topped .2 percent (and that's without discounting the
worst case, which was apparently a malformed spam or misposted binary).
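
For anyone who wants to recheck the arithmetic, a short sketch that recomputes
the percentages from the counts quoted above. The variable names are mine; the
counts are the only data taken from the test.

```python
# (false detections, subjects examined) per hierarchy, as quoted above
counts = {
    "pl": (3, 6682),
    "wi": (5, 3441),
    "tw": (2, 58844),
    "cn": (5, 5661),
}

# Failure rate as a percentage of the subjects examined
rates = {name: 100.0 * bad / total for name, (bad, total) in counts.items()}

worst = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}%")
print(f"worst hierarchy: {worst}")
```

The formatted output rounds rather than truncates, so the last digit can
differ by one from the figures listed above.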
(BTW -- I checked out tw, and Big5 seems to be the charset. Unless the
groups I looked at were non-representative, tagging the charset in either
the headers or the body is treated as strictly optional; i.e., the majority
of posts don't do it.)
> The standards must provide a way for those recipients
> to figure out which piece is not standards compliant, and fix it.
Bull. It is users and implementors that detect and fix failures, not
standards -- standards simply define what /is/ a failure.
-- J.B. Moreno