Re: Last Last Last Call

From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Fri Jul 26 2002 - 05:40:24 CDT


In <3D3FE304.1090108@certplus.com> Jean-Marc Desperrier <jean-marc.desperrier@certplus.com> writes:

>Charles Lindsey wrote:

>>+ attempt to interpret the header according to whatever other character
>>+ set can be deduced, or has been configued as a default by the reader.
>>
>configured.

>>! NOTE: It is possible to determine, with a high degree of
>>! accuracy, when a given text containing octets with the 8th bit
>>! set was not encoded using UTF-8, and using this test to recover
>>! such non-compliant texts is therefore commended where no other
>>! harm could arise.
>>
>Detection that the text was not encoded as UTF-8 has 100% accuracy.

No, I think you have got it the wrong way around. If a text is correctly
encoded as UTF-8, then it will be compliant with the UTF-8 spec, and hence
the test will report it as valid UTF-8 100% of the time.

But if a text is encoded in something else (big5, for example) then there
is still a small probability that the test will report it as valid UTF-8,
but a much larger probability that it will be reported as not UTF-8.

Which is exactly what my NOTE says.
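
To make the asymmetry concrete, here is a minimal sketch in Python of the
kind of test the NOTE has in mind. The function names and the choice of
big5 as the fallback are my own assumptions for illustration; they are not
part of the draft text.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def decode_header(data: bytes, fallback: str = "big5") -> str:
    """Decode as UTF-8 when the test passes, else use the reader's default.

    `fallback` stands in for "whatever other character set can be deduced,
    or has been configured as a default by the reader" (an assumption here).
    """
    if looks_like_utf8(data):
        return data.decode("utf-8")
    return data.decode(fallback, errors="replace")


# Genuine UTF-8 always passes the test.
assert looks_like_utf8("héllo".encode("utf-8"))

# Typical big5 text fails it, so the fallback recovers the text.
assert not looks_like_utf8("中文".encode("big5"))
assert decode_header("中文".encode("big5")) == "中文"

# The rare false positive: these two octets were meant as the Latin-1
# characters "Ã©", yet they also happen to be well-formed UTF-8.
assert looks_like_utf8(b"\xc3\xa9")
```

The last assertion is the "small probability" case: a text in some other
encoding that just happens to form valid UTF-8 will be misread, which is
why the NOTE claims only a high degree of accuracy and not certainty.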

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

