Re: Non-UTF8 8-bit chars


From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Tue Feb 01 2000 - 07:53:29 CST


In <7XsQXM8Xw-B@khms.westfalen.de> kaih@khms.westfalen.de (Kai Henningsen) writes:

>No bytes 0xFF and 0xFE are ever legal in UTF-8.
>No bytes in the range 0xF5 to 0xFD are ever legal in the set reachable
>with Unicode, and the relevant committees have promised to never assign
>codes outside that area (0x00000000 - 0x0010FFFF).

Yes, my syntax was wider than that permitted by UTF-8. I have now changed
it to:

UTF8-xtra-head = %d192-253
UTF8-xtra-tail = %d128-191
UTF8-xtra-char = UTF8-xtra-head 1*UTF8-xtra-tail
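
For anyone wanting to experiment with these three rules, here is a minimal
sketch of them as a Python byte-level regular expression. The Python names
(UTF8_XTRA_HEAD, is_utf8_xtra_char, and so on) are made up for illustration
and appear nowhere in the draft. The sketch checks only the syntax above; it
does not check that the number of tail octets matches the head octet, so it
still accepts some sequences that are not legitimate UTF-8.

```python
import re

# Hypothetical names mirroring the three ABNF rules; not part of the draft.
UTF8_XTRA_HEAD = rb"[\xC0-\xFD]"   # UTF8-xtra-head = %d192-253
UTF8_XTRA_TAIL = rb"[\x80-\xBF]"   # UTF8-xtra-tail = %d128-191

# UTF8-xtra-char = UTF8-xtra-head 1*UTF8-xtra-tail
UTF8_XTRA_CHAR = re.compile(UTF8_XTRA_HEAD + UTF8_XTRA_TAIL + rb"+")


def is_utf8_xtra_char(octets: bytes) -> bool:
    """Return True if the whole byte string is exactly one UTF8-xtra-char."""
    return UTF8_XTRA_CHAR.fullmatch(octets) is not None
```

With this, is_utf8_xtra_char(b"\xc3\xa9") is accepted, while anything
starting with 0xFE or 0xFF, or a head octet with no tail, is rejected.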

and the NOTE now says

NOTE: There are some sequences of octets which cannot legitimately occur
in UTF-8. These SHOULD NOT be generated by posting agents but, where they
occur inadvertently, they SHOULD be passed on untouched by other agents.
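
One possible reading of that NOTE, sketched in Python under stated
assumptions: a posting agent refuses to generate octets that a strict UTF-8
decoder rejects, while any other agent forwards whatever it receives
unchanged. The function names are hypothetical, and Python's built-in
"utf-8" codec stands in for "legitimate UTF-8"; the draft itself does not
prescribe any particular check.

```python
def is_strict_utf8(octets: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the octets."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def posting_agent_may_generate(octets: bytes) -> bool:
    # SHOULD NOT generate sequences that cannot legitimately occur in UTF-8.
    return is_strict_utf8(octets)


def relaying_agent_forward(octets: bytes) -> bytes:
    # SHOULD pass the octets on untouched, even if they are not valid UTF-8.
    return octets
```

The asymmetry is deliberate: the check is applied only where the octets
originate, so a bad sequence is not damaged further on its way through.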

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Email:     chl@clw.cs.man.ac.uk  Web:   http://www.cs.man.ac.uk/~chl
Voice/Fax: +44 161 437 4506      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9     Fingerprint: 73 6D C2 51 93 A0 01 E7  65 E8 64 7E 14 A4 AB A5



