From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Tue Feb 01 2000 - 07:53:29 CST
In <7XsQXM8Xw-B@khms.westfalen.de> kaih@khms.westfalen.de (Kai Henningsen) writes:
>No bytes 0xFF and 0xFE are ever legal in UTF-8.
>No bytes in the range 0xF5 to 0xFD are ever legal in the set reachable
>with Unicode, and the relevant committees have promised to never assign
>codes outside that area (0x00000000 - 0x0010FFFF).
Yes, my syntax was wider than that permitted by UTF-8. I have now changed
it to:
UTF8-xtra-head = %d192-253
UTF8-xtra-tail = %d128-191
UTF8-xtra-char = UTF8-xtra-head 1*UTF8-xtra-tail
and the NOTE now says:
NOTE: There are some sequences of octets which cannot legitimately occur
in UTF-8. These SHOULD NOT be generated by posting agents but, where they
occur inadvertently, they SHOULD be passed on untouched by other agents.
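
For illustration only (this is not part of the draft), here is a rough Python sketch of what the revised syntax accepts, compared against a strict UTF-8 decoder. The names matches_xtra_char and is_strict_utf8 are hypothetical, and Python's built-in "utf-8" codec is assumed as the reference for what counts as legitimate. It suggests why the NOTE is still needed: the grammar excludes 0xFE and 0xFF, but still admits some sequences that a strict decoder rejects.

```python
import re

# UTF8-xtra-head = %d192-253  (0xC0-0xFD)
# UTF8-xtra-tail = %d128-191  (0x80-0xBF)
# UTF8-xtra-char = UTF8-xtra-head 1*UTF8-xtra-tail
XTRA_CHAR = re.compile(rb"[\xC0-\xFD][\x80-\xBF]+")


def matches_xtra_char(octets: bytes) -> bool:
    """True if the octets form exactly one UTF8-xtra-char per the ABNF above."""
    return XTRA_CHAR.fullmatch(octets) is not None


def is_strict_utf8(octets: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the octets (an assumed reference)."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    samples = [
        b"\xc3\xa9",          # e-acute: in the grammar and legitimate UTF-8
        b"\xc0\x80",          # overlong form: in the grammar, not legitimate
        b"\xf5\x80\x80\x80",  # beyond 0x0010FFFF: in the grammar, not legitimate
        b"\xfe\x80",          # 0xFE head: excluded by the grammar
        b"\xc3",              # head with no tail: excluded by the grammar
    ]
    for s in samples:
        print(s.hex(), matches_xtra_char(s), is_strict_utf8(s))
```

An agent following the NOTE would pass on sequences such as the second and third samples unchanged rather than repairing or dropping them.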
-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Email: chl@clw.cs.man.ac.uk  Web: http://www.cs.man.ac.uk/~chl
Voice/Fax: +44 161 437 4506  Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9  Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5