From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Fri Dec 06 2002 - 10:45:56 CST
The latest UTF-8 draft, draft-yergeau-rfc2279bis-02.txt, includes a syntax
for UTF-8. I have changed our syntax in 2.4.2 to use the same rules.
This makes no technical difference to our draft.
Original syntax:
UTF8-xtra-2-head = %xC2-DF
UTF8-xtra-3-head = %xE0 %xA0-BF / %xE1-EC %x80-BF /
                   %xED %x80-9F / %xEE-EF %x80-BF
UTF8-xtra-4-head = %xF0 %x90-BF / %xF1-F7 %x80-BF
UTF8-xtra-5-head = %xF8 %x88-BF / %xF9-FB %x80-BF
UTF8-xtra-6-head = %xFC %x84-BF / %xFD %x80-BF
UTF8-xtra-tail   = %x80-BF
UTF8-xtra-char   = UTF8-xtra-2-head 1( UTF8-xtra-tail ) /
                   UTF8-xtra-3-head 1( UTF8-xtra-tail ) /
                   UTF8-xtra-4-head 2( UTF8-xtra-tail ) /
                   UTF8-xtra-5-head 3( UTF8-xtra-tail ) /
                   UTF8-xtra-6-head 4( UTF8-xtra-tail )
Revised syntax:
UTF8-2         = %xC2-DF UTF8-tail
UTF8-3         = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2(UTF8-tail) /
                 %xED %x80-9F UTF8-tail / %xEE-EF 2(UTF8-tail)
UTF8-4         = %xF0 %x90-BF 2(UTF8-tail) / %xF1-F7 3(UTF8-tail)
UTF8-5         = %xF8 %x88-BF 3(UTF8-tail) / %xF9-FB 4(UTF8-tail)
UTF8-6         = %xFC %x84-BF 4(UTF8-tail) / %xFD 5(UTF8-tail)
UTF8-tail      = %x80-BF
UTF8-xtra-char = UTF8-2 / UTF8-3 / UTF8-4 / UTF8-5 / UTF8-6
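For anyone who wants to convince themselves that the two grammars accept the same octet sequences, here is a rough Python sketch that transcribes both into byte regular expressions and compares them. It is only an illustration and is not part of either draft: the names ORIGINAL, REVISED, alt, tails and grammars_agree are invented for this example, and the comparison is a spot check that varies the first two octets and fills the rest with %x80, not a proof.

```python
import re

TAIL = rb"[\x80-\xBF]"  # UTF8-tail / UTF8-xtra-tail


def alt(*branches):
    """Join alternatives into one non-capturing group (ABNF '/')."""
    return rb"(?:" + rb"|".join(branches) + rb")"


def tails(n):
    """Exactly n tail octets, i.e. ABNF n(UTF8-tail)."""
    return TAIL + b"{%d}" % n


# Original syntax: a "head" rule followed by a fixed number of tails.
head2 = rb"[\xC2-\xDF]"
head3 = alt(rb"\xE0[\xA0-\xBF]", rb"[\xE1-\xEC][\x80-\xBF]",
            rb"\xED[\x80-\x9F]", rb"[\xEE-\xEF][\x80-\xBF]")
head4 = alt(rb"\xF0[\x90-\xBF]", rb"[\xF1-\xF7][\x80-\xBF]")
head5 = alt(rb"\xF8[\x88-\xBF]", rb"[\xF9-\xFB][\x80-\xBF]")
head6 = alt(rb"\xFC[\x84-\xBF]", rb"\xFD[\x80-\xBF]")
ORIGINAL = re.compile(alt(head2 + tails(1), head3 + tails(1),
                          head4 + tails(2), head5 + tails(3),
                          head6 + tails(4)))

# Revised syntax: one rule per encoded length.
utf8_2 = rb"[\xC2-\xDF]" + tails(1)
utf8_3 = alt(rb"\xE0[\xA0-\xBF]" + tails(1), rb"[\xE1-\xEC]" + tails(2),
             rb"\xED[\x80-\x9F]" + tails(1), rb"[\xEE-\xEF]" + tails(2))
utf8_4 = alt(rb"\xF0[\x90-\xBF]" + tails(2), rb"[\xF1-\xF7]" + tails(3))
utf8_5 = alt(rb"\xF8[\x88-\xBF]" + tails(3), rb"[\xF9-\xFB]" + tails(4))
utf8_6 = alt(rb"\xFC[\x84-\xBF]" + tails(4), rb"\xFD" + tails(5))
REVISED = re.compile(alt(utf8_2, utf8_3, utf8_4, utf8_5, utf8_6))


def grammars_agree():
    """Spot check: try every pair of leading octets at lengths 2 to 6,
    padding with 0x80, and compare the verdicts of both grammars."""
    for lead in range(0x100):
        for second in range(0x100):
            for length in range(2, 7):
                seq = bytes([lead, second]) + b"\x80" * (length - 2)
                old = ORIGINAL.fullmatch(seq) is not None
                new = REVISED.fullmatch(seq) is not None
                if old != new:
                    return False
    return True


if __name__ == "__main__":
    print("grammars agree on sampled sequences:", grammars_agree())
```

Since the two grammars differ only in how the second octet is grouped (inside the head rule or as the first tail), varying the first two octets exercises every place where they could plausibly diverge.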
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5