From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Tue Dec 10 2002 - 08:37:20 CST
In <3DF4C4AE.4000903@certplus.com> Jean-Marc Desperrier <jean-marc.desperrier@certplus.com> writes:
>Jean-Marc Desperrier wrote:
>> No character will ever be mapped to the 5 and 6 byte form,
>> (they do not exist anymore in the UNICODE standard).
It is my belief that those characters which in UTF-16 require the use of
surrogates (private use, Egyptian hieroglyphics, whatever) occupy code
points using at least 5 bytes. If you want to express one of those in
UTF-8, you start with the 5 bytes, NOT with the UTF-16 form.
And if that is not so, why does the Yergeau draft (and the RFC which it
replaces) still include all those possibilities? It is not our business to
second-guess what other standards are doing. Let us just stick to what
they say exactly.
Agreed, it is unlikely that these long forms will ever occur on Usenet, but
if the mechanism to express them is there, we ought not to be the ones
preventing it.
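
For reference, the byte counts under discussion can be checked against the
length table in RFC 2279 (the RFC that the Yergeau draft replaces). The
following is only a minimal sketch of that table in Python; the function name
utf8_length_rfc2279 is hypothetical and chosen purely for illustration.

```python
def utf8_length_rfc2279(code_point: int) -> int:
    """Return how many octets the RFC 2279 table assigns to a UCS-4 value.

    Hypothetical helper for illustration only; it encodes nothing, it just
    looks up the row of the table that the value falls into.
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # (upper bound of the range, number of octets), as tabulated in RFC 2279
    table = [
        (0x0000007F, 1),
        (0x000007FF, 2),
        (0x0000FFFF, 3),
        (0x001FFFFF, 4),
        (0x03FFFFFF, 5),
        (0x7FFFFFFF, 6),
    ]
    for upper_bound, octets in table:
        if code_point <= upper_bound:
            return octets
    raise ValueError("value is beyond the 31-bit UCS-4 range")


# First and last code points that UTF-16 reaches through surrogate pairs
print(utf8_length_rfc2279(0x10000), utf8_length_rfc2279(0x10FFFF))
# First values that fall into the 5- and 6-octet rows
print(utf8_length_rfc2279(0x200000), utf8_length_rfc2279(0x4000000))
```

By that table, the code points UTF-16 reaches through surrogates (U+10000 to
U+10FFFF) fall in the 4-octet row; the 5- and 6-octet rows only begin above
0x1FFFFF.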
>And according to the following text, they do not exist anymore in
>ISO-10646 either :
>http://www.unicode.org/unicode/reports/tr28/#relation
OK, I shall have a look at that. But if it is really so, then surely the
Yergeau draft should change.
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7  65 E8 64 7E 14 A4 AB A5