Re: UTF-8 syntax


From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Tue Dec 10 2002 - 08:37:20 CST


In <3DF4C4AE.4000903@certplus.com> Jean-Marc Desperrier <jean-marc.desperrier@certplus.com> writes:

>Jean-Marc Desperrier said:

>> No character will ever be mapped to the 5 and 6 byte form,

>> (they do not exist anymore in the UNICODE standard).

It is my belief that those characters which in UTF-16 require the use of
surrogates (private use, Egyptian hieroglyphics, whatever) occupy code
points using at least 5 bytes. If you want to express one of those in
UTF-8, you start with the 5 bytes, NOT with the UTF-16 form.

And if that is not so, why does the Yergeau draft (and the RFC which it
replaces) still include all those possibilities? It is not our business to
second-guess what other standards are doing. Let us just stick to what
they say exactly.
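
For reference, the octet counts in question can be sketched from the range
table in RFC 2279 (the RFC which the Yergeau draft replaces). This is only an
illustration under that assumption: the function name utf8_len_rfc2279 is
hypothetical, and the limits are those of RFC 2279, not necessarily of any
later revision.

```python
def utf8_len_rfc2279(cp: int) -> int:
    """Return how many octets RFC 2279 uses to encode the UCS-4 value cp."""
    # (inclusive upper bound of the range, octets used), per the RFC 2279 table
    limits = [
        (0x7F, 1),
        (0x7FF, 2),
        (0xFFFF, 3),
        (0x1FFFFF, 4),
        (0x3FFFFFF, 5),
        (0x7FFFFFFF, 6),
    ]
    if cp < 0:
        raise ValueError("code point must be non-negative")
    for upper, octets in limits:
        if cp <= upper:
            return octets
    raise ValueError("value is beyond the 31-bit range RFC 2279 covers")
```

Under this table, U+10000 through U+10FFFF (the values UTF-16 reaches via
surrogate pairs) fall in the 4-octet row, and the 5- and 6-octet rows cover
only values above U+1FFFFF.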

Agreed, it is unlikely that these long forms will ever occur on Usenet, but
if the mechanism to express them is there, we ought not to be the ones
preventing it.

>And according to the following text, they do not exist anymore in
>ISO-10646 either :

>http://www.unicode.org/unicode/reports/tr28/#relation

OK, I shall have a look at that. But if it is really so, then surely the
Yergeau draft should change.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5



