From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Mon Jun 03 2002 - 11:30:59 CDT
In <20020531104641.GB16354@finch-staff-1.demon.net> "Clive D.W. Feather" <clive@demon.net> writes:
>Charles Lindsey said:
>>>> NOTE: UTF-8 is an encoding of the 16bit UCS-2 (and even the 32bit
>>>> UCS-4) character sets ...
>>> I really think that's too much detail for our document. We don't care about
>>> UCS-2 or UCS-4 (which aren't Unicode terms anyway).
>> According to RFC2279bis, the terms UCS-2 and UCS-4 are certainly defined
>> in ISO 10646, if not in Unicode.
>Yes, but we don't really talk about Unicode v ISO (and nor should we).
>> Anyway, I now have:
>>
>> NOTE: UTF-8 is an encoding for the [ISO/IEC 10646] character set
>> (in both its 16 and 32 bit forms) with the property that any
>> octet less than 128 immediately represents the corresponding
>> US-ASCII character, thus ensuring upwards compatibility with
>> previous practice. ...
>I can live with that, I suppose.
>Um, ISO or Unicode in the brackets ?
If you read the paragraph immediately preceding the one quoted, you will
see that it sets out the relationship between ISO 10646 and Unicode, so it
does not really matter which one is mentioned here. I chose to mention ISO
10646 because, AIUI, that still pays lip service to the possibility of 32
bit codes, whereas the Unicode people seem to have moved right away from them.
Practically speaking, of course, they are dead. But UTF-8 still nominally
covers them (and that includes the latest UTF-8 draft).
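
To make that concrete, here is a rough Python sketch of the encoding as I
understand it from the 1-to-6 octet table in RFC 2279. The function name
utf8_encode_rfc2279 is my own invention for illustration, and the sketch
makes no attempt to reject surrogates or non-shortest forms. It shows both
properties at issue: octets below 128 stand for themselves, and the scheme
nominally reaches all the way up to 31-bit code values.

```python
def utf8_encode_rfc2279(cp: int) -> bytes:
    """Encode one code value with the 1-6 octet layout of RFC 2279.

    Hypothetical helper for illustration only: no surrogate checks.
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("code value outside the 31-bit range")
    if cp < 0x80:
        # US-ASCII range: the octet is the character itself.
        return bytes([cp])
    # (exclusive upper limit, lead-octet marker, continuation octets)
    layouts = (
        (0x800, 0xC0, 1),
        (0x10000, 0xE0, 2),
        (0x200000, 0xF0, 3),
        (0x4000000, 0xF8, 4),
        (0x80000000, 0xFC, 5),
    )
    for limit, lead, ncont in layouts:
        if cp < limit:
            # Lead octet carries the marker plus the topmost bits.
            out = [lead | (cp >> (6 * ncont))]
            # Each continuation octet is 10xxxxxx with the next 6 bits.
            for i in range(ncont - 1, -1, -1):
                out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
            return bytes(out)
    raise AssertionError("unreachable")


print(utf8_encode_rfc2279(0x41).hex())        # 41 (plain ASCII 'A')
print(utf8_encode_rfc2279(0x20AC).hex())      # e282ac (EURO SIGN)
print(utf8_encode_rfc2279(0x7FFFFFFF).hex())  # fdbfbfbfbfbf (6 octets)
```

The last case is the sort of 32 bit code that nobody is likely to meet in
practice, but which the 6-octet form still has room for.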
-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131     Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5