Re: Unicode and draft 07

From: Clive D.W. Feather (clive@demon.net)
Date: Tue Jun 04 2002 - 13:18:31 CDT


Marc Mutz said:
> AFAIK, UTF-8 will not encode the surrogate pairs as two 16bit
> characters, but the _value_ of the surrogates, thus already leaving the
> 16bit target space.

Consider U+45678. This is encoded as:

UTF-32: 0x00045678
UTF-16: 0xD8D5 0xDE78 (0x45678 = 0x10000 + 0x0D5 * 1024 + 0x278)
UTF-8: 0xF1 0x85 0x99 0xB8
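
The arithmetic above can be checked with a short Python sketch. The helper names utf16_units and utf8_bytes are made up for illustration and belong to no library, and the UTF-8 helper assumes a code point in the range U+10000..U+10FFFF. The surrogates are derived from the value minus 0x10000, whereas the UTF-8 bytes are derived from the code point value itself.

```python
def utf16_units(cp):
    """Return the UTF-16 code units for a code point (hypothetical helper)."""
    if cp < 0x10000:
        return [cp]                      # fits in a single 16-bit unit
    offset = cp - 0x10000                # 20-bit value: 0x35678 for U+45678
    high = 0xD800 + (offset >> 10)       # top 10 bits    -> 0xD800 + 0x0D5
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> 0xDC00 + 0x278
    return [high, low]


def utf8_bytes(cp):
    """Return the four UTF-8 bytes for a code point in U+10000..U+10FFFF."""
    assert 0x10000 <= cp <= 0x10FFFF     # assumption: four-byte form only
    return [
        0xF0 | (cp >> 18),               # 11110xxx: top 3 bits
        0x80 | ((cp >> 12) & 0x3F),      # 10xxxxxx: next 6 bits
        0x80 | ((cp >> 6) & 0x3F),       # 10xxxxxx: next 6 bits
        0x80 | (cp & 0x3F),              # 10xxxxxx: last 6 bits
    ]


cp = 0x45678
print("UTF-32:", f"0x{cp:08X}")
print("UTF-16:", " ".join(f"0x{u:04X}" for u in utf16_units(cp)))
print("UTF-8: ", " ".join(f"0x{b:02X}" for b in utf8_bytes(cp)))

# Cross-check against Python's built-in encoders.
assert bytes(utf8_bytes(cp)) == chr(cp).encode("utf-8")
assert chr(cp).encode("utf-16-be") == b"\xD8\xD5\xDE\x78"
```

The surrogate values 0xD8D5 and 0xDE78 never appear as input to the UTF-8 step; the bytes come straight from the bits of 0x45678.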

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:  +44 20 8371 1138
Internet Expert     | Home:  <clive@davros.org>  | Fax:  +44 870 051 9937
Demon Internet      | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc            |                            | NOTE: fax number change


