From: Clive D.W. Feather (clive@demon.net)
Date: Tue Jun 04 2002 - 13:18:31 CDT
Marc Mutz said:
> AFAIK, UTF-8 will not encode the surrogate pairs as two 16bit
> characters, but the _value_ of the surrogates, thus already leaving the
> 16bit target space.
Consider U+45678. This is encoded as:
UTF-32: 0x00045678
UTF-16: 0xD8D5 0xDE78 (0x45678 = 0x10000 + 0x0D5 * 1024 + 0x278, so the pair is 0xD800 + 0x0D5 and 0xDC00 + 0x278)
UTF-8: 0xF1 0x85 0x99 0xB8
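
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The helper names utf16_surrogates and utf8_four_bytes are my own, made up for illustration; they only handle code points from U+10000 to U+10FFFF. The point it demonstrates is that UTF-8 is computed directly from the code point value and never sees the surrogates.

```python
def utf16_surrogates(cp):
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = cp - 0x10000               # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def utf8_four_bytes(cp):
    """Encode a code point in U+10000..U+10FFFF as four UTF-8 bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("this sketch covers the four-byte form only")
    return bytes([
        0xF0 | (cp >> 18),              # 11110xxx: top 3 bits
        0x80 | ((cp >> 12) & 0x3F),     # 10xxxxxx: next 6 bits
        0x80 | ((cp >> 6) & 0x3F),      # 10xxxxxx: next 6 bits
        0x80 | (cp & 0x3F),             # 10xxxxxx: last 6 bits
    ])


cp = 0x45678
high, low = utf16_surrogates(cp)
print("UTF-32: 0x%08X" % cp)
print("UTF-16: 0x%04X 0x%04X" % (high, low))
print("UTF-8:  " + " ".join("0x%02X" % b for b in utf8_four_bytes(cp)))

# Cross-check against Python's built-in codecs.
assert chr(cp).encode("utf-8") == utf8_four_bytes(cp)
assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

The two assertions at the end compare the hand-rolled results with Python's own codecs, so a mistake in either helper would show up immediately.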
-- 
Clive D.W. Feather | Work: <clive@demon.net>    | Tel:    +44 20 8371 1138
Internet Expert    | Home: <clive@davros.org>   | Fax:    +44 870 051 9937
Demon Internet     | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc           |                            | NOTE: fax number change