From: Russ Allbery (rra@stanford.edu)
Date: Mon Jun 03 2002 - 20:03:17 CDT
Henry Spencer <henry@spsystems.net> writes:
> I believe the distinction being referred to is not 16 vs 32, but 20 vs
> 32. The (insert sound of retching here) surrogates stretch the Unicode
> code space to (roughly speaking) 20 bits, not 32.
> My understanding is that even the ISO side of the house has now come
> down quite firmly against ever populating any of the 10646 code space
> beyond there, but 10646 does not actually say that out loud.
Yes, per recent discussion on the Unicode list they appear to be trying to
maintain the viability of the UTF-16 encoding (which can reach only
U+10FFFF, a little over 20 bits) in perpetuity. That decision is predicated
on the belief that more space won't be needed to encode all of the world's
characters, though. If that turns out to be incorrect for some unexpected
reason, I get the impression that the decision could be reconsidered.
I don't think that's likely to happen in the next decade at the very
least, though.
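
For anyone wondering where the "about 20 bits" figure comes from, here is a
minimal sketch of the surrogate-pair arithmetic in Python. The function and
constant names are made up for illustration; only the surrogate ranges and
the 0x10000 offset come from the UTF-16 definition. Each surrogate carries
10 bits of payload, so a pair addresses 2^20 code points above the first
65536.

```python
# Surrogate ranges defined by UTF-16; names here are illustrative only.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # 1024 high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # 1024 low (trailing) surrogates
SUPPLEMENTARY_BASE = 0x10000            # first code point needing a pair


def decode_surrogate_pair(high, low):
    """Combine a high/low surrogate pair into a single code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return SUPPLEMENTARY_BASE + ((high - HIGH_START) << 10) + (low - LOW_START)


def encode_surrogate_pair(code_point):
    """Split a supplementary code point into its high/low surrogates."""
    if not (SUPPLEMENTARY_BASE <= code_point <= 0x10FFFF):
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - SUPPLEMENTARY_BASE      # 20-bit value
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


# The largest pair gives the ceiling of what UTF-16 can address.
MAX_UTF16_CODE_POINT = decode_surrogate_pair(HIGH_END, LOW_END)
```

The largest possible pair decodes to 0x10FFFF, which is why nothing beyond
that point can be represented in UTF-16 without changing the encoding.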
(Personally, I think that UTF-16 was a bad idea from the get-go, and I
think a lot of the Unicode people actually agree, but unfortunately it was
widely used by early adopters.)
-- Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>