
Re: UTF-16, the BOM, and media types



At 04:34 PM 3/22/00 -0500, John Cowan wrote:
>> Section 4.3.3 of XML 1.0 says
>>  "Entities encoded in UTF-16 must begin with the Byte Order Mark described
>>   by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK
>>   SPACE character, #xFEFF)."
>
>That describes entities encoded in the charset called "UTF-16".  It says
>nothing about entities encoded in the charsets "UTF-16BE" and "UTF-16LE"
>or for that matter charset "x-focs".

Yep, if you hold your head at just the right angle, and don't think of
the word "rhinoceros", you can convince yourself that the 16[BL]E
encodings are really different things entirely, just happen to share
a few characters with That Other Encoding's name, just close personal
friends, etc...

But I just don't get it.  This feels perverse.  I repeat, anyone using
any flavor of UTF-16 can (and usually does) put in a BOM, if only on a
belt-and-suspenders basis; and for this reason the 16[BL]E
media types, which forbid this practice, are de facto simply not usable
for XML.
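
To make the clash concrete, here is a rough Python sketch of the check a
receiver might run. The function names (leading_bom, label_conflict) are
made up for illustration and come from no spec or library; the rules encoded
are only the two discussed in this thread: the "UTF-16" label requires a
BOM, and the 16[BL]E labels forbid one.

```python
import codecs

def leading_bom(data: bytes):
    """Return 'BE', 'LE', or None depending on which UTF-16 BOM starts the data."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "BE"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "LE"
    return None

def label_conflict(data: bytes, charset: str) -> bool:
    """True when BOM presence contradicts the rule attached to the label.

    Assumed rules, as described in this thread: "UTF-16" must start
    with a BOM; "UTF-16BE" and "UTF-16LE" must not. Other labels are
    not checked.
    """
    bom = leading_bom(data)
    label = charset.upper()
    if label == "UTF-16":
        return bom is None
    if label in ("UTF-16BE", "UTF-16LE"):
        return bom is not None
    return False

# A producer that writes a BOM "just in case" ahead of big-endian data:
doc = codecs.BOM_UTF16_BE + "<a/>".encode("utf-16-be")
print(label_conflict(doc, "UTF-16"))    # label and bytes agree
print(label_conflict(doc, "UTF-16BE"))  # same bytes, label forbids the BOM
```

The same byte stream is fine under one label and in violation under the
other, which is the practical problem with belt-and-suspenders BOMs.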

I'm not denying that these things exist.  Just asserting that the RFC is
correct in saying they don't work with XML. -Tim