Re: XML Guidelines -04
Hello Makoto,
At 22:14 02/06/05 +0900, MURATA Makoto wrote:
> On Wed, 05 Jun 2002 11:18:31 +0900
> Martin Duerst <duerst@xxxxxx> wrote:
> > - The IETF has a clear preference for UTF-8 over UTF-16. UTF-8 is
> > core to RFC 2277, and is a draft standard (and on its way to
> > an IETF standard). UTF-16 is only an informational RFC.
> But the RFC for UTF-8 (RFC 2279) is obsolete
Sorry, but RFC 2279 is in no way obsolete. It is an IETF draft
standard. It is no longer referenced from the XML spec, but
that doesn't change its status.
> and is in conflict with
> Unicode 3.2 (as shown by the errata E27 for the second edition of XML).
That erratum refers to Unicode 3.1, with some additions that
make it in effect equivalent to referring to Unicode 3.2.
Whether RFC 2279 is in conflict with Unicode 3.2 is an
interesting question. RFC 2279 very clearly points out
the security problems with 'variants' of UTF-8 that were
finally outlawed in Unicode 3.2, so it is easily possible
to claim that there is no conflict at all, that RFC 2279
had it right from the beginning. But things could be
expressed more clearly, and that's why Francois is
working on an update of RFC 2279.
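To make the 'variants' problem concrete, here is a minimal sketch in Python (the language and the helper name is_strict_utf8 are chosen for illustration only; neither comes from RFC 2279). It assumes a decoder that accepts only shortest-form sequences, and shows such a decoder rejecting the two-byte sequence C0 AF, which a lax decoder could read as '/' (U+002F) and so let a path separator slip past a byte-level check.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data passes Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")  # error handling defaults to "strict"
    except UnicodeDecodeError:
        return False
    return True


shortest = b"/"          # U+002F in its one-byte, shortest form
overlong = b"\xc0\xaf"   # non-shortest two-byte 'variant' of the same code point

print(is_strict_utf8(shortest))
print(is_strict_utf8(overlong))
```

A filter that scans for the byte 2F before decoding would miss the second form entirely, which is the kind of security problem RFC 2279 warns about.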
> Moreover, is the Unicode signature for UTF-8 allowed or disallowed? I
> do not know if any consensus has been reached yet (enlighten me, if I'm
> wrong)
This is a very good question. For XML, this question has been
answered positively, at http://www.w3.org/XML/xml-V10-2e-errata#E22,
although some older parsers may not grok it.
For both Unicode and ISO 10646, the encoding signature is also
explicitly allowed for UTF-8. For the IETF, RFC 2279 currently doesn't
say anything (simply because at the time it was written, nobody
thought about using an encoding signature for UTF-8). I hope that
the update will say something like: tolerated, but not recommended
at all.
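As a rough illustration of what 'tolerated' could mean for a receiver, here is a minimal Python sketch (the helper name strip_utf8_signature and the sample document are assumptions made up for this example, not taken from any spec). It drops a leading EF BB BF if present and otherwise leaves the bytes untouched, so input with and without the signature decodes to the same text.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Hypothetical helper: remove a leading UTF-8 signature (EF BB BF), if any."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample input invented for illustration: an XML document with a signature.
doc = codecs.BOM_UTF8 + b"<?xml version='1.0' encoding='UTF-8'?><a/>"

text = strip_utf8_signature(doc).decode("utf-8")
print(text.startswith("<?xml"))

# Python's "utf-8-sig" codec does the same stripping while decoding.
print(doc.decode("utf-8-sig") == text)
```

A sender following 'not recommended' would simply never emit the three signature bytes in the first place.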
> I still do not believe that "UTF-8 only" provides a reliable basis.
If you think using UTF-8 is not reliably interoperable, then
asking for 'either UTF-8 or UTF-16' won't increase interoperability.
Regards, Martin.