Re: well-formedness error
Paul Hoffman / IMC wrote:
At 10:27 AM -0700 6/18/04, Tim Bray wrote:
You know, we could specify that Atom MUST always be encoded in UTF-8
and/or that the root element must be <Atøm>. Then, we'd have
belt-and-suspenders safety in the face of the most deranged encoding
breakage. No, that's probably not a serious suggestion. -Tim
Disclaimer: I'm an interop person, not a developer.
Given the number of edge cases this thread has brought out (many of
which have bitten developers over the years), I think that mandating
UTF-8 is a reasonable, serious suggestion. It would eliminate all the
edge conditions by saying "if you create something using other than
UTF-8, I assure you I will not be able to figure it out". That should
cause folks on the creation side to fall into place quickly.
How about modifying the proposal to say "either UTF-8 or UTF-16"?
Reasons:
1) AFAIK, for many languages, UTF-16 will be more efficient than UTF-8
(most CJK characters, for example, take three bytes in UTF-8 but two in
UTF-16).
2) XML parsers are REQUIRED to support just these two encodings -- so
anybody serving XML content in some other encoding is already risking
that the recipient will not be able to decode it.
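To illustrate reason 1, here is a rough Python sketch that compares encoded sizes for an ASCII string and a Japanese string. The helper name encoded_sizes and the sample strings are made up for this illustration; the counts exclude the byte order mark, which adds two bytes to a UTF-16 document.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of `text` for each encoding.

    "utf-16-le" is used so that no byte order mark is counted; a real
    UTF-16 XML document would carry a 2-byte BOM on top of this.
    """
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le")}


# Hypothetical sample strings, chosen only for illustration.
samples = {
    "ASCII": "well-formedness error",
    "Japanese": "整形式エラー",
}

for name, text in samples.items():
    print(name, encoded_sizes(text))
```

ASCII-heavy text doubles in size under UTF-16, while text made mostly of CJK characters shrinks by about a third. Since XML markup itself is ASCII, the actual gain for a feed depends on the ratio of markup to content.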
Best regards, Julian
--
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760