Re: well-formedness error




Paul Hoffman / IMC wrote:



At 10:27 AM -0700 6/18/04, Tim Bray wrote:


You know, we could specify that Atom MUST always be encoded in UTF-8 and/or that the root element must be <Atøm>. Then, we'd have belt-and-suspenders safety in the face of the most deranged encoding breakage. No, that's probably not a serious suggestion. -Tim


Disclaimer: I'm an interop person, not a developer.

Given the number of edge cases this thread has brought out (many of which have bitten developers over the years), I think that mandating UTF-8 is a reasonable, serious suggestion. It would eliminate all the edge conditions by saying "if you create something using anything other than UTF-8, I assure you I will not be able to figure it out". That should cause folks on the creation side to fall into place quickly.

How about modifying the proposal to say "either UTF-8 or UTF-16"?


Reasons:

1) AFAIK, for many languages, UTF-16 will be more efficient than UTF-8 (for example, most CJK characters take three bytes in UTF-8 but only two in UTF-16).

2) XML parsers are REQUIRED to support just these two encodings -- so anybody serving XML content in some other encoding is already risking that the recipient will not be able to decode it.
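
To make both points concrete, here is a minimal Python sketch. It is an illustration only, not anything proposed for the spec: the names `encoded_sizes`, `is_allowed_encoding` and `ALLOWED` are hypothetical, and the allow-list simply mirrors the "either UTF-8 or UTF-16" wording above.

```python
import codecs

# Hypothetical policy from this thread: accept only UTF-8 or UTF-16.
ALLOWED = {"utf-8", "utf-16"}


def encoded_sizes(text: str) -> dict[str, int]:
    """Return the byte length of text in UTF-8 and UTF-16 (BOM not counted)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte BOM that plain "utf-16" would prepend.
        "utf-16": len(text.encode("utf-16-le")),
    }


def is_allowed_encoding(label: str) -> bool:
    """Check a declared encoding label against the hypothetical allow-list."""
    try:
        name = codecs.lookup(label).name  # normalizes aliases such as "UTF8"
    except LookupError:
        return False
    return name in ALLOWED


if __name__ == "__main__":
    for sample in ("hello", "Atøm", "日本語"):
        print(sample, encoded_sizes(sample))
    for label in ("UTF-8", "utf-16", "ISO-8859-1"):
        print(label, is_allowed_encoding(label))
```

For ASCII-heavy text UTF-8 comes out smaller, while for the Japanese sample UTF-16 does, which is the trade-off behind reason 1. The label check only looks at the declared name; it does not verify that the bytes actually decode in that encoding.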

Best regards, Julian


-- <green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760