Re: charset use of UTF-16 vs. utf-8
On Wednesday, June 5, 2002, 6:52:35 PM, Larry wrote:
LM> (Please set the "subject" when you raise new topics)
LM> Folks, can we get a little perspective in this discussion?
LM> Some applications of XML as a data representation mechanism
LM> within protocols are for data streams that have little or
LM> no text content at all, where XML is being used for its
LM> extensibility properties and compatibility with other XML
LM> components.
It's precisely that compatibility which is the source of the concern here.
LM> In these cases, the entire "language" discussion
LM> is really irrelevant. It's inappropriate to mandate mechanisms
LM> that are not universally applicable.
I totally agree with your last sentence; it's the universal
applicability of XML parsers to XML that is at the heart of the issue
here.
LM> Even in cases where protocol elements might contain something
LM> understood as 'text', the text itself might have restrictions
LM> from other parts of the protocol. For example, in the context
LM> of a protocol that only allows UTF-8, mandating that the XML
LM> data streams must also be allowed to be UTF-16 would be inappropriate.
Why would the protocol allow only UTF-8?
I can see two cases:
a) the protocol does not use XML. Irrelevant to this discussion, and
it can mandate what it wants.
b) the protocol uses XML. The XML specification requires every
parser to accept both UTF-8 and UTF-16, so if the protocol mandates
UTF-8 only, then the XML parser has to be specially modified, making
it non-compliant with the XML specification, so that it throws a
well-formedness error if the content is in UTF-16.
I'm not seeing the use case whereby the IETF would see a mandated
deviation from the XML specification as a good thing.
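
To make case b) concrete, here is a minimal sketch, assuming Python's
standard xml.etree.ElementTree as the stock parser. The helper names
(make_doc, parse_bytes, is_utf16) are made up for illustration and
come from no protocol or library. It shows an unmodified parser
accepting the same document in either encoding, so a UTF-8-only rule
has to be an extra check bolted on outside the parser:

```python
import xml.etree.ElementTree as ET


def make_doc(encoding: str) -> bytes:
    """Build a small XML document serialized in the given encoding (hypothetical helper)."""
    text = '<?xml version="1.0" encoding="%s"?><msg>hello</msg>' % encoding
    # Python's 'utf-16' codec prepends a byte order mark; 'utf-8' does not.
    return text.encode(encoding)


def parse_bytes(data: bytes) -> str:
    """Parse raw bytes with the stock parser and return the root element's text."""
    return ET.fromstring(data).text


def is_utf16(data: bytes) -> bool:
    """Hypothetical protocol-level check: does the stream start with a UTF-16 BOM?"""
    return data[:2] in (b"\xff\xfe", b"\xfe\xff")


utf8_bytes = make_doc("utf-8")
utf16_bytes = make_doc("utf-16")

# The unmodified parser handles both encodings without complaint.
print(parse_bytes(utf8_bytes), parse_bytes(utf16_bytes))

# Rejecting UTF-16 therefore needs a separate, non-XML rule such as this one.
print(is_utf16(utf8_bytes), is_utf16(utf16_bytes))
```

The point of the sketch is only that the rejection lives outside the
parser; it is not a suggestion that protocols should add such a check.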
--
Chris mailto:chris@xxxxxx