Re: Additional syntactic restrictions
On Wednesday, June 19, 2002, 3:21:09 AM, Martin wrote:
MD> At 20:20 02/06/17 -0700, Tim Bray wrote:
>>Martin Duerst wrote:
>>
>>>If a protocol restricts itself to UTF-8, then it's not the parser,
>>>but the application, that must enforce the restriction.
>>
>>Which is actually nontrivial and there's no standardized way to do it if
>>you're using a standard XML processor. I believe you can tell expat that
>>it has to try to use a particular encoding and catch the error condition
>>when this doesn't work, but it's going to be very difficult to distinguish
>>between an instance that is in a forbidden encoding from one that actually
>>has broken syntax. -Tim
MD> Well, yes, but: Assume a protocol is defined as accepting only UTF-8
MD> and UTF-16 (I understand that that's what you and Chris would prefer).
MD> There may be some XML parsers that understand exactly these two and
MD> nothing else, but your average XML parser understands more character
MD> encodings, starting with iso-8859-1. And as you say above, there is
MD> no standard way to enforce the restriction to UTF-8 and UTF-16,
Actually there is: if anything other than UTF-8 or UTF-16 is used, the
XML encoding declaration must be present (well, unless unwisely
overridden by some other protocol).
MD> and
MD> you may be able to tell a parser, but then you can't distinguish
MD> between a forbidden encoding and broken syntax.
It's as simple as looking for the encoding pseudo-attribute on the XML
declaration.
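A rough sketch of that check in Python, run on the raw bytes before they
are handed to the parser. The names declared_encoding, encoding_is_allowed
and ALLOWED are made up for illustration and are not part of expat or any
standard XML API. It assumes the entity is either ASCII-compatible or
UTF-16 with a byte order mark; it does not handle EBCDIC, UTF-32, or
encoding information supplied by an outer protocol such as a MIME charset
parameter.

```python
import re
from typing import Optional

# Matches an XML declaration that carries an encoding pseudo-attribute.
# Group 1 or 2 holds the encoding name, depending on the quote style.
_ENCODING_RE = re.compile(
    r"""^<\?xml\s+version\s*=\s*(?:"[^"]*"|'[^']*')"""
    r"""\s+encoding\s*=\s*(?:"([A-Za-z][A-Za-z0-9._-]*)"|'([A-Za-z][A-Za-z0-9._-]*)')"""
)

# Hypothetical protocol rule: only these two encodings are accepted.
ALLOWED = {"utf-8", "utf-16"}


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the name in the encoding pseudo-attribute, or None if absent."""
    head = data[:200]  # the declaration, if any, is at the very start
    if head.startswith(b"\xef\xbb\xbf"):
        # UTF-8 byte order mark: skip it, the declaration itself is ASCII.
        text = head[3:].decode("ascii", errors="replace")
    elif head.startswith((b"\xff\xfe", b"\xfe\xff")):
        # UTF-16 byte order mark: the codec uses it and strips it.
        text = head.decode("utf-16", errors="replace")
    else:
        text = head.decode("ascii", errors="replace")
    match = _ENCODING_RE.match(text)
    if match is None:
        return None
    return match.group(1) or match.group(2)


def encoding_is_allowed(data: bytes) -> bool:
    """True if the entity declares no encoding or declares an allowed one."""
    name = declared_encoding(data)
    # No declaration means the entity has to be UTF-8 or UTF-16 anyway.
    return name is None or name.lower() in ALLOWED
```

An instance rejected by encoding_is_allowed is in a forbidden encoding; one
that passes this check but then fails in the parser has broken syntax (or
bytes that do not match what it declares).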
MD> So whether a protocol says 'UTF-8 only' or 'only UTF-8 and UTF-16',
MD> it's all just the same.
Not at all.
--
Chris mailto:chris@xxxxxx