
Re: Additional syntactic restrictions



On Wednesday, June 19, 2002, 3:21:09 AM, Martin wrote:


MD> At 20:20 02/06/17 -0700, Tim Bray wrote:

>>Martin Duerst wrote:
>>
>>>If a protocol restricts itself to UTF-8, then it's not the parser,
>>>but the application, that must enforce the restriction.
>>
>>Which is actually nontrivial and there's no standardized way to do it if 
>>you're using a standard XML processor.  I believe you can tell expat that 
>>it has to try to use a particular encoding and catch the error condition 
>>when this doesn't work, but it's going to be very difficult to distinguish 
>>between an instance that is in a forbidden encoding from one that actually 
>>has broken syntax. -Tim

MD> Well, yes, but: Assume a protocol is defined as accepting only UTF-8
MD> and UTF-16 (I understand that that's what you and Chris would prefer).
MD> There may be some XML parsers that understand exactly these two and
MD> nothing else, but your average XML parser understands more character
MD> encodings, starting with iso-8859-1. And as you say above, there is
MD> no standard way to enforce the restriction to UTF-8 and UTF-16,

Actually there is: if anything other than UTF-8 or UTF-16 is used, the
XML encoding declaration must be present (well, unless unwisely
overridden by some other protocol).

MD> and
MD> you may be able to tell a parser, but then you can't distinguish
MD> between a forbidden encoding and broken syntax.

It's as simple as looking for the encoding pseudo-attribute on the XML
declaration.
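
As a rough sketch of that check, in Python: the function names, the
regular expression and the allowed set below are all illustrative, not
taken from any parser's API. It assumes no external information (such
as an HTTP charset parameter) overrides the declaration, and it only
reads the label; a real XML processor still has to verify that the
bytes actually match it.

```python
import re

# XML declaration at the very start of the entity; group 1 captures the
# value of the encoding pseudo-attribute when one is present.
_XML_DECL = re.compile(
    rb"""^<\?xml\s+version\s*=\s*["'][^"']*["']"""
    rb"""(?:\s+encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["'])?"""
)

UTF32_LE_BOM = b"\xff\xfe\x00\x00"
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")
UTF8_BOM = b"\xef\xbb\xbf"


def declared_encoding(data: bytes) -> str:
    """Return the encoding an XML entity claims, as an upper-case label."""
    # The UTF-32 LE BOM begins with the UTF-16 LE BOM, so test it first.
    if data.startswith(UTF32_LE_BOM):
        return "UTF-32"
    if data.startswith(UTF16_BOMS):
        return "UTF-16"
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    match = _XML_DECL.match(data)
    if match is None or match.group(1) is None:
        # No declaration, or no encoding pseudo-attribute: without a
        # UTF-16 BOM the entity is claiming to be UTF-8.
        return "UTF-8"
    return match.group(1).decode("ascii").upper()


def is_allowed(data: bytes, allowed=("UTF-8", "UTF-16")) -> bool:
    """True if the claimed encoding is one the protocol permits."""
    return declared_encoding(data) in allowed
```

An application can run this before handing the bytes to the parser.
A False result then means "forbidden encoding", and any error the
parser reports afterwards is a genuine syntax (or mislabelling) problem.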

MD> So whether a protocol says 'UTF-8 only' or 'only UTF-8 and UTF-16',
MD> it's all just the same.

Not at all.


-- 
 Chris                            mailto:chris@xxxxxx