
Re: Additional syntactic restrictions




At 14:37 02/06/20 +0200, Chris Lilley wrote:
On Wednesday, June 19, 2002, 3:21:09 AM, Martin wrote:

MD> Well, yes, but: Assume a protocol is defined as accepting only UTF-8
MD> and UTF-16 (I understand that that's what you and Chris would prefer).
MD> There may be some XML parsers that understand exactly these two and
MD> nothing else, but your average XML parser understands more character
MD> encodings, starting with iso-8859-1. And as you say above, there is
MD> no standard way to enforce the restriction to UTF-8 and UTF-16,

Actually there is - if anything else is used, the XML encoding
declaration must be used (well, unless unwisely overridden by some
other protocol).

You can always check that before the XML reaches the parser, the same way you can check for 'only UTF-8' before the XML reaches the parser. But I wouldn't call that standard. The parser itself won't tell you whether there is an encoding declaration or not. And even if there is one, you can't exclude the case that it is UTF-8 or UTF-16.
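To make the pre-parser check concrete, here is a rough sketch in Python of what such a peek might look like. The names declared_encoding, encoding_permitted and ALLOWED are made up for illustration and are not part of any XML library. The sketch assumes the whole XML declaration fits in the first 256 bytes, handles only UTF-8 and UTF-16 byte order marks, and does not verify that the bytes really are in the declared encoding.

```python
import codecs
import re
from typing import Optional

# Hypothetical policy for this sketch: the protocol permits only these two.
ALLOWED = {"utf-8", "utf-16"}

_DECL = re.compile(r"<\?xml\s[^>]*?\?>")
_ENC = re.compile(r"""\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def _head_as_text(head: bytes) -> str:
    """Turn the first bytes of an entity into text, just well enough to
    read the (ASCII-only) XML declaration."""
    if head.startswith(codecs.BOM_UTF8):
        return head[len(codecs.BOM_UTF8):].decode("latin-1")
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec consumes the BOM and picks the byte order.
        return head.decode("utf-16", errors="replace")
    # No BOM: treat the bytes as an ASCII-compatible encoding.
    return head.decode("latin-1")


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the lower-cased value of the encoding pseudo-attribute,
    or None if there is no XML declaration or no encoding in it."""
    text = _head_as_text(data[:256])
    decl = _DECL.match(text)
    if decl is None:
        return None
    enc = _ENC.search(decl.group(0))
    return enc.group(1).lower() if enc else None


def encoding_permitted(data: bytes) -> bool:
    """True if nothing is declared, or the declared name is allowed.
    The mere presence of a declaration proves nothing: it may well say
    UTF-8 or UTF-16, so the value itself has to be inspected."""
    enc = declared_encoding(data)
    return enc is None or enc in ALLOWED
```

Note that this runs on the raw bytes before any parser sees them, so it is a separate filter and not something the parser reports.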


MD> and
MD> you may be able to tell a parser, but then you can't distinguish
MD> between a forbidden encoding and broken syntax.

It's as simple as looking for the encoding pseudo-attribute on the XML
declaration.

MD> So whether a protocol says 'UTF-8 only' or 'only UTF-8 and UTF-16',
MD> it's all just the same.

Not at all.

It's still all the same, even with peeking at the encoding declaration as you suggest.

Regards, Martin.