Re: Additional syntactic restrictions
At 20:20 02/06/17 -0700, Tim Bray wrote:
>Martin Duerst wrote:
>> If a protocol restricts itself to UTF-8, then it's not the parser,
>> but the application, that must enforce the restriction.
>
>Which is actually nontrivial, and there's no standardized way to do it if
>you're using a standard XML processor. I believe you can tell expat that
>it has to try to use a particular encoding and catch the error condition
>when this doesn't work, but it's going to be very difficult to distinguish
>an instance that is in a forbidden encoding from one that actually
>has broken syntax. -Tim

Well, yes, but: Assume a protocol is defined as accepting only UTF-8
and UTF-16 (I understand that that's what you and Chris would prefer).
There may be some XML parsers that understand exactly these two and
nothing else, but your average XML parser understands more character
encodings, starting with iso-8859-1. And as you say above, there is
no standard way to enforce the restriction to UTF-8 and UTF-16. You
may be able to tell a parser which encodings to accept, but then you
can't distinguish between a forbidden encoding and broken syntax.
So whether a protocol says 'UTF-8 only' or 'only UTF-8 and UTF-16',
it's all just the same.
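
To make the application-level check concrete, here is a rough sketch in
Python of one way an application might look at the leading bytes itself
(byte order mark and encoding declaration, roughly along the lines of
Appendix F of the XML spec) before handing the document to the parser.
This is not a standard mechanism. The names ALLOWED, ForbiddenEncoding,
sniff_encoding, check_encoding and classify are made up for this
illustration, and the check only sees what the document declares, so a
mislabeled document would still surface as a parser error.

```python
import codecs
import re
import xml.etree.ElementTree as ET

# Hypothetical protocol policy: only these encodings are accepted.
ALLOWED = {"utf-8", "utf-16"}


class ForbiddenEncoding(Exception):
    """Raised when the document declares an encoding the protocol forbids."""


# Matches the encoding pseudo-attribute of an XML declaration in an
# ASCII-compatible byte stream.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data: bytes) -> str:
    """Guess the declared encoding from the first bytes of the document."""
    # UTF-32 must be tested first: its little-endian BOM begins with the
    # UTF-16 little-endian BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # UTF-16 without a BOM: "<?" encoded as two-byte code units.
    if data.startswith((b"<\x00?\x00", b"\x00<\x00?")):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: XML defaults to UTF-8.
    return "utf-8"


def check_encoding(data: bytes) -> str:
    """Return the sniffed encoding, or raise if the protocol forbids it."""
    encoding = sniff_encoding(data)
    if encoding not in ALLOWED:
        raise ForbiddenEncoding(encoding)
    return encoding


def classify(data: bytes) -> str:
    """Report a forbidden encoding separately from a syntax error."""
    try:
        check_encoding(data)
    except ForbiddenEncoding:
        return "forbidden encoding"
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return "broken syntax"
    return "ok"
```

Note that restricting ALLOWED to just "utf-8" changes nothing else in
the sketch: the check lives in the application either way.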
Regards, Martin.