
Re: Additional syntactic restrictions




At 12:43 02/06/18 +0200, Chris Lilley wrote:


On Tuesday, June 18, 2002, 5:20:32 AM, Tim wrote:


TB> Martin Duerst wrote:


>> If a protocol restricts itself to UTF-8, then it's not the parser,
>> but the application, that must enforce the restriction.

TB> Which is actually nontrivial and there's no standardized way to do it if
TB> you're using a standard XML processor.  I believe you can tell expat
TB> that it has to try to use a particular encoding and catch the error
TB> condition when this doesn't work, but it's going to be very difficult to
TB> distinguish between an instance that is in a forbidden encoding from one
TB> that actually has broken syntax. -Tim
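
To make Tim's point concrete, here is a minimal sketch using Python's
xml.parsers.expat binding. It assumes that passing encoding="utf-8" to
ParserCreate overrides the declared encoding, as the Python documentation
describes. The helper name utf8_parse_error and the sample documents are
hypothetical, chosen only for illustration.

```python
from xml.parsers import expat


def utf8_parse_error(data: bytes):
    """Parse data with expat forced to UTF-8.

    Returns None if the parse succeeds, otherwise the numeric expat
    error code. (Hypothetical helper, for illustration only.)
    """
    # The encoding argument overrides any encoding declared in the document.
    parser = expat.ParserCreate(encoding="utf-8")
    try:
        parser.Parse(data, True)
    except expat.ExpatError as err:
        return err.code
    return None


# Well-formed, but in a "forbidden" encoding (ISO-8859-1, one non-ASCII byte).
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xe9</a>'
# UTF-8 compatible bytes, but genuinely broken syntax (a bare ampersand).
broken_doc = b"<a>&</a>"
# Well-formed and really UTF-8.
valid_doc = "<a>\u00e9</a>".encode("utf-8")

for name, doc in [("latin1", latin1_doc),
                  ("broken", broken_doc),
                  ("valid", valid_doc)]:
    code = utf8_parse_error(doc)
    print(name, "ok" if code is None else expat.ErrorString(code))
```

In this sketch the wrong-encoding document and the syntactically broken
one should surface through the same kind of error, so the application
cannot tell from the error alone which restriction was violated.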

Ah, but in the spirit of being lax in what one accepts, it would be
possible to implement a pre-filter: have expat parse the document,
re-serialise it as UTF-8, and then send it to the special protocol
parser, confident that it will work.

(That was a joke, he added hurriedly before someone thought it was a good idea).

Just for the record, and to show that it's really not a good idea:


Expat currently only supports UTF-8, UTF-16, US-ASCII, and ISO-8859-1
(see http://www.jclark.com/xml/expatfaq.html).

Regards, Martin.