
Re: XML Guidelines -04




Graham Klyne wrote:
At 11:08 AM 6/4/02 -0700, Tim Bray wrote:

In particular, since protocols are going to be read by an XML processor, and since an XML processor is going to have to be able to read UTF-8 and UTF-16, the requirement to handle only one of these two actually imposes extra work
...
I think this is an area where XML for protocol use is different from XML for documents. I can really imagine that some protocol parsers will be hand-coded.

I understand this line of argument, but I don't believe it. Parsing XML, despite our best efforts, is actually kind of tricky - hard enough that it's a challenge even with, for example, the Perl regexp engine. You have to deal with constructs like


<a
><!-- ouch --><foo:bar
 a='
 a"&lt;x>"&#27;&#x27;
' b:baz
 =
 "ble
tch:"' /></a
>

Given that the best XML processors take care of all this and fit in tens of K and run at I/O speeds and have voluminous public test suites and are free.... er, isn't it bad engineering to allow rules that will *add* work for implementors who do it the standard way with the standard known-to-be-correct tools, in order to make life easier for people who want to engage in what is a highly questionable engineering practice?

On top of which, the saving in effort due to not having to handle UTF-16 hardly seems significant (hint: convert it to UTF-8 on input).

-Tim
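
To illustrate the "convert it to UTF-8 on input" hint, here is a minimal sketch in Python. The function name `to_utf8` is hypothetical and not from any particular XML toolkit. It assumes UTF-16 input starts with a byte order mark, and it does not rewrite any encoding declaration in the XML prolog, so whatever consumes the result would have to ignore that declaration.

```python
import codecs


def to_utf8(raw: bytes) -> bytes:
    """Return the input as UTF-8 bytes, transcoding if it starts with a UTF-16 BOM.

    Hypothetical helper for illustration. Input without a UTF-16 BOM is
    assumed to be UTF-8 already and is passed through (minus any UTF-8 BOM).
    UTF-32 and BOM-less UTF-16 are not handled.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM, picks the byte order, and drops it.
        return raw.decode("utf-16").encode("utf-8")
    if raw.startswith(codecs.BOM_UTF8):
        # Strip the optional UTF-8 signature so downstream code sees plain UTF-8.
        return raw[len(codecs.BOM_UTF8):]
    return raw
```

A hand-coded protocol parser could call something like this once on each incoming message and then work only with UTF-8 bytes.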