Re: XML Guidelines -04
Graham Klyne wrote:
> At 11:08 AM 6/4/02 -0700, Tim Bray wrote:
> > In particular, since protocols are going to be read by an XML
> > processor, and since an XML processor is going to have to be able to
> > read UTF-8 and UTF-16, the requirement to handle only one of these two
> > actually imposes extra work
> > ...
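The premise that an XML processor already reads both UTF-8 and UTF-16 is not optional: XML 1.0 requires every processor to accept both encodings. A minimal Python sketch with the standard library's `xml.etree.ElementTree` shows the same call consuming both byte streams; the `ping` message and its `id` attribute are hypothetical. The parser works out the encoding from the byte order mark and the XML declaration, so the caller writes no encoding-specific code.

```python
import xml.etree.ElementTree as ET

# Hypothetical protocol message; element and attribute names are made up.
TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><ping id="42">h\u00e9llo</ping>'

def parse_any(raw: bytes):
    """Parse raw bytes with the stdlib parser (expat), which detects the
    encoding itself from the BOM and the XML declaration."""
    root = ET.fromstring(raw)
    return root.tag, root.get("id"), root.text

as_utf8 = TEMPLATE.format(enc="UTF-8").encode("utf-8")
as_utf16 = TEMPLATE.format(enc="UTF-16").encode("utf-16")  # codec prepends a BOM

print(parse_any(as_utf8))
print(parse_any(as_utf16))
```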
> I think this is an area where XML for protocol use is different from XML
> for documents. I can really imagine that some protocol parsers will be
> hand-coded.
I understand this line of argument, but I don't believe it. Parsing XML,
despite our best efforts, is actually kind of tricky: hard enough that
it's a challenge even with, for example, the Perl regexp engine. You
have to deal with constructs like
<a
><!-- ouch --><foo:bar
a='
a"<x>"'
' b:baz
=
"ble
tch:"' /></a
>
Given that the best XML processors take care of all this and fit in tens
of K and run at I/O speeds and have voluminous public test suites and
are free... er, isn't it bad engineering to allow rules that will *add*
work for implementors who do it the standard way with the standard
known-to-be-correct tools, in order to make life easier for people who
want to engage in what is a highly questionable engineering practice?
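To make the hand-coding hazard concrete, here is a rough Python sketch contrasting two regex shortcuts with the standard library parser. It deliberately uses a much tamer, hypothetical sample than the example above, chosen to be well-formed: tag-like text inside a comment, a `>` inside an attribute value (legal in XML, unlike a raw `<`), and whitespace before the closing `>` of tags. The helper names are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample: a comment containing tag-like text,
# a '>' inside an attribute value, and whitespace before closing '>'s.
SAMPLE = '<a\n><!-- <b/> --><c note="1 > 0"\n/></a\n>'

def naive_tag_names(text):
    """Regex 'parser': treats every '<' followed by a name as a start tag."""
    return re.findall(r"<([A-Za-z_][\w.:-]*)", text)

def naive_start_tag(text, name):
    """Regex grab of a whole start tag: stops at the first '>' it sees."""
    m = re.search(r"<%s\b[^>]*>" % re.escape(name), text)
    return m.group(0) if m else None

def parsed_tag_names(text):
    """Element names as reported by a conforming XML parser (expat)."""
    return [el.tag for el in ET.fromstring(text).iter()]

print(naive_tag_names(SAMPLE))        # ['a', 'b', 'c']: 'b' is only comment text
print(parsed_tag_names(SAMPLE))       # ['a', 'c']
print(naive_start_tag(SAMPLE, "c"))   # cut off at the '>' inside the attribute
print(ET.fromstring(SAMPLE).find("c").get("note"))
```

Both regex helpers look plausible and both are wrong on this input; the parser handles it with no special effort from the caller.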
On top of which, the saving in effort due to not having to handle UTF-16
hardly seems significant (hint: convert it to UTF-8 on input).

-Tim
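The "convert it to UTF-8 on input" hint can be sketched as a small filter that a hand-written, UTF-8-only reader might run first. This is a Python sketch under stated assumptions: the input is either UTF-8 (with or without a BOM) or UTF-16 carrying the byte order mark that XML 1.0 requires for UTF-16 entities. Other encodings are out of scope, including UTF-32, whose little-endian BOM begins with the UTF-16 one. The function name `to_utf8` is hypothetical, and the declaration rewrite only handles a literal `UTF-16` label.

```python
import codecs
import re

# Matches an XML declaration at the very start whose encoding label is UTF-16,
# capturing everything up to the quote so the label can be swapped out.
_UTF16_DECL = re.compile(r"""^(<\?xml[^?]*?encoding\s*=\s*)(["'])utf-16\2""",
                         re.IGNORECASE)

def to_utf8(data: bytes) -> bytes:
    """Return `data` as UTF-8 bytes without a BOM (sketch; see assumptions)."""
    if data.startswith(codecs.BOM_UTF8):
        # Already UTF-8: just drop the optional BOM.
        return data[len(codecs.BOM_UTF8):]
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to choose the byte order and strips it.
        text = data.decode("utf-16")
        # Keep the declaration truthful so a real parser can still read the result.
        text = _UTF16_DECL.sub(r"\1\2UTF-8\2", text, count=1)
        return text.encode("utf-8")
    # No BOM: assume the input is already UTF-8.
    return data
```

For a conforming parser this step is unnecessary; it only matters for a reader that insists on UTF-8.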