Re: XML Guidelines -04
At 11:08 AM 6/4/02 -0700, Tim Bray wrote:
In particular, since protocols are going to be read by an XML processor,
and since an XML processor is going to have to be able to read UTF-8 and
UTF-16, the requirement to handle only one of these two actually imposes
extra work - and it's actually hard to see where in the protocol chain
you'd efficiently do that work. Presumably the easy way to design a
protocol is to feed the bits on the wire to an XML processor and deal with
it through SAX or DOM or CLR or some such; are you going to put a filter
in front of the processor to check the char encoding? Or are you going to
ask the processor what encoding it was in so that you can toss it (after
it's been successfully parsed) because you don't like the encoding? This
seems like a really egregious violation of "being liberal in what you
accept". Note that popular XML parsers, e.g. expat, give the programmer
UTF-8 anyhow regardless of how the input showed up.
I think this is an area where XML for protocol use is different from XML
for documents. I can really imagine that some protocol parsers will be
hand-coded. In many cases, XML-based protocols may be very simple. I
think the key issue in following the recommendation is that protocol
generators may be specified to generate only UTF-8. If a protocol receiver
uses a generic XML parser, then I agree that it's probably pointless to go
out of your way to filter out UTF-16, but I don't think that's really
what's being suggested.
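
To make the distinction concrete, here is a minimal Python sketch of the kind of check a hand-coded receiver might do if its protocol were specified as UTF-8 only. The function names (sniff_encoding, accept_utf8_only) are hypothetical. The detection looks only at the first bytes (a byte order mark, or a leading '<' stored as two bytes) and ignores the encoding declaration, so it is an illustration under those assumptions and not the full autodetection procedure from the XML specification.

```python
# Hypothetical helpers for a hand-coded, UTF-8-only protocol receiver.
# Assumption: a message either starts with a byte order mark or with '<'.

UTF8_BOM = b"\xef\xbb\xbf"
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def sniff_encoding(data: bytes) -> str:
    """Guess 'utf-8' or 'utf-16' from the first bytes of a message."""
    if data.startswith(UTF8_BOM):
        return "utf-8"
    if data.startswith(UTF16_BOMS):
        return "utf-16"
    # No BOM: a leading '<' encoded as two bytes suggests UTF-16.
    if data[:2] in (b"<\x00", b"\x00<"):
        return "utf-16"
    return "utf-8"


def accept_utf8_only(data: bytes) -> str:
    """Decode a message, rejecting anything that is not UTF-8."""
    if sniff_encoding(data) != "utf-8":
        raise ValueError("protocol requires UTF-8")
    # 'utf-8-sig' drops an optional BOM; invalid bytes raise
    # UnicodeDecodeError, which is a subclass of ValueError.
    return data.decode("utf-8-sig")
```

A receiver built on a generic XML parser would not need any of this, since the parser handles both encodings itself; the sketch only applies to the hand-coded case.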
#g
-------------------
Graham Klyne
<GK@xxxxxxxxxxxxxx>