
Re: XML Guidelines -04

At 11:08 AM 6/4/02 -0700, Tim Bray wrote:
In particular, since protocols are going to be read by an XML processor, and since an XML processor is going to have to be able to read UTF-8 and UTF-16, the requirement to handle only one of these two actually imposes extra work - and it's actually hard to see where in the protocol chain you'd efficiently do that work. Presumably the easy way to design a protocol is to feed the bits on the wire to an XML processor and deal with it through SAX or DOM or CLR or some such; are you going to put a filter in front of the processor to check the char encoding? Or are you going to ask the processor what encoding it was in so that you can toss it (after it's been successfully parsed) because you don't like the encoding? This seems like a really egregious violation of "being liberal in what you accept". Note that popular XML parsers, e.g. expat, give the programmer UTF-8 anyhow regardless of how the input showed up.
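Here is a minimal Python sketch of the two receiver designs Tim describes. The strict one puts a filter in front of the parser and rejects anything that is not UTF-8. The liberal one hands the bytes straight to the XML processor. The sketch assumes Python's standard xml.etree.ElementTree, which is built on expat. The function names are hypothetical, and the encoding sniffing covers only the byte-order-mark and "<?" cases from XML 1.0 Appendix F.

```python
import codecs
import xml.etree.ElementTree as ET


def sniff_encoding_family(data: bytes) -> str:
    """Guess the encoding family from the first bytes (a subset of the
    XML 1.0 Appendix F heuristics; UTF-32 and EBCDIC are ignored)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"
    # No BOM, but "<?" encoded as 16-bit units in either byte order.
    if data[:4] in (b"\x00<\x00?", b"<\x00?\x00"):
        return "utf-16"
    # ASCII-compatible start: treat as UTF-8 for this sketch.
    return "utf-8"


def strict_receive(data: bytes) -> ET.Element:
    """The 'filter in front of the processor': refuse UTF-16 up front."""
    if sniff_encoding_family(data) != "utf-8":
        raise ValueError("protocol accepts UTF-8 only")
    return ET.fromstring(data)


def liberal_receive(data: bytes) -> ET.Element:
    """Hand the bytes straight to the parser; expat reads UTF-8 and UTF-16."""
    return ET.fromstring(data)
```

In the liberal version the application sees decoded text whichever encoding arrived on the wire, which reflects the point about expat. The strict version adds a check whose only effect is to refuse input the parser could already read.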

I think this is an area where XML for protocol use is different from XML for documents. I can really imagine that some protocol parsers will be hand-coded. In many cases, XML-based protocols may be very simple. I think the key issue in following the recommendation is that protocol generators may be specified to generate only UTF-8. If a protocol receiver uses a generic XML parser, then I agree that it's probably pointless to go out of your way to filter out UTF-16, but I don't think that's really what's being suggested.
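Here is a minimal sketch of the split Graham suggests. It assumes a hypothetical protocol whose specification requires senders to emit UTF-8. The generator always writes UTF-8 with an explicit XML declaration, and a hand-coded receiver may then rely on that rule. A receiver built on a generic parser needs no extra check. The helper names and the one-element message format are illustrative only. The sketch uses xml.etree.ElementTree from the Python standard library, and the `xml_declaration` argument needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def generate_message(tag: str, text: str) -> bytes:
    """Sender side: the (hypothetical) protocol mandates UTF-8 output."""
    elem = ET.Element(tag)
    elem.text = text
    # An explicit declaration makes the encoding visible on the wire.
    return ET.tostring(elem, encoding="utf-8", xml_declaration=True)


def hand_coded_receive(data: bytes, tag: str) -> str:
    """A deliberately minimal receiver that relies on the UTF-8-only rule.
    It decodes as UTF-8 and extracts the text of one element without an
    XML parser. It handles no attributes, entities, or nesting."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on UTF-16 input
    start = text.index(f"<{tag}>") + len(tag) + 2
    end = text.index(f"</{tag}>", start)
    return text[start:end]
```

The hand-coded receiver is only safe because of the UTF-8-only rule on senders. It would fail on a UTF-16 message, which is exactly the case a generic parser such as `ET.fromstring` handles without extra work.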


#g


------------------- Graham Klyne <GK@xxxxxxxxxxxxxx>