
Re: XML Guidelines -04




First off, I want to apologize for coming back *yet again* with suggestions and input. Having said that, I should add that I believe that this is turning into a very significant document indeed. While the title may say "... within IETF Protocols", this is on its way to being IMHO the best single reference I've seen anywhere on The Right Way To Do XML. Once it stabilizes I'll write something for XML.com and the TAG may well say "hooray!" and I'll be activist in spreading the word about it, and it'll quite possibly get slashdotted or otherwise covered. So, I apologize, but I think we're gonna be living with this one a loooooooooong time and it's worth going the extra mile to get right. And 5 drafts isn't *that* many.
======================


In 3. XML Alternatives
==============================

Para beginning "Specification Encoding:..." - since I'm strongly in support of James Clark's stand against giving XML Schemas any special standing, this needs to have more references beyond the existing [11] and [12].

==============================

Para beginning "Text encoding and character sets:..." - it's really bogus to reference 10646 but not Unicode. Here's why:
(a) the character repertoire is identical, but Unicode includes a ton of extra important information about character and string semantics;
(b) you can go and easily buy the Unicode spec at a reasonable price;
(c) the Unicode spec is a masterful piece of work that is extremely useful to implementors, and in fact anyone who's gonna go to the mat with i18n'ed text *should* purchase the Unicode spec; they'll use it;
(d) very few people have ever even seen the ISO spec, and that's as it should be.


If you really want to leave the 10646 reference in, there's no harm in it, but a Unicode reference is important. Grab the reference from http://www.w3.org/TR/charmod/#sec-RefUnicode

========================

Para beginning "Data Encoding:" typo: "sequence bytes" missing an "of"

======================

In section 4.5 "Well-Formedness"

==============================

First para says "structural rules" - er, did you mean "syntactic rules"? I think so.

===========================

2nd para: "attempting to partially interpret non-well-formed instances of an element which is required to be XML." (I may have written this). I think the word "element" is wrong - you mean "instance" I think - it's clearly not OK, if you hit a busted element, to go and try to fish info out of the surrounding elements.
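To make the "instance, not element" point concrete, here is a minimal sketch, assuming a receiver written in Python with the standard-library ElementTree parser; the function name `parse_message` and the sample messages are hypothetical. If any part of the instance is not well-formed, the whole thing is rejected and nothing is fished out of the surrounding elements.

```python
import xml.etree.ElementTree as ET

def parse_message(data: bytes) -> ET.Element:
    """Return the root element, or raise ValueError if data is not well-formed.

    Hypothetical helper: no attempt is made to salvage any part of a
    broken instance, even if most of its elements look fine.
    """
    try:
        return ET.fromstring(data)
    except ET.ParseError as err:
        raise ValueError(f"rejecting non-well-formed instance: {err}") from None

ok = parse_message(b"<msg><id>1</id></msg>")
print(ok.find("id").text)

try:
    # <body> is never closed, so the entire instance is refused,
    # including the perfectly good <id> element before it.
    parse_message(b"<msg><id>1</id><body>oops</msg>")
except ValueError as err:
    print(err)
```

The caller either gets a complete tree or an error; there is no code path that hands back a partial result.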

===================================

In section 4.6 "Validity and Extensibility"

===========================

Re the bullet list on defining protocols: I'm already on the record in support of James Clark. There are a substantial number of schema facilities for XML, of which 3 are specially distinguished:

(a) DTDs are limited in functionality, have a syntax that's distressing to some, but have official blessing both from W3C and ISO and a body of implementation experience that far surpasses all other schema facilities in the world put together, with a huge variety of high-quality commercial and free software implementations.
(b) XSD has official blessing from W3C and a moderate amount of software available, but little practical experience.
(c) RELAX-NG will soon have official blessing from ISO, a moderate amount of software available, but little practical experience.


So on what basis does the IETF favor XSD? In terms of implementation experience and free software, it loses to DTDs big-time. And yes, if you've got a protocol that's simple enough to define using DTDs, you should damn well use DTDs and skip all this argument (BTW, you can round-trip DTDs to RelaxNG if you want a more-up-to-date schema). In terms of de jure blessing by official bodies (if IETF cares) XSD's standing is effectively equal. In terms of technical quality, I challenge you to find anybody anywhere who will argue that XSD comes close to RNG.

========================================

In section 4.13, "Interaction with the IANA"

========================================

Stylistic nit: in the bullet list, there are a couple of instances of RFC-style "SHOULD" and "SHOULD NOT", which seem to be carefully avoided throughout the rest of the document. Does this worry anybody?

=========================================

In section 5.1 "Character Sets and Encodings"

=========================================

The last para "recommends, for simplicity, that only UTF-8 be allowed." New news: as of yesterday, the W3C TAG disagreed with a (clearly related) recommendation in the W3C Charmod draft that a single encoding be used. See http://lists.w3.org/Archives/Public/www-tag/2002Jun/0020.html on this.

In particular, since protocols are going to be read by an XML processor, and since an XML processor is going to have to be able to read UTF-8 and UTF-16, the requirement to handle only one of these two actually imposes extra work, and it's hard to see where in the protocol chain you'd efficiently do that work. Presumably the easy way to design a protocol is to feed the bits on the wire to an XML processor and deal with it through SAX or DOM or CLR or some such; are you going to put a filter in front of the processor to check the char encoding? Or are you going to ask the processor what encoding it was in so that you can toss it (after it's been successfully parsed) because you don't like the encoding? This seems like a really egregious violation of "being liberal in what you accept". Note that popular XML parsers, e.g. expat, give the programmer UTF-8 anyhow regardless of how the input showed up.
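A minimal sketch of the point, assuming Python's standard-library ElementTree (which sits on top of expat): the same call parses UTF-8 and UTF-16 input and hands the application identical strings. The `starts_with_utf16_bom` function is hypothetical; it stands in for the extra pre-filter that a UTF-8-only rule would force in front of the parser, and it is deliberately crude.

```python
import codecs
import xml.etree.ElementTree as ET

TEXT = "<msg>caf\u00e9 \u65e5\u672c</msg>"

def parse_text(data: bytes) -> str:
    """Parse an XML instance and return the root element's text content."""
    return ET.fromstring(data).text

utf8_bytes = TEXT.encode("utf-8")
utf16_bytes = TEXT.encode("utf-16")  # Python prepends a byte-order mark

# The parser copes with both encodings; the application sees the same string.
assert parse_text(utf8_bytes) == parse_text(utf16_bytes) == "caf\u00e9 \u65e5\u672c"

def starts_with_utf16_bom(data: bytes) -> bool:
    """Hypothetical pre-filter a UTF-8-only rule would require.

    Crude: it only looks for a UTF-16 byte-order mark, so it says nothing
    about other encodings named in an XML declaration.
    """
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```

Everything below the first assertion is work the receiver does only to throw away input its parser already handled correctly.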

============================================

Thanks (on behalf of the whole community) for the work you're putting into this. -Tim