Re: XML Guidelines -04
First off, I want to apologize for coming back *yet again* with
suggestions and input. Having said that, I should add that I believe
that this is turning into a very significant document indeed. While the
title may say "... within IETF Protocols", this is on its way to being
IMHO the best single reference I've seen anywhere on The Right Way To Do
XML. Once it stabilizes I'll write something for XML.com and the TAG
may well say "hooray!" and I'll be an activist in spreading the word about
it, and it'll quite possibly get slashdotted or otherwise covered. So,
I apologize, but I think we're gonna be living with this one a
loooooooooong time and it's worth going the extra mile to get right.
And 5 drafts isn't *that* many.
======================
In section 3 "XML Alternatives"
==============================
Para beginning "Specification Encoding:..." - since I'm strongly in
support of James Clark's stand against giving XML Schemas any special
standing, this needs to have more references beyond the existing [11]
and [12].
==============================
Para beginning "Text encoding and character sets:..." - it's really
bogus to reference 10646 but not Unicode. Here's why:
(a) the character repertoire is identical, but Unicode includes a ton of
extra important information about character and string semantics
(b) you can go and easily buy the Unicode spec at a reasonable price;
(c) the Unicode spec is a masterful piece of work that is extremely
useful to implementors and in fact anyone who's gonna go to the mat with
i18n'ed text *should* purchase the Unicode spec; they'll use it
(d) very few people have ever even seen the ISO spec and that's as it
should be.
If you really want to leave the 10646 reference in, there's no harm in
it, but a Unicode reference is important. Grab the reference from
http://www.w3.org/TR/charmod/#sec-RefUnicode
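
To make point (a) concrete, here is a minimal sketch using Python's
standard unicodedata module, which exposes data from the Unicode
Character Database. The sample strings and variable names are only
illustrative assumptions; the point is that names, categories and
normalization come from Unicode's character data, not from a bare
list of code points.

```python
import unicodedata

# Two spellings of the same user-visible text: one precomposed code point
# versus a base letter followed by a combining accent.
composed = "\u00e9"      # e with acute, single code point
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT

# Per-character properties published by Unicode.
char_name = unicodedata.name(composed)
char_category = unicodedata.category(composed)
accent_category = unicodedata.category("\u0301")

# String semantics: the raw code point sequences differ, but they are
# equal once both are brought to Normalization Form C.
raw_equal = composed == decomposed
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed

print(char_name, char_category, accent_category, raw_equal, nfc_equal)
```

A protocol that compares identifiers or user names runs into exactly
this kind of question, which is why implementors end up needing the
Unicode material.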
========================
Para beginning "Data Encoding:" typo: "sequence bytes" missing an "of"
======================
In section 4.5 "Well-Formedness"
==============================
First para says "structural rules" - er, did you mean "syntactic rules"?
I think so.
===========================
2nd para: "attempting to partially interpret non-well-formed instances
of an element which is required to be XML." (I may have written this).
I think the word "element" is wrong - you mean "instance" I think - it's
clearly not OK, if you hit a busted element, to go and try to fish info
out of the surrounding elements.
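
As an illustration of the "instance" reading, here is a minimal sketch
assuming Python's standard xml.etree.ElementTree parser. The function
name parse_or_reject and the sample messages are hypothetical. The
receiver either gets a complete tree or rejects the whole instance; it
never tries to salvage the elements that happened to parse before the
error.

```python
import xml.etree.ElementTree as ET


def parse_or_reject(data):
    """Return the parsed root element, or None if the instance as a
    whole is not well-formed. No partial results are ever returned."""
    try:
        return ET.fromstring(data)
    except ET.ParseError:
        return None


good = b"<msg><to>alice</to><body>hi</body></msg>"
# The <body> element is never closed, so the instance is not well-formed,
# even though <to>alice</to> on its own looks perfectly fine.
bad = b"<msg><to>alice</to><body>hi</msg>"

good_root = parse_or_reject(good)
bad_root = parse_or_reject(bad)

print(good_root.find("to").text)
print(bad_root)
```

The second message is dropped in its entirety, including the intact
"to" element.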
===================================
In section 4.6 "Validity and Extensibility"
===========================
Bullet list of definition of protocols. I'm already on the record in
support of James Clark. There are a substantial number of schema
facilities for XML, of which 3 are specially distinguished:
(a) DTDs are limited in functionality, have a syntax that's distressing
to some, but have official blessing both from W3C and ISO and a body of
implementation experience that far surpasses all other schema facilities
in the world put together, with a huge variety of high-quality
commercial and free software implementations.
(b) XSD has official blessing from W3C and a moderate amount of software
available, but little practical experience.
(c) RELAX-NG will soon have official blessing from ISO, a moderate
amount of software available, but little practical experience.
So on what basis does the IETF favor XSD? In terms of implementation
experience and free software, it loses to DTDs big-time. And yes, if
you've got a protocol that's simple enough to define using DTDs, you
should damn well use DTDs and skip all this argument (BTW, you can
round-trip DTDs to RelaxNG if you want a more-up-to-date schema). In
terms of de jure blessing by official bodies (if IETF cares) XSD's
standing is effectively equal to the others'. In terms of technical quality, I
challenge you to find anybody anywhere who will argue that XSD comes
close to RNG.
========================================
In section 4.13, "Interaction with the IANA"
========================================
Stylistic nit: in the bullet list, there are a couple of instances of
RFC-style "SHOULD" and "SHOULD NOT", which seems to be carefully avoided
throughout the rest of the document. Does this worry anybody?
=========================================
In section 5.1 "Character Sets and Encodings"
=========================================
The last para "recommends, for simplicity, that only UTF-8 be allowed."
New news: as of yesterday, the W3C TAG disagreed with a (clearly
related) recommendation in the W3C Charmod draft that a single encoding
be used. See
http://lists.w3.org/Archives/Public/www-tag/2002Jun/0020.html on this.
In particular, since protocols are going to be read by an XML processor,
and since an XML processor is going to have to be able to read UTF-8 and
UTF-16, the requirement to handle only one of these two actually imposes
extra work - and it's actually hard to see where in the protocol chain
you'd efficiently do that work. Presumably the easy way to design a
protocol is to feed the bits on the wire to an XML processor and deal
with it through SAX or DOM or CLR or some such; are you going to put a
filter in front of the processor to check the char encoding? Or are you
going to ask the processor what encoding it was in so that you can toss
it (after it's been successfully parsed) because you don't like the
encoding? This seems like a really egregious violation of "being liberal
in what you accept". Note that popular XML parsers, e.g. expat, give
the programmer UTF-8 anyhow regardless of how the input showed up.
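
A minimal sketch of that last point, assuming Python's standard
xml.etree.ElementTree (which is built on expat). The helper name
encode_doc and the sample document are hypothetical. The same document
is put on the wire once as UTF-8 and once as UTF-16, and the
application code sees identical decoded text either way, without
writing any encoding-specific logic.

```python
import xml.etree.ElementTree as ET


def encode_doc(encoding):
    """Serialize one sample document in the given wire encoding.
    Python's "utf-16" codec prepends a byte order mark."""
    text = (
        f'<?xml version="1.0" encoding="{encoding}"?>'
        "<greeting>caf\u00e9</greeting>"
    )
    return text.encode(encoding)


utf8_bytes = encode_doc("utf-8")
utf16_bytes = encode_doc("utf-16")

# The bytes on the wire differ...
wire_differs = utf8_bytes != utf16_bytes

# ...but the parser detects the encoding itself, and the application
# receives the same character data in both cases.
text_from_utf8 = ET.fromstring(utf8_bytes).text
text_from_utf16 = ET.fromstring(utf16_bytes).text

print(wire_differs, text_from_utf8 == text_from_utf16)
```

Rejecting the UTF-16 form would mean adding a check that the parser
itself does not need.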
============================================
Thanks (on behalf of the whole community) for the work you're putting
into this. -Tim