
Re: RELAX NG and W3C XML Schema



I would like to contribute some comments on the issues that James raised, first by
suggesting a preferred solution, then by providing some justification,
and finally by giving some quick responses to the comments from Eric Sedler of Oracle.

(I am the developer of the Schematron schema language, and am serving
as the editor for its ISO standardization in DSDL.  I was on the W3C XML Schemas
Working Group for the last year or more until its release. I was part
of the large XML working group for most of its life. I work for a company
which is involved in schema-related tools for XML: our freebie validator
supports Schematron, DTDs and XML Schemas, with RELAX NG prototyped, and
our forthcoming editor supports all these too. I have not participated in the
development of RELAX NG apart from a couple of requests for clarification.
I also worked in the comms industry for several years, programming realtime
microcontrollers before moving into publishing.) 


1. Preferred Text

Instead of the text around

"XML Schema should be used as the formalism in the absence of
clearly stated reasons to choose another."  

I suggest words to the effect of

"Standard schema languages should be used as the formalisms in the 
absence of clearly stated reasons to choose another."  


2. Justification

It is glib but worthwhile to consider the Internet as a Petri dish
for technology.  By providing the medium in which many different
technologies can be tried and then seeing which ones flourish,
the needs of the majority and of minorities are supported.  

I am not so rash as to say the best technology always wins,
but we don't need to treat schemas as a popularity contest: 
HTTP is not "better" than FTP merely because more people use it. 
They are different and good in their appropriate places.  

Looking at the structure of Internet protocols, we can see a 
repeated pattern:  each "layer" (whether we use the OSI
layers or Malamud's 4 layers+MIME I don't think it matters)
has a mechanism to allow selection between different next
layers.  So on an Ethernet we can run different protocols.
On top of TCP/IP we have well-known ports. On top of
HTTP we have MIME media types. On top of XML we
have document types or controlling namespaces.  

Mandating XML Schemas only (or primarily) would break this 
general pattern which has served the Internet very well. 
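
The selection pattern can be illustrated with a minimal sketch in Python.
Everything in it is an assumption made for illustration: the namespace URIs
are invented, and the registry simply names a schema language rather than
calling a real validator. It shows only that the controlling namespace of a
document can select the next "layer", much as a well-known port or a MIME
media type does.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: controlling namespace -> schema language chosen
# by whoever specified that vocabulary. The URIs are made up.
SCHEMA_LANGUAGE_BY_NAMESPACE = {
    "urn:example:protocol-a": "RELAX NG",
    "urn:example:protocol-b": "W3C XML Schema",
    "urn:example:protocol-c": "Schematron",
}


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if unqualified."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{uri}local".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def select_schema_language(xml_text):
    """Choose a schema language from the document's controlling namespace."""
    return SCHEMA_LANGUAGE_BY_NAMESPACE.get(root_namespace(xml_text), "unknown")
```

A receiver written this way does not care how many schema languages exist;
adding one is a new registry entry, not a change to the layer below.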

Instead, in the medium term an approach like ISO DSDL
is more Internet-ish. It is being developed as a framework
for allowing modular, smaller schema languages.  If XML
Schemas had adopted a modular framework that allowed
easy evolution and growth of its parts, then we could discuss
the technical merits of each part: "is RELAX NG's structure 
modeling nicer than XML Schema's structure modeling?",
for example.    But with XML Schemas we cannot
even fruitfully get that far: it is monolithic and therefore
difficult to benefit from innovation outside the W3C and
its larger stakeholders.

Being monolithic rather than modular may seem to have one benefit:
specifying "XML Schemas only" simplifies life by
letting us get on with other business.  However, what is XML Schemas?
I have heard reports of an XML Schemas 1.1, and Eric mentions
an XML Schemas 2.0.   We will have to face up to plurality
sooner or later, especially if XML Schemas fails to be much chop
for the purpose of a formalism for specifying protocols.


3. Eric Sedler of Oracle's comments

The original post is at http://www.imc.org/ietf-xml-use/mail-archive/msg00261.html

Eric's viewpoint is that what is good for DBMS is good for RFCs.  But do
we in fact expect that the kinds of wrappers and transactions that are
specified in an RFC end up directly in a database?   Surely the protocol
data is ephemeral and is discarded by the time the transmitted data is
put to use. The draft concerns conventions for protocol developers.

> * XML Schema was developed as a compromise between the 
> data-oriented people (like Oracle) and the document-oriented people.  

I find it hard to recollect a single decision made specifically to support publishing
in my time in the WG. Indeed, it was only at the very end (after the
important decisions were made) that any attempt was made to systematically
test XML Schemas against web-publishing languages.   So my perception
is that this statement is dead wrong.

> * I think the number one thing that is important for the IETF in recommending 
> an XML structure language is market acceptance, 

Surely the number one thing is that the formalism can express the requirements
as simply as possible, without introducing unnecessary pain to reviewers or 
gratuitous cases which cannot be explained in the language, while still being executable?

XML Schemas is verbose, and so reviewers, if they are indeed part of the
"market" Eric mentions, are more likely to rely on a particular set of mediating 
tools than to wade through the XML text of a schema. 

Indeed, perhaps terseness is of maximal importance :-)

> While Relax NG went through a standards process, it didn't have a lot of 
> participation in the process

RELAX went through the JIS process, then RELAX NG went through the
OASIS process, and now it is going through the ISO process.  If Eric is
saying that RELAX did not have a lot of Westerners involved in it, that
is fair; on the other hand I can attest as someone resident in the East
that it is almost impossible for there to be Eastern participation in W3C
standards because of the tyranny of distance and time.  WGs that
only meet in the West and have teleconference meetings at 1am local time 
have rather limited appeal.

But, of course, I doubt that "participation" means participation by the
worldwide college of researchers and academics and specialists
who have so far dominated the development of internet protocols. 
I think "participation" is some codeword for "big business". 

> I think the current Schema implementations (given their existence) are better 
> than most Relax NG implementations (which are much worse, since they don't exist).  

This is unmitigated rubbish.  There are at least four implementations of 
RELAX NG in various languages.  I have tried three: Jing, VBRELAX and 
Sun's MSV and found them all to be excellent.  For XML Schemas implementations
I have tried MSXML and Xerces 2.  I don't want to comment on the individual
implementations, but the slightest glance at the public schema lists at W3C will
reveal how many woes people have had with inconsistent implementations. 

Since Sedler does not, on his own admission, know anything about RELAX NG's technical
merits, nor apparently know anything about RELAX NG implementations, what
is his basis for writing?  I think his use of "market" reveals something.

XML Schemas is already facing general disinterest from the publishing world
(there was an interesting paper on this at the XML Europe 2002 conference).
Would it be so bad for RFC authors if they could pick a short, simple spec
to use (e.g. RELAX or a profile of XML Schemas or Schematron) as
was appropriate, rather than shoehorning into XML Schemas and perhaps
discovering the shoe does not fit anyway?

> A lot more work has been done on optimization and performance of schemas 
> than Relax NG

Well, I believe Murata-san has been toiling in this same field since the mid 90s.
And James Clark has been writing high efficiency parsers and validators
for longer than that.  What area of "optimizing" is Eric talking about?  Not
datatyping (RELAX NG can use XML Schemas datatypes) and not key/keyref
(hardly the thing that would be used in protocols, I expect).  And issues of
how to efficiently store and index data in a DBMS are surely tangential in
most cases to what formalisms should be used in RFCs.  So we are just
left with structure validation and parsing: this is a well-plowed field which
has had so much research into techniques and algorithms that vague claims for
proprietary optimizations for streaming validation (if that is what Eric was
suggesting) can be treated as FUD for the moment.

> I think we understand the limitations of Schema better

Better the devil you know than the devil you don't?  We don't hold with your
new-fangled RELAX in these parts?  Would you include in that limitation
that XML Schemas uses gDates but Internet protocols are more likely
to use UTC? 

There is still no standardisation of large parts of XML Schemas (e.g. the PSVI, 
which is hardly surprising since it is largely a "wouldn't it be nice" thing rather 
than arising out of pre-existing use-cases) and there may not be (because of 
the disruption it would cause: why have two DOMs, etc.). I don't think one can claim market 
acceptance of vapourware. 

Oh, I have just read the response that the section may be removed. That would
be good too, so I should not labour my point. 

Cheers
Rick Jelliffe