
Re: RELAX NG and W3C XML Schema



Eric Sedlar wrote:

> * Most people learn XML Schema from the primer (part 0), which is very
> well written and very accessible.  

RNG also has a tutorial that is very accessible, and concentrates on
simple cases while being comprehensive.  That's partly because it's
easy to be comprehensive with RNG: the power of the language is such
that you can do much with little.

> * The number one barrier to acceptance of any formalism more complete
> than DTDs is market acceptance--how many people have bothered to even
> get a simple understanding of it.  Maybe about half of the people on the
> DAV WG know enough Schema to be able to review specs written in the
> language.  I don't think anybody on the WG knows enough RELAX NG to
> where we could use it.

It's natural that you don't want to devalue your investment in learning
Schema.  However, the market penetration of Schema is small enough
that this shouldn't play a large role.

> The earlier versions of the WebDAV ACL spec I
> wrote using XML Schema were rewritten into DTDs just to promote greater
> awareness.

If you had used RNG, you would have been able to down-translate to DTDs
automatically.  In essence, DTDs are a third (somewhat restrictive) syntax
for RNG, alongside the XML and compact syntaxes.

> * Most of our implementors are using tools like XML Spy to develop
> schemas

If they are using XML Schema, it's no wonder.  Writing RNG schemas,
especially with the new compact (non-XML) syntax, is entirely feasible
by hand.

> * XML Schema was developed as a compromise between the data-oriented
> people (like Oracle) and the document-oriented people.  It has some of
> the problems that occur when things are designed by committee to meet
> the needs of multiple constituencies.  However, that's also why it has a
> lot more market acceptance.

With the exception of identity constraints, RNG provides a superset of
XML Schema facilities.  It works equally well for documents and data,
with no compromises required.  And if XML Schema has more market
acceptance, that's because several 800-pound gorillas stand behind it:
the W3C itself, plus its larger members.

> * If I were comparing XML structure definition languages to programming
> languages, I would say XML Schema is like C++, and Relax NG is like
> Lisp.

Whether you mean it to be or not, this is a smear tactic.  Lisp has a
reputation for slowness and unwieldiness that is entirely undeserved.
In fact, Lisp compilers are very efficient and Lisp is crisp.

> C++ was another one of those languages designed by committee, 

It would be far more accurate to say that (Common) Lisp was designed by committee.
C++ was mostly designed by two people, Bjarne Stroustrup for the language
and Alexander Stepanov for the template library.

Design by committee is neither necessarily a Good Thing nor necessarily
a Bad Thing.

> * I think the number one thing that is important for the IETF in
> recommending an XML structure language is market acceptance, 

In that case, DTDs plus formal prose are far and away the winners.

> * XML Schema feels more data-oriented while Relax NG feels more document
> oriented.

"Feelings" are not a suitable basis for technical decisions.  RNG is more than
adequate for the needs of data-oriented XML.

> I worry that Relax NG validation performance will compare to
> Schema validation performance in the same way that XSLT compares to
> JSPs.

Based on what possible evidence?  Have you run comparisons?
Evidently not, as you seem to believe (see below) that RNG is not implemented.

> * While Relax NG went through a standards process, it didn't have a lot
> of participation in the process,

Anyone was free to participate who wished to, unlike the situation with
XML Schema.

> and I don't believe it meets the needs
> of all the potential constituencies as well as Schema does due to that
> lack of participation.

Evidence?

> * The #1 problem with using XML in IETF protocols, in my opinion, is not
> being able to put binary data in directly.

If you want ASN.1, you know where to find it.

> It would certainly be
> possible to add something like chunked-transfer-encoding in XML 2.0
> (Core), and I think if the IETF is going to criticize the work of the
> W3C, that would be a more useful avenue than criticizing Schema.  I
> would love to say <content length="2e45">binary stuff</content> in my
> protocol messages rather than forcing a base64 encoding.

XML is by nature a textual format: it is composed of characters, not bytes.
Bytes are used to represent the characters, but that is a lower level.
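To make the point concrete: arbitrary bytes have to be mapped onto characters before they can appear in XML at all, and base64 is the usual mapping.  A minimal Python sketch using only the standard library follows; the <content> element echoes the example above, but the "encoding" attribute is a made-up convention for illustration, not part of any spec.

```python
import base64
import xml.etree.ElementTree as ET

def wrap_binary(payload: bytes) -> str:
    """Embed raw bytes in a <content> element as base64 text."""
    # "encoding" is a hypothetical attribute, not defined by any standard.
    elem = ET.Element("content", {"encoding": "base64"})
    # base64 maps any byte sequence onto a small ASCII alphabet,
    # which is always legal character data in XML 1.0.
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")

def unwrap_binary(xml_text: str) -> bytes:
    """Recover the original bytes from a <content> element."""
    elem = ET.fromstring(xml_text)
    return base64.b64decode(elem.text or "")

# Byte 0x00 could never appear directly: XML 1.0 forbids it, even as &#0;.
raw = bytes([0x00, 0x01, 0xFF])
print(wrap_binary(raw))  # <content encoding="base64">AAH/</content>
```

The characters-versus-bytes layering is visible here: the parser hands back text, and only the application's decoding step turns it back into bytes.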

> * DTDs are definitely not good enough to express the XML needed in
> protocol definitions.  

Agreed.

> So I don't think this presents
> a practical problem for most Schema users.  If this were a problem, then
> Schema wouldn't have the market acceptance that it does.  Unreadability
> of the Schema spec is only a problem if it limits market acceptance.

1) I don't think you can get agreement that market acceptance is everything;
2) The market acceptance of XML Schema is being driven from the top down;
3) There is no real market acceptance of XML Schema.

> * Point B:  any time you freeze a specification, you do so with some set
> of features that is less than what some people would desire.  Schema
> froze its spec much earlier than Relax NG.

In fact, the TREX specification was frozen around January of 2001, roughly
contemporary with the Schema CR of October 2000 and the Schema PR of
March 2001.  The changes between TREX and RELAX NG consist mostly of
removing unnecessary features and restrictions.  The only added feature
AFAIK is the list pattern, which does plainly owe something to XML Schema
derivation-by-list.

> Also, there were good reasons for not adding in some of these features. 
> I know that some of the restrictions for the <all> group were there
> because of performance difficulty that streaming processors would have. 

In fact, Jing (James Clark's RNG validator) is a streaming processor,
and RNG interleave patterns are not only more powerful than <all>,
they are more powerful than the original SGML & connector.
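For the curious, here is a toy Python sketch of validation by derivatives, the technique James Clark has described for RELAX NG validation.  It shows why streaming is natural: each child element name is consumed once and turned into a new, simpler pattern, with no lookahead or backtracking.  It handles element names only (no attributes, text, or datatypes), and the constructor names are hypothetical, loosely mirroring RNG's pattern names; it illustrates the idea and is not Jing's actual code.

```python
from typing import Iterable

EMPTY = ("empty",)
NOT_ALLOWED = ("notAllowed",)

def elem(name):
    return ("elem", name)  # matches exactly one child element called `name`

# Smart constructors simplify as they build, keeping derivatives small.
def choice(a, b):
    if a == NOT_ALLOWED:
        return b
    if b == NOT_ALLOWED or a == b:
        return a
    return ("choice", a, b)

def group(a, b):
    if NOT_ALLOWED in (a, b):
        return NOT_ALLOWED
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return ("group", a, b)

def interleave(a, b):
    if NOT_ALLOWED in (a, b):
        return NOT_ALLOWED
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return ("interleave", a, b)

def one_or_more(p):
    return NOT_ALLOWED if p == NOT_ALLOWED else ("oneOrMore", p)

def nullable(p) -> bool:
    """True if the pattern can match an empty sequence of children."""
    kind = p[0]
    if kind == "empty":
        return True
    if kind in ("notAllowed", "elem"):
        return False
    if kind == "choice":
        return nullable(p[1]) or nullable(p[2])
    if kind in ("group", "interleave"):
        return nullable(p[1]) and nullable(p[2])
    if kind == "oneOrMore":
        return nullable(p[1])
    raise ValueError(kind)

def deriv(p, name):
    """What remains of pattern p after consuming one element called `name`."""
    kind = p[0]
    if kind in ("empty", "notAllowed"):
        return NOT_ALLOWED
    if kind == "elem":
        return EMPTY if p[1] == name else NOT_ALLOWED
    if kind == "choice":
        return choice(deriv(p[1], name), deriv(p[2], name))
    if kind == "group":
        d = group(deriv(p[1], name), p[2])
        return choice(d, deriv(p[2], name)) if nullable(p[1]) else d
    if kind == "interleave":
        # The element may belong to either side of the interleave.
        return choice(interleave(deriv(p[1], name), p[2]),
                      interleave(p[1], deriv(p[2], name)))
    if kind == "oneOrMore":
        return group(deriv(p[1], name), choice(p, EMPTY))
    raise ValueError(kind)

def matches(p, names: Iterable[str]) -> bool:
    """Validate a stream of child names one at a time."""
    for n in names:
        p = deriv(p, n)
        if p == NOT_ALLOWED:
            return False  # fail as soon as the stream goes wrong
    return nullable(p)

mixed = interleave(elem("a"), group(elem("b"), elem("c")))
print(matches(mixed, ["b", "a", "c"]))  # True: a may fall between b and c
book = interleave(elem("title"), one_or_more(elem("author")))
print(matches(book, ["author", "title", "author"]))  # True
print(matches(book, ["title"]))  # False: at least one author is required
```

The first pattern splits the (b, c) sequence around a, which the SGML & connector does not permit; the second allows any number of authors around a single title, which an XML Schema 1.0 <all> group cannot express because its particles are limited to maxOccurs="1".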

> My contention would be that Schema is too feature-rich for version 1.0,
> not too feature-poor, which is what James suggests.

It's feature-rich and power-poor.

> The market clearly wants default values.

1) If you want XSLT, you know where to find it.
2) It would be easy to build a streaming processor to do attribute
defaulting.  What has that got to do with schema validation?
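As for how easy: a minimal sketch of a streaming defaulter built on Python's standard SAX API.  It re-serializes the document event by event, filling in missing attributes from a table, and never consults a schema or validates anything.  The DEFAULTS table and the "encoding" attribute are hypothetical.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical defaults: element name -> {attribute name: default value}.
DEFAULTS = {"content": {"encoding": "base64"}}

class DefaultingWriter(XMLGenerator):
    """Copies SAX events to output, adding any attribute the table supplies."""

    def __init__(self, out, defaults):
        super().__init__(out, encoding="utf-8")
        self._defaults = defaults

    def startElement(self, name, attrs):
        merged = dict(self._defaults.get(name, {}))
        # Attributes present in the instance override the defaults.
        merged.update({k: attrs[k] for k in attrs.keys()})
        super().startElement(name, AttributesImpl(merged))

def add_defaults(xml_text: str, defaults=DEFAULTS) -> str:
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), DefaultingWriter(out, defaults))
    return out.getvalue()

print(add_defaults("<msg><content>AAH/</content></msg>"))
```

Because it only ever sees one start tag at a time, memory use does not grow with document size, and it can run before, after, or without any validator.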

> * Point D:  James is complaining about the limitations of current
> implementations with respect to their handling of xsi:schemaLocation. 
> This is clearly not a problem with the Schema spec.  I think the current
> Schema implementations (given their existence) are better than most
> Relax NG implementations (which are much worse, since they don't exist).

Say what?  James publishes a freely available validator.  Sun has a freely
available multi-schema-language validator (which essentially translates
all the others into RNG, BTW).  Tenuto is a validator written in C#;
VBRELAXNG is a validator written in Visual Basic.  Relaxer was designed
for RELAX, but handles much of RNG as well.  There are lots of conversion
tools.

>  However, my experience has been that it is very nice to use the
> schemaLocation tag, because without it, instances don't know what type
> they are.

1) You can't count on it working;
2) If the instance lies about its proper schema, it will pass validation
but spoof the application.  In a security-conscious Internet, this is serious.
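The safer arrangement is for the receiving application to decide which schema applies from its own protocol context, never from hints inside the instance.  A minimal Python sketch follows; the TRUSTED_SCHEMAS table, its file path, and keying on the document element are assumptions for illustration, and the validation step itself is omitted.  Keying on the root element is acceptable only because the instance is then validated against the schema the application chose, so a false schemaLocation buys the sender nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, application-controlled table:
# (namespace URI, local name) of the document element -> schema file.
TRUSTED_SCHEMAS = {
    ("DAV:", "acl"): "schemas/dav-acl.rng",
}

def split_tag(tag: str) -> tuple:
    """Split ElementTree's "{namespace}local" form into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag

def pick_schema(xml_text: str) -> str:
    """Return the schema the application trusts for this message.

    xsi:schemaLocation is deliberately never read: the sender gets no
    say in which rules its own message is checked against.
    """
    root = ET.fromstring(xml_text)
    key = split_tag(root.tag)
    if key not in TRUSTED_SCHEMAS:
        raise ValueError(f"unexpected document element: {key}")
    return TRUSTED_SCHEMAS[key]

spoofed = (
    '<D:acl xmlns:D="DAV:" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="DAV: http://attacker.example/lax.xsd"/>'
)
print(pick_schema(spoofed))  # schemas/dav-acl.rng, whatever the hint says
```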

> My conclusions: (Disclaimer: I don't know Relax NG very well--just a
> onceover of the spec, but as James's argument rests mostly on the faults
> of XML Schema, I can address those well)

Well, I don't know XML Schema that well either.

> * It has (and will continue to have) greater market acceptance than
> alternatives like Relax NG, and getting the maximum number of people to
> review the protocol definitions is more important than dealing with
> inconsistencies in the schema language abstractions that only come up in
> corner cases that nobody needs in IETF protocol standards.  Market
> acceptance has always been the primary focus of IETF standards work
> (look at HTTP for Pete's sake), not purity of abstraction

Rough consensus and running code, yes.  Anyone who spends an hour
looking at RNG with an unbiased eye can learn to read and understand
RNG schemas in the XML syntax.  In the compact syntax, which is far
better for authoring, it doesn't even take that long.

> * Schema is more data-centric, and is more natural for protocol data.

No evidence for this one.

> * A lot more work has been done on optimization and performance of
> schemas than Relax NG, and I believe that performance of validation will
> be a primary concern for IETF protocol implementations.

The amount of work done is not an indication of the quality of the result,
in general.  It may merely reflect how complicated the task is.

> At Oracle,
> we've been working on XML Schema compilation for 2 years.  While I don't
> think we have the implementation experience to demonstrate either way,
> my belief is that performance of Schema validation vs. RelaxNG will
> track market acceptance.

In general cases you may be right.  However, an XML Schema implementation
effort is just plain going to be a lot bigger and harder than an RNG
implementation effort.

-- 
John Cowan <jcowan@xxxxxxxxxxxxxxxxx>     http://www.reutershealth.com
The world is changed: I feel it in the water,    http://www.ccil.org/~cowan
I feel it in the earth, I smell it in the air.  --Galadriel, _LOTR:FOTR_