Re: RELAX NG and W3C XML Schema
Responding to Rick Jelliffe's message:
>3. Eric Sedler of Oracle's comments
-----------------------------
Sedlar, not Sedler, thanks.
-----------------------------
The original post is at
http://www.imc.org/ietf-xml-use/mail-archive/msg00261.html
Eric's viewpoint is that what is good for DBMS is good for RFCs. But do
we in fact expect that the kinds of wrappers and transactions that are
specified in an RFC end up directly in a database? Surely the protocol
data is ephemeral and is discarded by the time the transmitted data is
put to use. The draft concerns conventions for protocol developers.
-----------------------------
* I'm just using a DBMS, which I happen to be familiar with, as one example
of a system that benefits from XML Schema; it is nice if such systems can
interoperate. My point is that database implementations are just one of
many implementations that should be taken into account. However, I will try
to use other examples in the future so as not to be viewed as pushing the
concerns of "big business". In the WebDAV IETF working groups, a
significant percentage of the servers implementing WebDAV use databases as
their backend infrastructure. The next hot standard that will be getting
work is DASL, which is the search protocol for WebDAV, which if you look at
the draft is very SQL-like. Regardless of the query language, searching
technologies are very often implemented with databases. I'm not saying that
"what is good for DBMS is good for RFCs". I'm just saying it is worth taking
into account.
-----------------------------
> * XML Schema was developed as a compromise between the
> data-oriented people (like Oracle) and the document-oriented people.
I find it hard to recollect a single decision made specifically to support
publishing in my time in the WG. Indeed, it was only at the very end (after the
important decisions were made) that any attempt was made to systematically
test XML Schemas against web-publishing languages. So my perception
is that this statement is dead wrong.
-----------------------------
I spent only a short time on the Schema WG, after the big decisions were
made, but let me cite two examples that I was aware of:
* <redefine>: IBM, Oracle & Microsoft all wanted to get rid of this as they
didn't find it useful and found it a pain to implement. It was put in anyway at the
insistence of the XHTML people and the document-oriented folks.
* <key/keyref>: My understanding was that this was originally proposed by
my predecessor at Oracle, in the attempt to map to relational integrity
constraints in the database. However, it cannot be mapped to relational
integrity constraints in fact, and is really just an extensible mechanism
for the ID concept from DTDs. key/keyref was opposed by Software AG &
Microsoft. When I talked to Michael Sperberg-McQueen about this, telling
him this wasn't something that was helpful for the data-oriented folks, and
wouldn't be implemented in our first version at Oracle, he told me it was
very useful for document-oriented stuff (which, as I understood it, was
more his background).
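To illustrate the ID analogy, here is a minimal Python sketch of the two
checks a key/keyref pair asks for: keys are unique within a chosen scope, and
every reference matches some key in that scope. The document, the element and
attribute names, and the helper check_key_keyref are all hypothetical; this is
not how any particular validator implements the constraint.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: each <author> carries a key,
# each <book> refers to an author by that key.
DOC = """\
<library>
  <author id="a1"/>
  <author id="a2"/>
  <book author="a1"/>
  <book author="a3"/>
</library>
"""


def check_key_keyref(scope, key_path, key_attr, ref_path, ref_attr):
    """Return (duplicate_keys, dangling_refs) found inside `scope`."""
    seen = set()
    duplicates = []
    for element in scope.findall(key_path):
        value = element.get(key_attr)
        if value in seen:
            duplicates.append(value)  # key constraint: values must be unique
        seen.add(value)
    # keyref constraint: every reference must match a key in the same scope
    dangling = [
        element.get(ref_attr)
        for element in scope.findall(ref_path)
        if element.get(ref_attr) not in seen
    ]
    return duplicates, dangling


duplicates, dangling = check_key_keyref(
    ET.fromstring(DOC), "author", "id", "book", "author"
)
print(duplicates, dangling)
```

Unlike DTD ID/IDREF, the scope here is whichever element is passed in rather
than the whole document, and the key may live on any element or attribute.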
----------------------------------------
> * I think the number one thing that is important for the IETF in
> recommending an XML structure language is market acceptance,
Surely the number one thing is that the formalism can express the
requirements as simply as possible without introducing unnecessary pain to
reviewers or gratuitous cases which cannot be explained in the language, yet
still be executable?
XML Schemas is verbose and so reviewers, if they are indeed part of the
"market" Eric mentions, are just as likely to understand a particular set of
mediating tools rather than being able to wade through the XML text of a
schema.
Indeed, perhaps terseness is of maximal importance :-)
----------------------------------------
The #1 concern is making sure we both fluently speak a language in
common. The efficiency or terseness of the language is clearly secondary.
For example, I also know some Serbo-Croatian, and I could argue (my father,
a native speaker, always does) that it is a much better language than
English because it is phonetic--you never have words like "battle" where two
T's are used when one would clearly do, or words like "clothes" that could
be more efficiently communicated as "kloz" (which is the way letters from my
father are written).
----------------------------------------
> While Relax NG went through a standards process, it didn't have a lot of
> participation in the process
RELAX went through the JIS process, then RELAX NG went through the
OASIS process, now it has been going through the ISO process. If Eric is
saying that RELAX did not have a lot of Westerners involved in it, that
is fair; on the other hand I can attest as someone resident in the East
that it is almost impossible for there to be Eastern participation in W3C
standards because of the tyranny of distance and time. WGs that
only meet in the West and have teleconference meetings at 1am local time
have rather limited appeal.
But, of course, I doubt that "participation" means participation by the
worldwide college of researchers and academics and specialists
who have so far dominated the development of internet protocols.
I think "participation" is some codeword for "big business".
----------------------------------------
Boy, I must really be pushing some buttons here. There are clearly lots of
academics & researchers that were involved in XML Schema. Geez, the author
everybody loves to bash, Henry Thompson, is an academic from the UK, not a
representative of "big business". The IETF standards I know most about
(HTTP & WebDAV) were not "dominated by researchers and academics"--they were
collaborations between academics & industry. For HTTP, we have 2 authors
from Compaq, one from Xerox, one from Microsoft, and two from MIT/W3C. For
WebDAV, we have one author from Microsoft, one from Netscape, two from
Novell, and one from UCIrvine.
I would define diversity of participation as representatives of
architectures that work on a variety of technology stacks, allowing for
diversity of implementations & innovations, like HTTP & WebDAV. In the case
of XML schema languages, I would define diversity of participation as
something beyond just XML type validators that work in-memory on XML stored
in a native filesystem, and I cite a DBMS compiling schemas merely as an
example of a different architecture stack. Defining a language or protocol
that works for both of these architectures is probably something more
flexible & allowing for more innovation than something that really only
works for in-memory validation.
I was trying to point out that W3C and IETF standards tend to get more
implementations and traction in the market than ISO standards. It is
clearly much easier for a small group of determined believers or a single
big business to push something through in ISO than it is in W3C or IETF,
e.g. SQL99.
----------------------------------------
> I think the current Schema implementations (given their existence) are
> better than most Relax NG implementations (which are much worse, since they
> don't exist).
This is unmitigated rubbish. There are at least four implementations of
RELAX NG in various languages. I have tried three: Jing, VBRELAX and
Sun's MSV and found them all to be excellent. For XML Schemas
implementations I have tried MSXML and Xerces 2. I don't want to comment on
the individual implementations, but the slightest glance at the public schema
lists at W3C will reveal how many woes people have had with inconsistent
implementations.
Since Sedler does not, on his admission, know anything about RELAX NG's
technical merits, nor apparently know anything about RELAX NG implementations,
what is his basis for writing? I think his use of "market" reveals something.
----------------------------------------
Geez, again with the conspiracy theory stuff. My basis for writing is
working on XML Schema, and IETF working groups, and on commercial
implementations and talking to lots of customers. I'm sorry I don't know
everything before commenting on anything--I would rather admit lack of
knowledge up front than make assertions that are on the basis of incomplete
understanding.
-----------------------------------------
XML Schemas is already facing general disinterest from the publishing world
(there was an interesting paper on this at the XML 2002 Europe conference).
Would it be so bad for RFC authors if they could pick a short, simple spec
to use (e.g. RELAX or a profile of XML Schemas or Schematron) as
was appropriate, rather than shoehorning into XML Schemas and perhaps
discovering the shoe does not fit anyway?
------------------------------------------
So now you're making the market acceptance argument. Great. I support that
line of reasoning. I think most of the folks on this list agree to let
these schema languages battle it out in the IETF working groups rather than
here.
----------------------------------------------------------
> A lot more work has been done on optimization and performance of schemas
> than Relax NG
Well, I believe Murata-san has been toiling in this same field since the
mid 90s.
And James Clark has been writing high efficiency parsers and validators
for longer than that. What area of "optimizing" is Eric talking about? Not
datatyping (RELAX NG can use XML Schemas datatypes) and not key/keyref
(hardly the thing that would be used in protocols, I expect). And issues of
how to efficiently store and index data in a DBMS are surely tangential in
most cases to what formalisms should be used in RFCs. So we are just
left with structure validation and parsing: this is a well-plowed field
which has had so much research into techniques and algorithms that vague
claims for proprietary optimizations for streaming validation (if that is
what Eric was suggesting) can be treated as FUD for the moment.
----------------------------------------------------------
The claims about streaming validation needs were reported by early
implementors of Schema, mostly in "big business" such as IBM. I make no
claims other than that some implementors reported this finding.
This argument is clearly a hand-wavey argument without hard facts--just
anecdotes. If you don't buy the general belief that optimizations &
performance track market acceptance, it is probably out of scope to argue
that point here.
----------------------------------------------------------
> I think we understand the limitations of Schema better
Better the devil you know than the devil you don't? We don't hold with your
new-fangled RELAX in these parts? Would you include in that limitation
that XML Schemas uses gDates but Internet protocols are more likely
to use UTC?
----------------------------------------------------------
"Better the devil you know than the devil you don't?" Yup. As long as both
devils are "good enough" to meet the requirements. English is good enough
for us to communicate, even if Serbo-Croatian would be more efficient. Think
how many bytes we could save by choosing a more efficient language!!!
XML Schema is extensible enough to allow for custom date datatypes, and
the UTC date format is something that would be in the scope for an IETF
XML Schema type library should the IETF decide in the future to standardize
on Schemas.
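As a rough sketch of what such a type library entry might look like, the
Python below embeds a hypothetical simple type, utcDateTime, that restricts
xs:dateTime to values ending in "Z", and checks strings against its pattern
facet. The type name and pattern are assumptions made for illustration, not
part of any IETF or W3C document. The Python standard library has no XML
Schema validator, so the check only approximates the lexical pattern and does
not verify that the date itself is valid.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical type library entry: xs:dateTime restricted to UTC ("Z") values.
SCHEMA = r"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="utcDateTime">
    <xs:restriction base="xs:dateTime">
      <xs:pattern value="\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""


def pattern_for(schema_text, type_name):
    """Return the pattern facet of the named simple type."""
    root = ET.fromstring(schema_text)
    for simple_type in root.iter(f"{{{XS}}}simpleType"):
        if simple_type.get("name") == type_name:
            facet = simple_type.find(f"{{{XS}}}restriction/{{{XS}}}pattern")
            return facet.get("value")
    raise KeyError(type_name)


def is_utc_datetime(text):
    """Approximate the pattern facet; XSD patterns match the whole value."""
    return re.fullmatch(pattern_for(SCHEMA, "utcDateTime"), text) is not None


print(is_utc_datetime("2002-06-13T09:30:00Z"))
print(is_utc_datetime("2002-06-13T09:30:00+10:00"))
```

A schema-aware validator would apply the base xs:dateTime rules as well as
the pattern; the sketch only shows that the restriction itself is a few lines.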
----------------------------------------------------------
There is still no standardisation of large parts of XML Schemas (e.g. the
PSVI, which is hardly surprising since it is largely a "wouldn't it be nice"
thing rather than arising out of pre-existing use-cases) and there may not be
(because of its disruption: why have two DOMs etc). I don't think one can
claim market acceptance of vapourware.
Oh, I have just read the response that the section may be removed. That
would be good too, so I should not labour my point.
----------------------------------------------------------
Look, this argument applies to lots of other XML Standards. I got pushback
on using DOM in JSR-147 (WebDAV API for Java) from the folks who didn't want
to privilege DOM over JDOM (which is a JSR as well). As long as you support
constructs like <include> (which I believe RELAX supports as well), you have
a PSVI distinct from the DOM that comes just from parsing. Personally, I
think most of the things JDOM tries to solve could have been done within DOM
in the W3C.
My point is that a common language is more important than a perfect
language. One of the reasons that XML is so successful is that the market
isn't being fragmented by people who think they could do it better. That's
why I'd rather work within the XML standards process in the W3C than take
seriously arguments like "if you want that, use ASN.1" or some other
competitive standard. The counterexample to all of this is the standards for
inter-application communication. The divisions between CORBA and Microsoft
COM killed CORBA, and now we have a situation where every year or two there
is some new better way to do it. Even if you just listened to Sun, first we
started with CORBA, then Java RMI, then EJBs, and now Web Services. Drives
regular folks like customers nuts.
I know that the Schema WG pissed a lot of people off by rolling over
objections in some cases rather than trying harder to build consensus. There
was a lot of pressure to get something out so that the vendor-specific
alternatives, like XDR, didn't get too much traction, and that pressure
wasn't from big business like Microsoft, believe me. I think the W3C
standards process is too closed, and I like the IETF one better. IETF tends
to attract more hackers & implementor types (like the Apache people), whereas
W3C attracts academics who like to argue about differentiating between -0
and +0 (which is clearly needed and has its place, don't get me wrong).
That's just my personal preference. I'd just rather see folks who don't like
it pressure the W3C to open up, rather than set up something slightly better
to compete. The W3C will respond to pressure from a broader community than
its members--see the whole patent issue as an example.
----------------------------------------------------------
Take care,
Eric Sedlar