
Sticks, carrots, real XML, ultra-liberal parsing (was: Re: Auto-discovery revisited)




On Jan 9, 2004, at 13:41, Danny Ayers wrote:


But I'm not suggesting that anything reject invalid data,

I'm suggesting rejecting data that purports to be XML but is not well-formed. This should follow automatically from the decision to use XML.


I find it disturbing that Mark Pilgrim--one of the key proponents of Atom--is advocating ultra-liberal parsing[1]. If something is not well-formed, by definition it isn't an XML document. Publishing an ultra-liberal parser encourages people to use ill-defined infoset serializations that look confusingly like XML 1.0 serializations. These embraced and extended serializations won't be parseable using XML processors, and having tag soup serializations around is worse than having binary infoset serializations pop up, because with the binary serializations it is obvious that they aren't XML 1.0 serializations, so no one expects them to be interoperable with software that uses an XML processor. If Atom ends up as almost-XML, the "XML" part will be more about marketing than about technical spec layering.

If Atom in practice ends up as something that I can't process with vanilla XML tools and that is as soupish as RSS, I fail to see why I should want to use the feed facet of Atom instead of RSS, or why Atom couldn't be defined as an extension/update to RSS.


merely that the production of good data should be encouraged. On a
rudimentary level, if people think they have something to gain from
producing good data, they're a lot more likely to do so.

How do you encourage the production of good data without a stick? (The stick being that bad data won't be processed, so you might as well not publish it.) The experience with HTML suggests that the majority of people care more about pleasing the user agent with the largest market share than about producing good data as a matter of principle or about pleasing a validator. If the popular aggregators use an ultra-liberal parser, people *will* publish broken feeds, and all the hard work with the quality assurance tools and test cases can't change that. On the other hand, people *will* publish well-formed feeds (if they bother trying to publish an Atom feed at all) if the popular aggregators use real XML processors.
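
To make the "stick" concrete, here is a minimal sketch of an aggregator-side check, assuming Python and the expat-based parser in its standard library. The function name and the sample feeds are hypothetical and only illustrate the idea: a feed that is not well-formed is simply not processed.

```python
import xml.etree.ElementTree as ET


def parse_feed_strictly(data: bytes):
    """Return the root element, or None if the bytes are not well-formed XML.

    Hypothetical helper: a real aggregator would also report the error
    to the user or the publisher instead of silently dropping the feed.
    """
    try:
        # ET.fromstring uses a real XML processor (expat), which treats
        # any well-formedness error as fatal and raises ParseError.
        return ET.fromstring(data)
    except ET.ParseError:
        return None


# Illustrative inputs (not taken from any real feed).
well_formed = b'<feed version="0.3"><title>Example</title></feed>'
mismatched_tag = b'<feed version="0.3"><title>Example</feed>'
bare_ampersand = b'<feed version="0.3"><title>AT&T</title></feed>'

for sample in (well_formed, mismatched_tag, bare_ampersand):
    root = parse_feed_strictly(sample)
    print("processed" if root is not None else "rejected")
```

An ultra-liberal parser would instead try to recover something from the last two inputs, which is exactly what removes the incentive to fix them.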


(I've implemented a well-formed Atom 0.3 feed[2]. Quoting Mark Pilgrim: "I have no sympathy for people who can’t be bothered to write code I've already written.")

Atom can offer a lot.

What can it offer compared to existing formats or APIs if it doesn't offer predictable and reliable processing and good implementability?


So far, in the absence of real XML processability, the main offering of the feed facet seems to be that you can have payloads other than "entity-escaped HTML", but high-traffic sites are already trying to minimize the amount of content they put in the feed. The main offerings of the API facet seem to be introspection and better authentication on port 80. (I'd still rather take those features with the POSTed content straight on top of HTTP without an XML envelope.)

The mistake of "reject invalid RSS feeds" is that it's a stick. What we
need are carrots.

What carrots would you suggest? ("Valid Atom" badges aren't good enough, in my opinion.)


[1] http://diveintomark.org/archives/2004/01/08/postels-law
[2] http://macsanomat.com/atom

--
Henri Sivonen
hsivonen@xxxxxx
http://iki.fi/hsivonen/