
Re: Shipping Atom products prematurely



* Paul Hoffman / IMC <phoffman@xxxxxxx> [2004-08-17 19:43-0700]
 
> - We expect that the extensibility will be cleaner and better defined 
> so that small markets will be able to use Atom easier than previous 
> RSSs.

Really? That'd be great, but I've not seen much evidence that 
the WG takes that view. I'm comparing with RSS 1.0's RDF/XML-based
extensibility framework; the other flavours don't really have one,
except for RSS 2.0's "use namespaces and you're on your own".

Extensibility comes at a price, and designing something cleaner and
better than RDF will be an interesting endeavour. Finding a clean and 
syntactically graceful way of mapping _into_ RDF also comes at a price, 
as does (of course) simply using full RDF/XML syntax. The pre-IETF 
Atom community decided some time ago that they didn't find RDF/XML 
an attractive proposition, which I guess means we're going the route of 
defining an extensibility model that is somehow better than RSS 1.0's.

Hmm, so what were the key benefits of RDF extensibility in RSS 1.0?

 - (fairly) predictable XML notation; RSS 1.0 defined a profile of the 
   RDF/XML syntax, so that namespace-extended feeds all shared a 
   basic structure (rather than allowing all of RDF's syntactic
   variations).

 - Supported free combination of independently developed descriptive 
   vocabulary (manifested as RDF/XML-based namespaces). RSS 1.0 feeds 
   can carry extra markup describing things in the world beyond
   syndication, such as people, places, movies, bank accounts. Element 
   names in the markup correspond to classes (categories) and properties 
   (fields, relations etc) defined by any RDF vocabulary that proves 
   useful.

 - The external vocabularies a feed draws upon do not need to be defined 
   with RSS 1.0 (or Atom or newsfeed syndication) in mind. Or be tightly 
   coordinated amongst themselves. There is a tightly-defined and 
   simple-minded (additive) model for explaining how these independent 
   namespaces interact when deployed together.
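
A rough sketch of what "additive" means here, assuming each description is
reduced to (subject, property, value) statements whose property names are
full URIs. The item URI and the jobs vocabulary are made up for
illustration; only the RSS 1.0 and Dublin Core namespace URIs are real.

```python
# Each statement is a (subject, property, value) triple. Property names are
# full URIs, so independently developed vocabularies cannot collide.
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/jobs#"      # hypothetical vocabulary (assumption)

item = "http://example.org/item/1"   # hypothetical item URI (assumption)

syndication_view = {(item, RSS + "title", "Research Associate")}
dublin_core_view = {(item, DC + "date", "2004-08-04")}
jobs_view = {(item, EX + "city", "Sheffield")}

# Combining descriptions is plain set union: nothing is overwritten and no
# vocabulary needs to know about the others.
merged = syndication_view | dublin_core_view | jobs_view


def describe(subject, triples):
    """Collect every property/value pair stated about one subject."""
    return {prop: value for subj, prop, value in triples if subj == subject}


description = describe(item, merged)
```

A consumer that only understands Dublin Core can read its own properties
out of the merged description and ignore the rest.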

What were the problems / drawbacks with RSS 1.0's RDF extensibility?

 - explaining the XML-level constraints on markup structures amounted 
   to the need to present a mini-tutorial on RDF's syntax rules, since 
   RSS 1.0 used RDF's standard XML encoding.  This involves unenviable 
   tasks like explaining RDF's "striped" XML style (see 
   http://www.w3.org/2001/10/stripes/ ), and trying to summarise the 
   rules for when you use "rdf:about=" versus "rdf:resource="
   attributes.

 - when RSS 1.0 shipped (4 years ago) there weren't many RDF
   vocabularies, software libraries were less mature, and the RDF
   specs hadn't gone through the RDFCore cleanup (which finished Feb'04).
   Sites like http://www.schemaweb.info/ show that there are a 
   growing number of vocabularies, but many are still a bit drafty.

 - RSS 1.0 was perhaps a little too minimalist, forcing people to use 
   extensions for things that a broader syndication-oriented vocab 
   could have included in a more generous core.

 - The only widely used extension for carrying hypertext content in 
   RSS 1.0 was 'content:encoded', which somewhat opts out of 
   the XML world. RDF in 2000 was a bit vague on how to deal with 
   namespaces, xml:base, xml:lang and other canonicalisation issues 
   relating to "literal XML" content. (addressed in Feb'04 specs)

 - the blogging use case (which dominated RSS deployment and evangelism)
   didn't have as much to gain from a powerful extensibility framework
   as those apps which sometimes get called 'synthetic feeds'.
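
To make the "striped" style and the rdf:about / rdf:resource distinction
mentioned above a bit more concrete, here is a minimal sketch of a walker
that alternates between node elements and property elements. It is not a
real RDF/XML parser: it ignores rdf:ID, rdf:parseType, property attributes
and the rdf:type implied by a typed node element. The "ex" vocabulary and
the sample document are invented for illustration.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Invented sample: Item and Job are node stripes; seeAlso, advertises and
# city are property stripes.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/vocab#">
  <ex:Item rdf:about="http://example.org/item/1">
    <ex:seeAlso rdf:resource="http://example.org/other"/>
    <ex:advertises>
      <ex:Job>
        <ex:city>Sheffield</ex:city>
      </ex:Job>
    </ex:advertises>
  </ex:Item>
</rdf:RDF>
"""


def walk_node(node, triples, blank_ids):
    """Handle a node element; its children are property elements."""
    # rdf:about names the thing being described; without it, the node is
    # anonymous and gets a generated label.
    subject = node.get(f"{{{RDF}}}about") or f"_:b{next(blank_ids)}"
    for prop in node:
        predicate = prop.tag  # ElementTree form: "{namespace}localname"
        resource = prop.get(f"{{{RDF}}}resource")
        children = list(prop)
        if resource is not None:
            # rdf:resource points at something described elsewhere.
            triples.append((subject, predicate, resource))
        elif children:
            # A nested node element: the next stripe down.
            for child in children:
                obj = walk_node(child, triples, blank_ids)
                triples.append((subject, predicate, obj))
        else:
            # Plain text content is a literal value.
            triples.append((subject, predicate, (prop.text or "").strip()))
    return subject


def parse(text):
    triples = []
    blank_ids = itertools.count(1)
    for node in ET.fromstring(text):
        walk_node(node, triples, blank_ids)
    return triples
```

Calling parse(SAMPLE) yields three statements: one pointing at another
resource, one linking the item to an anonymous job node, and one literal
property of that job.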
 

> None of that might spin your beanie. One big reason many of us are 
> working on Atom is that we believe that RSSish syndication can become 
> a major communication mechanism for literally decades to come (look 
> at RFC 822 and MIME, for instance). If that is true, it should have 
> the most polished, most reviewed, and clearest spec possible.

Yes, that's what draws me to RSS 1.0 and its siblings. One lesson from 
the non-blogging use cases (job adverts, movie listings, bank account
feeds, etc.) that get some of us enthused is that the most interesting 
bits of the markup are those which use (possibly various) non-RSS 
namespaces. RSS is the least interesting, and simplest, part of the
document.

I think the recently announced "Nature" RSS 1.0 feeds might repay study, 
as a way of thinking about the tradeoffs around AtomPub's extensibility
goals.

See http://www.nature.com/rss/

There are feeds there from scientific journals, and for job listings in 
the sciences. The extensions used are both useful and evocative. They 
immediately suggest further extension ideas. 

eg. http://www.nature.com/nrc/journal/v4/n8/rss.rdf
[[
<item rdf:about="http://dx.doi.org/10.1038/nrc1424">

    <title>TUMOUR VIRUSES: A genetic switch</title>
    <link>http://dx.doi.org/10.1038/nrc1424</link>
    <description>Kristine Novak</description>
    <dc:title>TUMOUR VIRUSES: A genetic switch</dc:title>
    <dc:creator>Kristine Novak</dc:creator>
    <dc:identifier>doi:10.1038/nrc1424</dc:identifier>

    <dc:source>Nature Reviews Cancer 4, 572 (2004)</dc:source>
    <dc:date>2004-08-01</dc:date>
    <prism:publicationName>Nature Reviews Cancer</prism:publicationName>
    <prism:publicationDate>2004-08-01</prism:publicationDate>
    <prism:volume>4</prism:volume>
    <prism:number>8</prism:number>

    <prism:section>Highlights</prism:section>
    <prism:startingPage>572</prism:startingPage>
</item>
]]

or, more interesting: http://www.nature.com/naturejobs/jobs/biologicalsciences.rdf
[[
<item
rdf:about="http://naturejobs.nature.com/texis/jobsearch/details.html?id=411109c54a01090&amp;lookid=nature">
<title>University of Sheffield: Research Associate</title>
<link>http://naturejobs.nature.com/texis/jobsearch/details.html?id=411109c54a01090&amp;lookid=nature</link>

<description>Research Associate, University of Sheffield, Sheffield,
United Kingdom.  Posted on 4 August 2004.</description>
<nj:advertises>
<nj:Job>
<nj:offeredBy>University of Sheffield</nj:offeredBy>
<nj:title>Research Associate</nj:title>
<nj:city>Sheffield</nj:city>
<nj:country>United Kingdom</nj:country>
</nj:Job>
</nj:advertises>
<nj:postedOn>2004-08-04</nj:postedOn>
<nj:expiresOn>2004-09-02</nj:expiresOn>
</item>
]]
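
As a rough illustration of how a consumer might pick the job data out of
an item like the one above, here is a minimal sketch using Python's
standard ElementTree. The excerpt does not show the namespace declarations,
so the "nj" namespace URI below is a placeholder assumption, and the
wrapping rdf:RDF element is reconstructed for the sake of a parseable
sample.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    # Placeholder: the real Naturejobs namespace URI is not shown above.
    "nj": "http://example.org/naturejobs#",
}

FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:nj="http://example.org/naturejobs#">
<item rdf:about="http://naturejobs.nature.com/texis/jobsearch/details.html?id=411109c54a01090&amp;lookid=nature">
<title>University of Sheffield: Research Associate</title>
<nj:advertises>
<nj:Job>
<nj:offeredBy>University of Sheffield</nj:offeredBy>
<nj:title>Research Associate</nj:title>
<nj:city>Sheffield</nj:city>
<nj:country>United Kingdom</nj:country>
</nj:Job>
</nj:advertises>
<nj:postedOn>2004-08-04</nj:postedOn>
<nj:expiresOn>2004-09-02</nj:expiresOn>
</item>
</rdf:RDF>
"""


def jobs(feed_text):
    """Yield one dict per item that advertises a job."""
    root = ET.fromstring(feed_text)
    for item in root.findall("rss:item", NS):
        job = item.find("nj:advertises/nj:Job", NS)
        if job is None:
            continue  # an ordinary item with no job extension
        yield {
            "title": job.findtext("nj:title", default="", namespaces=NS),
            "employer": job.findtext("nj:offeredBy", default="", namespaces=NS),
            "city": job.findtext("nj:city", default="", namespaces=NS),
            "country": job.findtext("nj:country", default="", namespaces=NS),
            "expires": item.findtext("nj:expiresOn", default="", namespaces=NS),
        }


# Example query: jobs in a given city, the sort of filter a
# geographically-oriented tool might start from.
sheffield_jobs = [job for job in jobs(FEED) if job["city"] == "Sheffield"]
```

A reader that knows nothing about the nj elements still sees a valid
RSS 1.0 item with its title; only tools that care about jobs need to look
further.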

So already we're talking about syndicating online representations of 
articles, digital rights, page numbers, authors, cancer, jobs, cities, 
locations and expiry dates for job applications. All of those things 
are (a) beyond the immediate and future scope of the AtomPub group and (b)
described in different levels of detail by different parties for
different purposes. Eg. jobs are based in places; they have associated
skills, which might be picked out in cross-domain subject schemes or in 
domain specific details; places have lat/long info, which can be
modelled in painful detail or very crudely. XML description here is a
task without end, because we're trying to create a marketplace where it
is possible for increasingly rich descriptions to be mixed together 
in as sane a fashion as possible.

I'd be interested to take this
http://www.nature.com/naturejobs/jobs/biologicalsciences.rdf feed as a
test for AtomPub's extensibility goals. Would we hope to do a
cleaner/better job than RSS 1.0 here? Perhaps in the future,
scientifically minded job hunters will be able to go do a search on jobs 
in their particular speciality and see the results using
geographically-oriented tools. I'd love to see Atom become a transport
for all this...

cheers,

Dan