
Re: Extensibility in Syndication formats



* Dare Obasanjo <kpako@xxxxxxxxx> [2004-08-18 01:24-0700]
 
> >  - (fairly) predictable XML notation; RSS 1.0 defined a profile of
> >    the RDF/XML syntax, so that namespace-extended feeds all shared a
> >    basic structure. (rather than allowing all RDF's syntactic
> >    variations).
> 
> How is this practically useful? The fact that an
> extension will show up as single elements with simple
> content or as elements with attributes and complex
> contents doesn't bring my application any closer to
> understanding them when encountered in the wild. 

We're not talking AI here. Just data mixing and a decentralised 
division of labour.

RDF's contribution is really a strategy for decentralising vocabulary
design. When someone designs an RDF vocabulary, the things they name 
and describe in their namespaces are classes (categories) and
properties (relationships etc.). When someone designs an XML vocabulary,
the things they name and describe in their namespace are XML elements and 
attributes. In the RDF case, the vocabulary is described in terms of the
world; you say things like "'wrote' is a relationship between an 'Agent'
and a 'Work'", or "'JPEGImage' is a sub-class of 'Image'"; in the XML
case, you talk more explicitly about markup patterns, rather than about
the things that markup tells you about the world. So we end up with 
committees and groups inventing XML markup, and their primary means 
of expression is the ability to say, basically, which XML elements can 
live inside which other XML elements, in what order, and which attributes
they are allowed to be decorated with. And there are half a dozen schema 
languages these guys can use to do that job in slightly different ways.
All RDF namespaces, by contrast, are based on the RDF Schema approach
(perhaps using the OWL extensions).
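A rough sketch of what "described in terms of the world" buys you, in
Python. Plain tuples stand in for RDF triples, the "ex:" names are made
up for illustration, and the two helper functions are hypothetical, not
any real library's API. Only a small slice of RDF Schema (domain, range,
subClassOf) is imitated:

```python
# Plain tuples stand in for RDF triples: (subject, property, object).
# The "ex:" names are hypothetical, invented for this sketch.
SCHEMA = {
    ("ex:wrote", "rdfs:domain", "ex:Agent"),
    ("ex:wrote", "rdfs:range", "ex:Work"),
    ("ex:JPEGImage", "rdfs:subClassOf", "ex:Image"),
}


def superclasses(cls, schema):
    """Return cls plus every class reachable via rdfs:subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def infer_types(data, schema):
    """Derive (thing, class) pairs from domain, range and subclass statements."""
    types = set()
    for s, p, o in data:
        if p == "rdf:type":
            # A stated type also implies all of its superclasses.
            for cls in superclasses(o, schema):
                types.add((s, cls))
        for ps, pp, po in schema:
            if ps == p and pp == "rdfs:domain":
                types.add((s, po))  # the subject belongs to the domain class
            if ps == p and pp == "rdfs:range":
                types.add((o, po))  # the object belongs to the range class
    return types


DATA = {
    ("ex:dan", "ex:wrote", "ex:thisMessage"),
    ("ex:photo1", "rdf:type", "ex:JPEGImage"),
}
INFERRED = infer_types(DATA, SCHEMA)
```

Nothing in the schema says which tag may nest inside which; it only says
what kinds of thing the names are about.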

My problem with the "let them eat namespaces" view is that it ignores
the social mechanics that get us this mixed-namespace markup in the
first place. I'm perfectly willing to believe that one could write 
XQuery or XSLT or DOM+.js to consume mixed markup, *assuming* that some 
collection of parties had figured out conventions for mixing their
namespaces together. But that takes more time, effort and money to 
achieve at a fine grained level if you opt out of the RDF infrastructure. 
Non-RDF XML namespaces can sit alongside each other in an XML tree, or live
one inside the other, but generally we've seen precious little by way of 
freely mixable XML namespaces. By contrast, *all* RDF vocabularies can
be deployed in a mixed way, out of the box, because RDF indirects
through a common data model and XML encoding which imposes some common
conventions across otherwise independent vocabularies. 
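To make "mixed out of the box" a little more concrete, here is a minimal
Python sketch. It assumes triples are modelled as plain tuples; the
"jobs:" and "geo:" property names and the example.org URIs are invented
for the example, and merge() and describe() are hypothetical helpers
rather than a real toolkit's API:

```python
# Two feeds written by parties who never coordinated. All URIs and
# property names here are hypothetical.
JOBS_FEED = {
    ("http://example.org/item/1", "jobs:title", "Metadata librarian"),
    ("http://example.org/item/1", "jobs:location",
     "http://example.org/place/bristol"),
}
GEO_FEED = {
    ("http://example.org/place/bristol", "geo:lat", "51.45"),
    ("http://example.org/place/bristol", "geo:long", "-2.59"),
}


def merge(*graphs):
    """Merging graphs is set union; no nesting rules need negotiating."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(node, graph):
    """Collect every property of a node, whichever vocabulary supplied it.

    For brevity this keeps one value per property.
    """
    return {p: o for s, p, o in graph if s == node}


MERGED = merge(JOBS_FEED, GEO_FEED)
JOB = describe("http://example.org/item/1", MERGED)
# Follow the link from the job to the place, crossing vocabularies.
PLACE = describe(JOB["jobs:location"], MERGED)
```

The jobs people and the geo people never had to agree on whose elements
go inside whose; the shared data model does that part.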

The idea that different parties "simply create their own XML namespaces"
is problematic because the world doesn't come organised into nicely 
parceled, crisply discrete problem spaces, each with their own
MyProblemSpaceML markup notation. Things are horribly jumbled up 
(which is why AI failed to deliver, imho).

So you think you're working on a "digital images" markup language 
for photos, and you find you're spending half your time thinking 
about geographic markup, or representing the content of the picture, 
or fending off motion-picture people who claim your problem space 
is subsumed by theirs. You think you're working on geographic 
markup, places and coordinates and maps,
and find yourself drawn into modelling the things that are on the map.
You think you're creating bibliographic metadata but find it to be 
intimately tangled up with rights metadata, with educational level
classification metadata (which btw crops up again if you're doing jobs,
CVs, and personal profile work (and which varies wildly between
countries)). Everything is jumbled up with everything else, and so we
need some conventions for people to get out there and do their bit
without waiting for everyone else to finish the other bits of the
puzzle. 

Should the folk doing Job advert markup have a meeting with the people
doing geo markup or postal addresses, to decide whose tags can go in
whose, and update their schemas accordingly? What about CVs? Photos?
Bibliographies, educational metadata, rights, and so on and so on? I got on
board the RDF train after being burnt out from attending so-called
metadata initiative meetings (primarily biblio, imaging, education,
search) where people were merrily creating tagsets whose scope and
features overlapped and who badly needed a bit more architecture for
fine grained mixing, so they could concentrate better on their area of
expertise, and leave the detail of other areas to be fleshed out by 
folk with complementary expertise. Without having to sit around a table
with them arguing about XML tag nesting structures.

This is not, and shouldn't be mistaken for, the old AI dream of 
machine intelligence. It's simply a wish to have to fly around to fewer 
standards coordination meetings. And to do that, we need some high level
things that all XML namespaces have in common. Whether tag order is
significant, for example (in RDF, it almost always isn't). Whether
there are negation-as-failure closed-world assumptions (in RDF, we avoid
this), a convention for knowing whether an element stands for a category
of thing, or a kind of relationship between things, etc etc. 
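Two of those conventions can be sketched in a few lines of Python. This
is only an illustration under simplifying assumptions: statements are
plain tuples, the "ex:" names are made up, and same_graph() and
is_stated() are hypothetical helpers:

```python
# The same statements serialised in two different orders.
FEED_A = [
    ("ex:item1", "ex:title", "Train times"),
    ("ex:item1", "ex:creator", "ex:dan"),
]
FEED_B = list(reversed(FEED_A))


def same_graph(a, b):
    """Statement order carries no meaning, so compare as sets."""
    return set(a) == set(b)


def is_stated(graph, subject, prop, value):
    """Open-world lookup: True if stated, otherwise None for 'unknown'.

    A closed-world reader would return False here; this sketch does not
    treat a missing statement as a denial.
    """
    return True if (subject, prop, value) in set(graph) else None


ORDER_IRRELEVANT = same_graph(FEED_A, FEED_B)
KNOWN = is_stated(FEED_A, "ex:item1", "ex:creator", "ex:dan")
UNKNOWN = is_stated(FEED_A, "ex:item1", "ex:creator", "ex:bob")
```

Because a missing statement means "not said here" rather than "false",
another party can add it later without contradicting anyone.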

There are several options. We can hope that people somehow create XML
namespaces that play well together, in the absence of such conventions.
This hasn't happened yet, but there's always hope. Or we can try to 
invent some such conventions within the AtomPub WG (add
11-24 months to the schedule; plus same again 2 years later to fix
mistakes), or we can back off from the 'Atom everywhere' rhetoric and 
decide that Atom's really about interop in the blogging world, and 
that full on data syndication is a v2.0 problem, and that v1.0 targets
bloggers.

When feeds are carrying rich namespace-extended descriptions of the things 
the feeds describe (jobs, journals, products, holidays, pornography, people, 
cities, mail messages, CVS servers, journal articles, paper-published 
books, MP3s, Ogg streams, playlists, concert listings, weather reports, security
alerts, blog comments, answerphone messages, bank transactions, network
outages, TV schedules, dentist appointments, football results, product
recalls, train times, blind dates, press releases and -yesyes- blog posts, 
...) *then* we'll have 'Atom everywhere'. But unless we're going to slip 
back 5 years in terms of expressivity, Atom needs a way to allow all
these kinds of thing to be described using whatever externally-managed
namespaces make sense in the marketplace.

For externally managed namespaces to make sense when deployed together, 
they need to be designed with that in mind. Which brings us back to 
the frameworks on offer to folk creating those namespaces. The examples
I gave above are a mix. There's some blogging and information-resource use
cases in there (though digital library stuff quickly shades off into
complexity). There's a lot of things focussed around people, around
places, and in particular around events. Hardly surprising; syndication
is event-centric. Not blog posting events, but events in the world that 
are associated in various (potentially nameable) ways with the information 
items we syndicate in XML. The AtomPub WG isn't (AFAIK) in the 
business of providing exhaustive descriptions of events, of places, 
of people. It is, perhaps, in the business of providing a syndication 
framework where richer descriptions of these things (and more) can be 
mixed together. This doesn't mean that all Atom code needs to understand 
them, or will magically become intelligent and able to act upon new markup. 
Just that there could usefully be a few common patterns for mixed-namespace 
markup which allow the ***huge*** task of describing all this stuff to 
be divided up amongst parties who may never meet or even be working on their
namespaces at the same time. (I like the OpenGALEN slogan here; "making the 
impossible very difficult" ;)

So we should always be thinking about how to divide up the work, ie.
what can we say to people who want to contribute a better way to
describe jobs, drawing upon existing work re skills description, topics,
location? What to say to people who want to syndicate photo metadata,
drawing upon lat/long markup, 'who is in this photo' markup, common
nouns, EXIF fields, and so on? Do we encourage them to have anything in
common with each other's efforts, or just say "use XML+namespaces, go
invent some named elements and attributes and tell us what markup
patterns you consider valid"?

If they go the vanilla XML+namespaces route, they get
to pick some named XML elements and attributes, and say some stuff about
which element combination patterns are allowed, and how they can be
decorated with XML attributes. If they go the RDF route, they don't get
asked which elements theirs can go inside, and which can go inside
there, **because it isn't up to them**. RDF quite explicitly withholds that
ability from the creators of a namespace, so that we don't force people
to anticipate all future uses of their creation, or get into rigid and
fragile versioning coalitions with owners of related tagsets. (and no, RDF doesn't
solve the namespace versioning problem, but it makes it approachable, or
at least merely very hard).
 
The RDF approach isn't trying to make data universally understandable in
any fancy AI sense, just universally mixable. If I want to aggregate jobs
data, I need to know a bit about Jobs-related namespaces. If I want a 
really smart Jobs aggregator, I'll go and investigate namespaces that
relate to places, to events/time, to skill and topic description, and to
geography. And I'd be well-advised to create some nice tools that
actually use that data, and go evangelise those extensions to parties
who'll create enough feeds to get some adoption. No magic, just a bit of
structure around a lot of hard work.
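As a hedged illustration of that "no magic" point, the Python sketch
below shows an aggregator that acts only on the vocabularies it has
bothered to learn and carries the rest along untouched. The prefixes,
the feed data and the partition() helper are all hypothetical; a real
aggregator would match on full namespace URIs rather than short
prefixes:

```python
# Vocabularies this aggregator has been taught about (hypothetical).
UNDERSTOOD = ("jobs:",)

FEED = [
    ("ex:ad1", "jobs:title", "Systems librarian"),
    ("ex:ad1", "jobs:skill", "ex:cataloguing"),
    ("ex:ad1", "geo:lat", "51.45"),
    ("ex:ad1", "rights:licence", "ex:someLicence"),
]


def partition(triples, understood):
    """Split statements into those we act on and those we just carry."""
    acted_on, carried = [], []
    for triple in triples:
        prop = triple[1]
        # str.startswith accepts a tuple of candidate prefixes.
        if prop.startswith(understood):
            acted_on.append(triple)
        else:
            carried.append(triple)
    return acted_on, carried


BASIC, EXTRA = partition(FEED, UNDERSTOOD)
# A smarter aggregator opts in to more vocabularies; the feed is unchanged.
SMART, REST = partition(FEED, UNDERSTOOD + ("geo:",))
```

Getting smarter is a matter of learning another namespace, not of
asking feed publishers to restructure their markup.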

> I see this in RSS 2.0 as well. 

RSS 2.0 says "how these namespaces get designed and how they play
together isn't our problem". Which brings us back to the scoping
question. If AtomPub's deliverable is really focussed around weblogging, 
then maybe it's OK to say "we don't know yet; maybe in Version 2". But
if Atom is to be marketed as the backbone for Web-based data
syndication, establishing a framework that'll serve us for decades to
come, then the "not our problem" approach to mixed-namespace design
simply doesn't cut it.

cheers,

Dan