Re: Extensibility in Syndication formats

--- Dan Brickley <danbri@xxxxxx> wrote:
> > How is this practically useful? The fact that an
> > extension will show up as single elements with
> > simple content or as elements with attributes and
> > complex contents doesn't bring my application any
> > closer to understanding them when encountered in
> > the wild.
> We're not talking AI here. Just data mixing and a
> decentralised 
> division of labour.

Before continuing, I'd like to note that, with the
possible exception of NewsMonster, none of the dozens
of RSS aggregators I have seen in action treats
extensibility in RSS 1.0 any differently from
extensibility in RSS 2.0. It's all just XML and
namespaces to them. The other aggregator authors on
the list can pipe up and disagree with me if I am
mistaken.
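
To make that concrete, here is a rough sketch (not taken from
any particular aggregator) of what namespace-only handling
looks like in Python. The two sample feeds and the helper name
`creators` are made up for illustration; the namespace URIs are
the published Dublin Core, RSS 1.0 and RDF ones.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical RSS 2.0 feed carrying a Dublin Core extension element.
RSS20 = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example</title>
    <item>
      <title>Post one</title>
      <dc:creator>Alice</dc:creator>
    </item>
  </channel>
</rss>"""

# Hypothetical RSS 1.0 (RDF/XML) feed carrying the same extension.
RSS10 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/1">
    <title>Post one</title>
    <dc:creator>Alice</dc:creator>
  </item>
</rdf:RDF>"""


def creators(feed_xml):
    """Return the text of every dc:creator element, wherever it sits."""
    root = ET.fromstring(feed_xml)
    # ElementTree addresses namespaced elements as "{namespace-uri}localname".
    return [el.text for el in root.iter("{%s}creator" % DC)]


print(creators(RSS20))
print(creators(RSS10))
```

The same lookup runs unchanged over both feeds; nothing in it
knows or cares that the second one is RDF.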

Thus the claimed superiority of RDF-based
extensibility in RSS 1.0 over XML + namespaces as used
in RSS 2.0 has simply never been shown in the wild,
apart from the proof-of-concept stuff that the usual
RDF syndication suspects like Danny Ayers trot out
whenever this discussion occurs.

> RDF's contribution is really a strategy for
> decentralising vocabulary
> design. When someone designs an RDF vocabulary, the
> things they name 
> and describe in their namespaces are classes
> (categories) and
> properties (relationships etc.). When someone
> designs an XML vocabulary,
> the things they name and describe in their namespace
> are XML elements and 
> attributes. In the RDF case, the vocabulary is
> described in terms of the
> world; you say things like "'wrote' is a
> relationship between an 'Agent'
> and a 'Work'", or "'JPEGImage' is a sub-class of
> 'Image'"; in the XML
> case, you talk more explicitly about markup
> patterns, rather than about
> the things that markup tells you about the world. So
> we end up with 
> committees and groups inventing XML markup, and
> their primary means 
> of expression is the ability to say, basically,
> which XML elements can 
> live inside which other XML elements, in what order,
> and which attributes
> they are allowed to be decorated with. And there are
> half a dozen schema 
> languages these guys can use to do that job in
> slightly different ways.
> All RDF namespaces, by contrast, are based on the
> RDF Schema approach
> (perhaps using the OWL extensions).

So far I haven't seen anything that indicates a
better extensibility model when using RDF versus XML +
namespaces. Granted, an XML language is more likely to
contain constructs that don't map to real-world
concepts and are just there for markup purposes, but I
see the same in RDF/XML as well, with all the
sequences, lists and bags, which don't even occur in
RSS 2.0 anyway.

In fact one could argue that there is more of a 1:1
mapping between syndication constructs and markup in
the typical RSS 2.0 document than in an RSS 1.0
document describing the same data.

> My problem with the "let them eat namespaces" view
> is that it ignores
> the social mechanics that gets us this
> mixed-namespace markup in the
> first place. I'm perfectly willing to believe that
> one could write 
> XQuery or XSLT or DOM+.js to consume mixed markup,
> *assuming* that some 
> collection of parties had figured out conventions
> for mixing their
> namespaces together. But that takes more time,
> effort and money to 
> achieve at a fine grained level if you opt-out of
> the RDF infrastructure. 
> Non-XML namespaces can sit alongside each other in
> an XML tree, or live
> one inside the other, but generally we've seen
> precious little by way of 
> freely mixable XML namespaces. By contrast, *all*
> RDF vocabularies can
> be deployed in a mixed way, out of the box, because
> RDF indirects
> through a common data model and XML encoding which
> imposes some common
> conventions across otherwise independent
> vocabularies. 

So far all you've done is assert that RDF-based
extensibility is better without showing why in
concrete terms. I don't see why you claim it is so
hard to reuse an element from one schema in another;
it happens all the time in the world of RSS. Heck,
just a few days ago on this list we were discussing
using the dcterms:* elements in Atom without resorting
to using RDF-based extensibility, just XML and
namespaces.

> The idea that different parties "simply create their
> own XML namespaces"
> is problematic because the world doesn't come
> organised into nicely 
> parceled, crisply discrete problem spaces, each with
> their own
> MyProblemSpaceML markup notation. Things are
> horribly jumbled up 
> (which is why AI failed to deliver, imho).
> So you think you're working on a "digital images"
> markup language 
> for photos, and you find you're spending half your
> time thinking 
> about geographic markup, or representing the content
> of the picture, 
> or fending off motion picture people who claim your
> problem space 
> is subsumed by theirs. You think you're working on
> geographic 
> markup, places and coordinates and maps,
> and find yourself drawn into modelling the things
> that are on the map.
> You think you're creating bibliographic metadata but
> find it to be 
> intimately tangled up with rights metadata, with
> educational level
> classification metadata (which btw crops up again if
> you're doing jobs,
> CVs, and personal profile work (and which varies
> wildly between
> countries)). Everything is jumbled up with
> everything else, and so we
> need some conventions for people to get out there
> and do their bit
> without waiting for everyone else to finish the
> other bits of the
> puzzle. 
> Should the folk doing Job advert markup have a
> meeting with the people
> doing geo markup or postal addresses, to decide
> whose tags can go in
> whose, and update their schemas accordingly? What
> about CVs? Photos?
> Bibliographies, educational metadata, rights, and so
> and so on? I got on
> board the RDF train after being burnt out from
> attending so-called
> metadata initiative meetings (primarily biblio,
> imaging, education,
> search) where people were merrily creating tagsets
> whose scope and
> features overlapped and who badly needed a bit more
> architecture for
> fine grained mixing, so they could concentrate
> better on their area of
> expertise, and leave the detail of other areas to be
> fleshed out by 
> folk with complementary expertise. Without having to
> sit around a table
> with them arguing about XML tag nesting structures.
> This is not, and shouldn't be mistaken for, the old
> AI dream of 
> machine intelligence. It's simply a wish to have to
> fly around to fewer 
> standards coordination meetings. And to do that, we
> need some high level
> things that all XML namespaces have in common.
> Whether tag order is
> significant, for example (in RDF, it almost always
> isn't). Whether
> there's negation-as-failure closed world assumptions
> (in RDF, we avoid
> this), a convention for knowing whether an element
> stands for a category
> of thing, or a kind of relationship between things,
> etc etc. 
> There are several options. We can hope that people
> somehow create XML
> namespaces that play well together, in the absence
> of such conventions.
> This hasn't happened yet, but there's always hope.
> Or we can try to 
> invent some such conventions within the AtomPub WG
> (add
> 11-24 months to the schedule; plus same again 2
> years later to fix
> mistakes), or we can back off from the 'Atom
> everywhere' rhetoric and 
> decide that Atom's really about interop in the
> blogging world, and 
> that full on data syndication is a v2.0 problem, and
> that v1.0 targets
> bloggers.
> When feeds are carrying rich namespace-extended
> descriptions of the things 
> the feeds describe (jobs, journals, products,
> holidays, pornography, people, 
> cities, mail messages, CVS servers, journal
> articles, paper-published 
> books, MP3s, Ogg streams, playlists, concert
> listings, weather reports, security
> alerts, blog comments, answerphone messages, bank
> transactions, network
> outages, TV schedules, dentist appointments,
> football results, product
> recalls, train times, blind dates, press releases
> and -yesyes- blog posts, 
> ... *then* we'll have 'atom everywhere'. But unless
> we're going to slip 
> back 5 years in terms of expressivity, Atom needs a
> way to allow all
> these kinds of thing to be described using whatever
> externally-managed
> namespaces make sense in the marketplace.

RSS 2.0 does all these and more today. It seems you
are either ignorant of what's happening in the
syndication space today or willfully ignoring the
current state of the syndication marketplace. 

> For externally managed namespaces to make sense when
> deployed together, 
> they need to be designed with that in mind. Which
> brings us back to 
> the frameworks on offer to folk creating those
> namespaces. The examples
> I gave above are a mix. There's some blogging and
> information-resource use
> cases in there (though digital library stuff quickly
> shades off into
> complexity). There's a lot of things focussed around
> people, around
> places, and in particular around events. Hardly
> surprising; syndication
> is event-centric. Not blog posting events, but
> events in the world that 
> are associated in various (potentially nameable)
> ways with the information 
> items we syndicate in XML. The AtomPub WG isn't
> (AFAIK) in the 
> business of providing exhaustive descriptions of
> events, of places, 
> of people. It is, perhaps, in the business of
> providing a syndication 
> framework where richer descriptions of these things
> (and more) can be 
> mixed together. This doesn't mean that all Atom code
> needs to understand 
> them, or will magically become intelligent and able
> to act upon new markup. 
> Just that there could usefully be a few common
> patterns for mixed-namespace 
> markup which allow the ***huge*** task of describing
> all this stuff to 
> be divided up amongst parties who may never meet or
> even be working on their
> namespaces at the same time. (I like the OpenGALEN
> slogan here; "making the 
> impossible very difficult" ;)
> So we should always be thinking about how to divide
> up the work, ie.
> what can we say to people who want to contribute a
> better way to
> describe jobs, drawing upon existing work re skills
> description, topics,
> location? What to say to people who want to
> syndicate photo metadata,
> drawing upon lat/long markup, 'who is in this photo'
> markup, common
> nouns, EXIF fields, and so on. Do we encourage them
> to have anything in
> common with each other's efforts, or just say "use
> XML+namespaces, go
> invent some named elements and attributes and tell
> us what markup
> patterns you consider valid".

Reinventing the wheel is not necessarily a part of the
XML+namespaces approach. Weren't we all just
discussing Atom reusing the dcterms:* namespace
elements instead of reinventing similar concepts in
the context of Atom? Similarly, couldn't I just as
easily come up with my own RDF-based syndication
format that doesn't use Dublin Core, thus negating all
the benefits you claim one gets for free using RDF?
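
As a sketch of that kind of reuse, the Python below builds an
Atom-style entry that borrows dcterms:modified rather than
inventing its own date element. The Atom namespace URI is a
placeholder I made up for the example, and `build_entry` is a
hypothetical helper; only the dcterms URI is the real one.

```python
import xml.etree.ElementTree as ET

ATOM = "http://example.org/atom-draft"  # placeholder URI, an assumption
DCTERMS = "http://purl.org/dc/terms/"   # DCMI Metadata Terms namespace

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("atom", ATOM)
ET.register_namespace("dcterms", DCTERMS)


def build_entry(title, modified):
    """Build an entry whose modification date is a reused dcterms element."""
    entry = ET.Element("{%s}entry" % ATOM)
    ET.SubElement(entry, "{%s}title" % ATOM).text = title
    # No RDF machinery involved: just an element from another namespace.
    ET.SubElement(entry, "{%s}modified" % DCTERMS).text = modified
    return entry


entry = build_entry("Hello", "2004-01-01T00:00:00Z")
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The reuse here is purely a vocabulary decision; the host format
only has to permit elements from foreign namespaces.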

It just seems you are advocating that vocabulary
designers not reinvent the wheel and instead reuse
other markup vocabularies where possible. Makes sense
to me.

> If they go the vanilla XML+namespaces route, they
> get
> to pick some named XML elements and attributes, and
> say some stuff about
> which element combination patterns are allowed, and
> how they can be
> decorated with XML attributes. If they go the RDF
> route, they don't get
> asked which elements theirs can go inside, and which
> can go inside
> there, **because it isn't up to them**. RDF quite
> explicitly withholds that
> ability from the creators of a namespace, so that we
> don't force people
> to anticipate all future uses of their creation, or
> get into rigid and
> fragile versioning coalitions with owners of related
> tagsets. (and no, RDF doesn't
> solve the namespace versioning problem, but it makes
> it approachable, or
> at least merely very hard).

You've placed restrictions on the XML+namespaces route
that don't actually exist in practice, as witnessed by
the number of RSS extensions and their support in
numerous clients.
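
For illustration, this is roughly how a client can support
extensions it knows about and skip the rest. It is a minimal
sketch, not any real aggregator's code: the handler names, the
`HANDLERS` table and the sample item are all hypothetical. The
Dublin Core and slash module namespace URIs are the published
ones; the `x` namespace is invented to stand in for an
extension the client has never heard of.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
SLASH = "http://purl.org/rss/1.0/modules/slash/"

ITEM = """<item xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
      xmlns:x="http://example.org/unknown">
  <title>Post one</title>
  <dc:creator>Alice</dc:creator>
  <slash:comments>3</slash:comments>
  <x:mood>cheerful</x:mood>
</item>"""


def handle_creator(item, el):
    item["author"] = el.text


def handle_comments(item, el):
    item["comment_count"] = int(el.text)


# Extension elements this client understands, keyed by qualified name.
HANDLERS = {
    "{%s}creator" % DC: handle_creator,
    "{%s}comments" % SLASH: handle_comments,
}


def read_item(item_el):
    """Read core fields, apply known extensions, ignore unknown ones."""
    item = {"title": item_el.findtext("title")}
    for child in item_el:
        handler = HANDLERS.get(child.tag)
        if handler is not None:
            handler(item, child)
    return item


result = read_item(ET.fromstring(ITEM))
print(result)
```

Supporting a new extension means adding one entry to the
table; elements from unknown namespaces fall through untouched.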

> The RDF approach isn't trying to make data
> universally understandable in
> any fancy AI sense, just universally mixable. If I
> want to aggregate jobs
> data, I need to know a bit about Jobs-related
> namespaces. If I want a 
> really smart Jobs aggregator, I'll go and
> investigate namespaces that
> relate to places, to events/time, to skill and topic
> description, and to
> geography. And I'd be well-advised to create some
> nice tools that
> actually use that data, and go evangelise those
> extensions to parties
> who'll create enough feeds to get some adoption. No
> magic, just a bit of
> structure around a lot of hard work.

Again, possible with RSS 2.0 using XML+namespaces
based extensibility. It happens all the time today as
it is.

> > I see this in RSS 2.0 as well.
> RSS 2.0 says "how these namespaces get designed and
> how they play
> together isn't our problem". Which brings us back to
> the scoping
> question. If AtomPub's deliverable is really
> focussed around weblogging, 
> then maybe it's OK to say "we don't know yet; maybe
> in Version 2". But
> if Atom is to be marketed as the backbone for
> Web-based data
> syndication, establishing a framework that'll serve
> us for decades to
> come, then the "not our problem" approach to
> mixed-namespace design
> simply doesn't cut it.

Sure. I still await concrete, technical reasons,
preferably with scenario-driven use cases or examples,
that show how using RDF-based extensibility buys one
significantly more than using XML-based extensibility
on a global, decentralized network like the World Wide
Web.
