
Re: Blogging needs a whole new BAG...




Bob Wyman wrote:
Just as "the web" has a TAG (Technical Architecture Group) I think Blogging needs a BAG (Blogging Architecture Group.) I believe that the charter for either the W3C or IETF Atom group should include a task to form a BAG and produce a Blogging Architecture.

Too much to take on; it'll never get done. It'll be enough work to get Atom nailed within scope for year's end 2005.



Some of the issues that need to be dealt with are: (this is a very incomplete list...)
* Push vs. Pull: Currently, blogging is all about pull and polling. However, it is clear to many that the current approach simply won't scale.

I agree with Tim Bray that engineers can keep things in check with a network of CDNs, better-behaved clients and the like. But who pays for that? What's the incentive? This ain't the nineties. XMPP, NNTP, SMTP, SMS and BitTorrent are all options if anyone wanted to deploy the client tools in anger. On the other hand, I'm not even sure feed traffic is a good fit for CDNs as deployed - perhaps Mark Nottingham has thoughts. But it does seem to me that punishing the popular becomes an even bigger problem with syndication.



We can't have millions of aggregators polling and pulling from millions of Atom files without introducing massive resource consumption issues.

True, but see above. It is doable, and the resources are available to be punished.
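
For what "better behaved clients" could mean in practice, here is a minimal sketch of a polite poller built on standard HTTP conditional GET. The function names are hypothetical, and the cached ETag and Last-Modified values are assumed to have been saved from the previous response; this is an illustration, not any particular aggregator's code.

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> Dict[str, str]:
    """Build request headers for a polite feed poll (hypothetical helper).

    etag and last_modified are the validators saved from the previous
    response for this feed, or None on the first fetch.
    """
    # Ask for a compressed body; the server is free to ignore this.
    headers = {"Accept-Encoding": "gzip"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def feed_changed(status: int) -> bool:
    """A 304 Not Modified reply means there is nothing new to download."""
    return status != 304
```

A client that sends these headers costs the publisher a small 304 response instead of the whole feed on every unchanged poll.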



Clearly, we need to consider providing real "publish/subscribe" support for blogging and thus for Atom data.

We only need to consider it. But I would think that, unless something funny is going to happen with linking/identity, Atom can float above the transportation issues for the most part. Why is Atom affected if I get it over SMS instead of HTTP?



We need to deal with the question of "Push" not "Pull." Unfortunately, in order to get around firewall and NAT issues, this probably implies that connection/session oriented protocols be used.

I really would wait this out for at least 12 months. Syndication is not the only technology with this issue. Who knows, maybe now the P2P folks have a real-world problem to solve.



* Trackback: The "trackback protocol" is popular, provides benefit, etc.; however, there are a number of major issues that should be addressed. These issues include the same authentication and signature issues that relate to pings. There are also important architectural issues that should be addressed in order to deal with the problem of trackback-spam. Is it possible/desirable to provide for "trackback aggregators" or trackback services on behalf of blogs? Is Trackback really just a special case of Pinging?

I don't see any difference. Really they're both subsumed by backlinking - that is, they're dynamically generated backlinks. The web has a lot of history around that subject, as you allude to and as Tim Bray can attest to directly.



* OPML: Many blogging applications rely on OPML files as a means of recording and exchanging lists of blogs, metadata about them, etc. There is not, however, a well-accepted definition of how one builds an OPML file (although Danny Ayers, among others, has worked on this issue), and OPML is not the subject of any serious standardization effort. If OPML use is to grow, we really must see some standardization here.

I would let OPML grow some more before nailing it down. Yes, it's an interop joke, but so are many other things.



* The role of proxies, synthetic feed generators, systems like PubSub.com, etc. is neither well understood nor defined; however, it is clear that there are today quite a number of these things and there will be more in the future. Effort should be put into at least clarifying the issues related to these intermediary processors and distributors of Atom files.

As you say, this stuff is not well defined, but a standards effort is premature, don't you think?



* Comment support is provided by many blogs and we're beginning to see issues related to comment spamming, etc. Is there something we can do from an architectural point of view to prevent comment spamming? (i.e. do we need something like an IRTF spamming group?) How do we handle remote commenting and what is the place/role of services like SixApart's "TypeKey" service? Should a "TypeKey" standard be developed? How does the W3C Annotea effort relate to commenting and/or blogs in general?

Many spam problems evaporate with usable crypto/pki/dsig. That most no-one signs anything but most everyone bitches about spam is not an Atom problem imvho.



* HTTP has limitations that are a true burden in blogging. For instance, there is no server-side support for identifying and retrieving "fragments" of a resource served by an HTTP server. Thus, you can't say things like "Give me only the entries that have been updated since time XXXX". Should HTTP be extended to address better the needs of Atom? Should RFC-3229 be extended to define an Atom-specific mechanism for retrieving Atom Fragments?

That would be - "we need a query language 'cos the XPath hacks don't cut it anymore". Again, Atom is not the only technology that has a querying issue - this, by the way, is one area where the semweb folks would have very useful contributions. They've been thinking about this stuff for years.
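
To make the "entries updated since time XXXX" query concrete, here is a rough sketch of the filtering that a server, or today a client after downloading the whole feed, would have to do. It assumes the element names and namespace of the later Atom 1.0 format (RFC 4287); the sample feed and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: Atom 1.0 namespace; earlier drafts used different names.
ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up sample feed with two entries.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>example</title>
  <entry><id>urn:example:1</id><updated>2004-05-01T10:00:00+00:00</updated></entry>
  <entry><id>urn:example:2</id><updated>2004-05-03T10:00:00+00:00</updated></entry>
</feed>"""


def entries_since(feed_xml, since):
    """Return the ids of entries whose updated timestamp is later than since."""
    root = ET.fromstring(feed_xml)
    ids = []
    for entry in root.findall(ATOM + "entry"):
        updated = datetime.fromisoformat(entry.findtext(ATOM + "updated"))
        if updated > since:
            ids.append(entry.findtext(ATOM + "id"))
    return ids


cutoff = datetime.fromisoformat("2004-05-02T00:00:00+00:00")
print(entries_since(FEED, cutoff))
```

The point of the quoted complaint is that plain HTTP gives the client no way to ask the server to run this filter for it, so the whole document travels even when one entry changed.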




* As Atom and blogging become more popular, there will be some sites that generate large quantities of Atom formatted data and others that will consume large quantities of it. This raises the issue of binary or compressed formats for Atom data.

Bob. tut-tut.



* Many commercial publishers have expressed an interest in syndication but also express great concern about the various issues related to IPR and usage rights. Is there anything we can do to provide a means to support digital rights management in Atom feeds or systems?

The basis for that would be to clearly specify c14n for Atom. That's step one. Once that's down other specs can follow. This is one area where sticking to raw XML is a good thing.
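
As a rough illustration of why c14n comes first: two serializations of the same entry differ byte for byte, so their digests (and any signature over them) differ too, until both are canonicalized. The sketch below uses the generic C14N 2.0 support in Python's standard library (3.8 or later), not an Atom-specific canonical form, which would still have to be specified. The sample entries and the digest helper are made up.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Same content, different attribute order, quoting, spacing and tag style.
A = ('<entry xmlns="http://www.w3.org/2005/Atom">'
     '<link rel="alternate" href="http://example.org/1"/></entry>')
B = ("<entry xmlns='http://www.w3.org/2005/Atom'>"
     "<link href='http://example.org/1'   rel='alternate'></link></entry>")


def digest(xml_text):
    """SHA-256 over the canonical form (hypothetical helper)."""
    canonical = canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The raw strings differ, but the canonical digests agree.
print(A == B, digest(A) == digest(B))
```

Anything layered on top, signatures included, needs both sides to agree on this step.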



* PICS [...] P3P

This strikes me as an area where client tools innovation is desperately needed. Allowing people to express their content filters is primarily a usability issue. Most people can't/won't use basic email filters, never mind this stuff. My point is that it doesn't matter what's specified when people are unable to use it. I would vote to let the Bayesians handle this.
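
For anyone who hasn't seen the Bayesian approach, here is a toy sketch of the idea: the tool learns from examples the user has marked, instead of asking the user to write filter rules. It is a bare-bones naive Bayes scorer with add-one smoothing and equal priors assumed; the training phrases and function names are invented, and a real filter would need far more data and care.

```python
import math
from collections import Counter


def train(docs):
    """Count word occurrences across a list of example texts."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def log_score(text, counts, vocab_size):
    """Sum of log word likelihoods with add-one (Laplace) smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in text.lower().split()
    )


def is_spam(text, spam_counts, ham_counts):
    """Compare the two class scores; equal priors are assumed."""
    vocab_size = len(set(spam_counts) | set(ham_counts))
    return (log_score(text, spam_counts, vocab_size)
            > log_score(text, ham_counts, vocab_size))


# Invented training examples standing in for items a user has marked.
SPAM = train(["buy cheap pills", "cheap pills now"])
HAM = train(["atom feed spec", "feed spec draft"])
```

The user's only job is to mark items as wanted or unwanted, which is the usability point being made above.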



* Various efforts have been made to support "categories" as meta-data in blogs and blog entries. Most have failed. It seems like these efforts should be formally linked to similar efforts such as the ISO Topic Map or the XTM standards. Ideally, any solution for blogs would also work well in non-blog environments. Do we need standardization of things like "subject indicators" or do the existing standards do the job?

A mapping into RDF is a charter option (rather than the RSS1.0 instaparse approach). We'll provide a mapping for Atom one way or another. Danny has been working with OWL lately - oh well, I've been too busy to contribute anything to that effort recently, so I guess that's where it's heading.


cheers
Bill