
Blogging needs a whole new BAG...



    Just as "the web" has a TAG (Technical Architecture Group), I think blogging needs a BAG (Blogging Architecture Group). I believe that the charter for either the W3C or IETF Atom group should include a task to form a BAG and produce a Blogging Architecture.
    I raised this issue at this morning's informal "W3C meets Atom" meeting in New York where Eric Miller presented his arguments for why W3C is interested in Atom. Although I fear that opening this issue risks derailing the current process, I think that we should carefully consider the entire architecture or ecosystem of blogging while making the decision of which standards group should host Atom standardization.
    The Atom format and API are really only the tip of the iceberg when it comes to considering the full "architecture" of the blogosphere. My personal feeling is that whichever standards group "gets" Atom should be willing and able not only to deal with the immediate issues of finalizing and formalizing the format and API, but also to deal effectively with the many other elements of the full blogging architecture. It would be very difficult, I think, to have part of the blogging problem worked on at W3C and other parts worked on at IETF without serious and effective coordination between the two organizations. If both organizations are to be involved, at least one should take on the responsibility of providing an overall vision and architecture for the general problem -- not just the specific issues of format and API.
 
    Some of the issues that need to be dealt with are: (this is a very incomplete list...)
 
    * Push vs. Pull: Currently, blogging is all about pull and polling. However, it is clear to many that the current approach simply won't scale. We can't have millions of aggregators polling and pulling from millions of Atom files without introducing massive resource consumption issues. Blogging works today in large part because it simply isn't a very popular or well-known system. Unfortunately, it appears that the resources (bandwidth, etc.) required to support blogging grow faster than linearly with the number of blogs and bloggers (i.e., every new blogger reads more than one other blog, and as more blogs come online people tend to increase the number of blogs they read).
    Clearly, we need to consider providing real "publish/subscribe" support for blogging and thus for Atom data. We need to deal with the question of "Push" not "Pull." Unfortunately, in order to get around firewall and NAT issues, this probably implies that connection/session oriented protocols be used. Thus, we should be looking at pushing Atom data over protocols currently used for instant messaging (Jabber/XMPP, SIMPLE, etc.), P2P, or epidemic protocols. But, the W3C currently only supports session-less protocols (yes, HTTP 1.1 is a little weird) that aren't really suitable for notification and push. Also, there appear to be quite a few "RESTafarians" in the W3C who might not be willing to deal with non-REST protocols. The IETF, on the other hand, has been the home of many session-oriented protocols and expertise more suitable for addressing the Push problem. Will W3C be willing to support push-based Atom?
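
    To make the scaling concern concrete, here is a rough back-of-the-envelope sketch. The subscription model in feeds_read (a base of 5 feeds plus 0.001 feeds per existing blog) and the hourly polling rate are purely hypothetical numbers chosen for illustration, not measurements.

```python
def daily_poll_requests(readers: int, feeds_per_reader: float, polls_per_day: int) -> float:
    """Total feed fetches per day when every reader polls every subscribed feed."""
    return readers * feeds_per_reader * polls_per_day


def feeds_read(blogs: int, base: float = 5.0, growth: float = 0.001) -> float:
    """Hypothetical model: people subscribe to more feeds as more blogs exist."""
    return base + growth * blogs


if __name__ == "__main__":
    for population in (10_000, 100_000, 1_000_000):
        # Assumption: one blog per blogger and hourly polling (24 polls per day).
        fetches = daily_poll_requests(population, feeds_read(population), 24)
        print(f"{population:>9} bloggers -> {fetches:,.0f} fetches/day")
```

    Under these assumptions, a tenfold increase in bloggers produces more than a tenfold increase in fetches, which is the greater-than-linear growth described above.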
 
    * Pinging: Blogging benefits from the various "pinging protocols" which currently depend on XML-RPC (a non-W3C protocol). These pinging protocols need to be extended to take into consideration the requirements of Atom (at the very least, we need to be able to distinguish between an Atom feed and an RSS feed), and we should consider whether the protocol should be made richer. For instance, given Atom's atom:id, it might make sense to allow the pinger to explicitly state which entries have been added or modified. We should also consider allowing a ping to pass the title, a summary, or other metadata relevant to the update that generated the ping. If "Push" is supported, we need to consider whether Pinging should remain a distinct function or is just a special case of "push".
    Currently, Pinging relies on the pinging system deciding who should be pinged. In an alternative world, someone interested in receiving pings would be able to "subscribe" to such pings. This would require a protocol/API. How would such a protocol relate to WS-Eventing, WS-Events, or WS-Notification?
    Pings are currently "anonymous" and it is quite possible to spoof pings. Perhaps we should consider providing mechanisms for authenticating pingers as well as mechanisms for "signing" pings to establish their origin.
    Also, pinging is currently used only in the context of blogging; however, it would make sense for "normal" web sites to be able to ping as well. Currently, we have many crawlers wasting resources as they scour the web looking for changed sites. We could drastically improve the speed with which people discover new web content while reducing some of the resource costs if normal web sites were to ping central ping aggregators to inform them of updates in much the same way that blogs do.
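
    As one possible illustration of a richer, verifiable ping, the sketch below serializes an XML-RPC call that names the feed format and lists the changed entry ids, then signs the body with an HMAC. The method name "atomPing.notify", the struct field names, and the use of a shared secret are all assumptions made for this example; none of them is part of any existing ping service, and a public-key signature would be an equally plausible choice.

```python
import hashlib
import hmac
import xmlrpc.client

# Hypothetical method name, invented for this sketch.
PING_METHOD = "atomPing.notify"


def build_ping(feed_url: str, changed_entry_ids: list) -> bytes:
    """Serialize a ping that lists the atom:id of each added or modified entry."""
    params = ({"feed": feed_url, "format": "atom", "changed": list(changed_entry_ids)},)
    return xmlrpc.client.dumps(params, methodname=PING_METHOD).encode("utf-8")


def sign_ping(body: bytes, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 over the ping body (shared secret is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_ping(body: bytes, signature: str, secret: bytes) -> bool:
    """Check a received signature in constant time."""
    return hmac.compare_digest(sign_ping(body, secret), signature)


if __name__ == "__main__":
    body = build_ping("http://example.org/atom.xml", ["tag:example.org,2004:entry-1"])
    signature = sign_ping(body, b"shared-secret")
    print(verify_ping(body, signature, b"shared-secret"))
```

    A receiver that shares the secret can reject spoofed or altered pings before acting on them; how such secrets would be distributed is left open here.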
 
    * Trackback: The "trackback protocol" is popular and provides real benefit; however, there are a number of major issues that should be addressed. These issues include the same authentication and signature issues that relate to pings. There are also important architectural issues that should be addressed in order to deal with the problem of trackback-spam. Is it possible/desirable to provide for "trackback aggregators" or trackback services on behalf of blogs? Is Trackback really just a special case of Pinging?
    Trackback is a generally useful function that should be supported outside the realm of blogging. A major "architectural" issue with the web has always been the fact that links are uni-directional. Explicit trackbacks are one way of providing the backlinks needed to construct bi-directional links. Presumably, bi-directional links would be generally useful. What, if any, modifications would need to be made to trackback in order to make the facility generally useful? Are there extensions to HTML that should be made in order to record "in-bound" links or to provide a means of pointing to where one would discover in-bound links?
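
    The idea of building bi-directional links out of explicit trackbacks can be sketched as a simple in-bound link index. The class and method names below are hypothetical, and the sketch deliberately ignores authentication and spam filtering, which the real problem would require.

```python
from collections import defaultdict


class BacklinkIndex:
    """Toy index that turns one-way trackback notices into queryable backlinks."""

    def __init__(self):
        self._inbound = defaultdict(set)

    def record_trackback(self, source_url: str, target_url: str) -> None:
        """Remember that source_url links to target_url."""
        self._inbound[target_url].add(source_url)

    def inbound_links(self, target_url: str) -> list:
        """Return the known pages that link to target_url, sorted for stable output."""
        return sorted(self._inbound.get(target_url, ()))


if __name__ == "__main__":
    index = BacklinkIndex()
    index.record_trackback("http://a.example/post", "http://b.example/article")
    index.record_trackback("http://c.example/note", "http://b.example/article")
    print(index.inbound_links("http://b.example/article"))
```

    Nothing here is specific to blogs, which is the point: any page could publish or consult such an index.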
 
    * OPML: Many blogging applications rely on OPML files as a means of recording and exchanging lists of blogs, metadata about them, etc. There is not, however, a well-accepted definition of how one builds an OPML file (although Danny Ayers, among others, has worked on this issue), and OPML is not the subject of any serious standardization effort. If OPML use is to grow, we really must see some standardization here. Also, we're seeing the growth of applications that rely on the sharing of OPML files. Thus, do we need to extend the concepts of pinging, trackback, push, etc. to include API and update notification support for OPML file maintenance?
 
    * The role of proxies, synthetic feed generators, systems like PubSub.com, etc. is neither well understood nor well defined; however, it is clear that there are quite a number of these things today and there will be more in the future. Effort should be put into at least clarifying the issues related to these intermediary processors and distributors of Atom files.
 
    * Comment support is provided by many blogs and we're beginning to see issues related to comment spamming, etc. Is there something we can do from an architectural point of view to prevent comment spamming? (For example, do we need something like an IRTF spamming group?) How do we handle remote commenting and what is the place/role of services like SixApart's "TypeKey" service? Should a "TypeKey" standard be developed? How does the W3C Annotea effort relate to commenting and/or blogs in general?
 
    * HTTP has limitations that are a true burden in blogging. For instance, there is no server-side support for identifying and retrieving "fragments" of a resource served by an HTTP server. Thus, you can't say things like "Give me only the entries that have been updated since time XXXX". Should HTTP be extended to better address the needs of Atom? Should RFC 3229 be extended to define an Atom-specific mechanism for retrieving Atom fragments?
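
    The "entries updated since time XXXX" request amounts to a server-side filter like the sketch below. The Entry type and function name are hypothetical, and the sketch says nothing about how the request would actually be expressed in HTTP; it only shows the selection a server would have to perform instead of returning the whole feed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entry:
    entry_id: str      # stands in for the entry's atom:id
    updated: datetime  # last modification time, timezone-aware


def entries_updated_since(entries, since: datetime) -> list:
    """Return only the entries modified strictly after `since`, in feed order."""
    return [entry for entry in entries if entry.updated > since]


if __name__ == "__main__":
    feed = [
        Entry("tag:example.org,2004:1", datetime(2004, 5, 1, tzinfo=timezone.utc)),
        Entry("tag:example.org,2004:2", datetime(2004, 5, 20, tzinfo=timezone.utc)),
    ]
    last_seen = datetime(2004, 5, 10, tzinfo=timezone.utc)
    for entry in entries_updated_since(feed, last_seen):
        print(entry.entry_id)
```

    With plain HTTP today, a client that has already seen the first entry still downloads both; a fragment mechanism would let the server send only the second.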
 
    * As Atom and blogging become more popular, there will be some sites that generate large quantities of Atom-formatted data and others that will consume large quantities of it. This raises the issue of binary or compressed formats for Atom data.
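
    As a rough illustration of why compression is worth discussing, the sketch below gzips a synthetic, Atom-like document and compares sizes. The element names and entry text are made up for the example, and a synthetic feed this repetitive will compress better than real content, so the ratio should not be read as a measurement.

```python
import gzip


def make_feed(entry_count: int) -> bytes:
    """Build a synthetic, Atom-like XML document (element names are illustrative)."""
    entries = "".join(
        f"<entry><id>tag:example.org,2004:{i}</id>"
        f"<title>Post number {i}</title>"
        f"<summary>Some text for post {i}.</summary></entry>"
        for i in range(entry_count)
    )
    return f"<feed><title>Example</title>{entries}</feed>".encode("utf-8")


def compress_feed(raw: bytes) -> bytes:
    """Compress the serialized feed with gzip from the standard library."""
    return gzip.compress(raw)


if __name__ == "__main__":
    raw = make_feed(200)
    packed = compress_feed(raw)
    print(f"raw: {len(raw)} bytes, gzip: {len(packed)} bytes")
```

    A general-purpose compressor like this needs no change to the format itself, whereas a dedicated binary encoding would.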
 
    * Many commercial publishers have expressed an interest in syndication but also express great concern about the various issues related to IPR and usage rights. Is there anything we can do to provide a means to support digital rights management in Atom feeds or systems?
 
    * PICS (Platform for Internet Content Selection) provides a means to tag content to facilitate content selection, rating services, filtering, signing and privacy. We need to consider how PICS relates to the world of blogging.
 
    * P3P (Platform for Privacy Preferences) was designed with web sites in mind. Does P3P work for the kind of systems that would be developed using Atom, the Atom API, and other blogging components? Does P3P need to be extended to address blogging related issues?
 
    * Various efforts have been made to support "categories" as meta-data in blogs and blog entries. Most have failed. It seems like these efforts should be formally linked to similar efforts such as the ISO Topic Map or the XTM standards. Ideally, any solution for blogs would also work well in non-blog environments. Do we need standardization of things like "subject indicators" or do the existing standards do the job?
 
    These are just some of the issues that arise from the general problem of blogging. Frankly, I think what is really going on here is that in the blogging world, we're experimenting with and defining much more than just blogs. What we're really doing is working out the details of a whole new way of interacting with the web. Now, as we are passing from the stage of initial experimentation to a stage of consolidation behind an initial set of formal standards, I think we need to expand our vision to include the full range of issues that have been raised. We also need to think about how the web-at-large can benefit from what we've learned in the laboratory of blogging.
    As said before, blogging is much more than just a file format and an API. Atom and its API should only be considered the first of many steps in defining how blogging, and this style of web use, are to be pursued in the future. Whatever forum we use to define the first steps on this road should be a forum which will be appropriate for resolution and discussion of the other issues as well.
    Is the W3C willing and able to deal with more than just the immediate issue of format and API? Is the W3C willing, able, and the best forum to deal with the overall problem of blogging?
 
        bob wyman