Blogging needs a whole new BAG...
Just as "the web" has a TAG (Technical Architecture Group), I think Blogging
needs a BAG (Blogging Architecture Group). I believe that the charter for either
the W3C or IETF Atom group should include a task to form a BAG and produce a
Blogging Architecture.
I raised this issue at this morning's informal "W3C meets Atom" meeting in New
York where Eric Miller presented his arguments for why W3C is interested in
Atom. Although I fear that opening this issue risks derailing the current
process, I think that we should carefully consider the entire architecture or
ecosystem of blogging while making the decision of which standards group should
host Atom standardization.
The Atom format and API are really only the tip of the iceberg when it
comes to considering the full "architecture" of the blogosphere. My personal
feeling is that whatever standards group "gets" Atom, that group should be
willing and able not only to deal with the immediate
issues of finalizing and formalizing the format and API, but also to deal
effectively with the many other elements of the full blogging architecture. It
would be very difficult, I think, to have part of the blogging problem being
worked on at W3C and other parts of blogging worked on at IETF without serious
and effective coordination between the two organizations. If both organizations
are to be involved, at least one should take on the responsibility of providing
an overall vision and architecture for the general problem -- not just the
specific issues of format and API.
Some of the issues that need to be dealt with are listed below (this is a
very incomplete list):
* Push vs. Pull: Currently, blogging is all about pull and polling. However, it is
clear to many that the current approach simply won't scale. We can't have
millions of aggregators polling and pulling from millions of Atom files without
introducing massive resource consumption issues. Blogging works today in large
part because it simply isn't a very popular or well-known system. Unfortunately,
it appears that the resources (bandwidth, etc.) required to support blogging
increase at a rate that is greater than linear in the number of blogs and
bloggers (i.e., every new blogger reads more than one other blog, and as more
blogs come online people tend to increase the number of blogs they read).
Clearly, we need to consider providing real "publish/subscribe" support
for blogging and thus for Atom data. We need to deal with the question of "Push"
not "Pull." Unfortunately, in order to get around firewall and NAT issues, this
probably implies that connection/session oriented protocols be used. Thus, we
should be looking at pushing Atom data over protocols currently used for instant
messaging (Jabber/XMPP, SIMPLE, etc.), P2P, or epidemic protocols. But, the
W3C currently only supports session-less protocols (yes, HTTP 1.1 is a little
weird) that aren't really suitable for notification and push. Also, there appear
to be quite a few "RESTafarians" in the W3C who might not be willing to deal
with non-REST protocols. The IETF, on the other hand, has been the home of many
session-oriented protocols and expertise more suitable for addressing the Push
problem. Will W3C be willing to support push-based Atom?
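
To make the scaling concern above concrete, here is a rough back-of-the-envelope sketch. All of the numbers (readers, feeds per reader, polling interval, update rate) are hypothetical and chosen only for illustration; the model also assumes that a push system sends exactly one delivery per update per subscriber.

```python
# Rough comparison of daily traffic under polling vs. push.
# Every figure below is an illustrative assumption, not a measurement.

def polling_requests_per_day(subscriptions, polls_per_day):
    # With polling, every subscription costs a request on every poll,
    # whether or not the feed has changed.
    return subscriptions * polls_per_day

def push_deliveries_per_day(subscriptions, updates_per_feed_per_day):
    # With push, a subscription only costs a delivery when the feed changes.
    return subscriptions * updates_per_feed_per_day

readers = 1_000_000          # hypothetical number of aggregator users
feeds_per_reader = 50        # hypothetical subscriptions per user
subscriptions = readers * feeds_per_reader

pull = polling_requests_per_day(subscriptions, polls_per_day=24)
push = push_deliveries_per_day(subscriptions, updates_per_feed_per_day=2)

print(f"polling: {pull:,} requests/day")
print(f"push:    {push:,} deliveries/day")
```

Under these assumptions polling generates twelve times the traffic of push, and the gap widens as aggregators poll more often or as feeds update less often.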
* Pinging: Blogging benefits from the various "pinging protocols" which currently
depend on XML-RPC (a non-W3C protocol). These pinging protocols need to be
extended to take into consideration the requirements of Atom (at least we need
to be able to distinguish between an Atom feed and an RSS feed) and we should
consider if the richness of the protocol should be expanded. For instance, given
Atom's atom:id, it might make sense to allow the pinger to explicitly state
which entries have been added or modified. We should also consider allowing a
ping to pass the title, a summary, or other metadata relevant to the update that
generated the ping. If "Push" is supported, we need to consider whether Pinging
should remain a distinct function or is just a special case of "push."
Currently, Pinging relies on the pinging system deciding who should be
pinged. In an alternative world, someone interested in receiving pings would be
able to "subscribe" to such pings. This would require a protocol/API. How would
such a protocol relate to WS-Eventing, WS-Events, or WS-Notification?
Pings are currently "anonymous" and it is quite possible to spoof pings.
Perhaps we should consider providing mechanisms for authenticating pingers as
well as mechanisms for "signing" pings to establish their origin.
Also, pinging is currently only used in the context of blogging; however,
it would make sense for "normal" web sites to be able to ping as well.
Currently, we have many crawlers wasting resources as they scour the web looking
for changed sites. We could drastically improve the speed with which people
discover new web content while reducing some of the resource costs if normal web
sites were to ping central ping aggregators to inform them of updates
in much the same way that blogs do.
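
Two of the ideas above, naming the changed entries in a ping and "signing" the ping, can be sketched together. This is only an illustration under stated assumptions: the payload fields (`feed`, `format`, `changed`) are hypothetical, JSON stands in for the XML-RPC encoding that existing ping services use, and an HMAC over a shared secret is used in place of a real public-key signature scheme.

```python
import hashlib
import hmac
import json

def build_ping(feed_url, feed_format, changed_entry_ids):
    # Hypothetical payload: says which feed changed, whether it is Atom or RSS,
    # and which entries (by atom:id) were added or modified.
    return {"feed": feed_url, "format": feed_format,
            "changed": sorted(changed_entry_ids)}

def _canonical_bytes(ping):
    # Serialize with sorted keys so sender and receiver hash identical bytes.
    return json.dumps(ping, sort_keys=True).encode("utf-8")

def sign_ping(ping, secret):
    # Attach an HMAC-SHA256 tag computed over the canonical payload.
    tag = hmac.new(secret, _canonical_bytes(ping), hashlib.sha256).hexdigest()
    return {"ping": ping, "signature": tag}

def verify_ping(signed, secret):
    # Recompute the tag and compare in constant time.
    expected = hmac.new(secret, _canonical_bytes(signed["ping"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])

secret = b"shared-secret-between-blog-and-ping-server"  # assumption: pre-shared
signed = sign_ping(
    build_ping("http://example.org/atom.xml", "atom",
               ["tag:example.org,2004:entry-42"]),
    secret,
)
print(verify_ping(signed, secret))
```

A shared secret only lets the ping server check that a ping came from someone who holds that secret. Establishing origin for arbitrary third parties would need public-key signatures, which this sketch does not attempt.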
* Trackback: The "trackback protocol" is popular and provides real benefit; however,
there are a number of major issues that should be addressed. These issues
include the same authentication and signature issues that relate to pings. There
are also important architectural issues that should be addressed in order to
deal with the problem of trackback-spam. Is it possible/desirable to provide for
"trackback aggregators" or trackback services on behalf of blogs? Is Trackback
really just a special case of Pinging?
Trackback is a generally useful function that should be supported outside
the realm of blogging. A major "architectural" issue with the web has always
been the fact that links are uni-directional. Explicit trackbacks are one way of
providing the backlinks needed to construct bi-directional links. Presumably,
bi-directional links would be generally useful. What, if any, modifications
would need to be made to trackback in order to make the facility generally
useful? Are there extensions to HTML that should be made in order to record
"in-bound" links or to provide a means of pointing to where one would discover
in-bound links?
* OPML: Many blogging applications rely on OPML files as a means of recording and
exchanging lists of blogs, metadata about them, etc. There is not, however, a
well-accepted definition of how one builds an OPML file (although Danny Ayers,
among others, has worked on this issue), and OPML is not the subject of any
serious standardization effort. If OPML use is to grow, we really must see some
standardization here. Also, we're seeing the growth of applications that rely on
the sharing of OPML files. Thus, do we need to extend the concepts of pinging,
trackback, push, etc. to include API and update notification support for OPML
file maintenance?
* The role of proxies, synthetic feed generators, systems like PubSub.com, etc.
is neither well understood nor well defined; however, it is clear that there are
today quite a number of these things and there will be more in the future.
Effort should be put into at least clarifying the issues related to these
intermediary processors and distributors of Atom files.
* Comment support is provided by many blogs and we're beginning to see issues
related to comment spamming, etc. Is there something we can do from an
architectural point of view to prevent comment spamming? (i.e. do we need
something like an IRTF spamming group?) How do we handle remote commenting and
what is the place/role of services like SixApart's "TypeKey" service? Should a
"TypeKey" standard be developed? How does the W3C Annotea effort relate to
commenting and/or blogs in general?
* HTTP has limitations that are a true burden in blogging. For instance, there is
no server-side support for identifying and retrieving "fragments" of a
resource served by an HTTP server. Thus, you can't say things like "Give me only
the entries that have been updated since time XXXX". Should HTTP be extended to
better address the needs of Atom? Should RFC 3229 be extended to define an
Atom-specific mechanism for retrieving Atom fragments?
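
As an illustration of the "entries updated since time XXXX" request above, here is a minimal sketch of the filtering a server would have to perform. It is not a proposal for how HTTP or RFC 3229 should carry the request. The function name is hypothetical, and the element names and namespace are assumptions: they follow the later Atom 1.0 format, whereas the format under discussion here was not yet final.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: Atom 1.0 namespace; drafts of the format used other namespaces.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

def entries_updated_since(feed_xml, since):
    """Return the ids of entries whose <updated> time is later than `since`."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall(ATOM_NS + "entry"):
        updated = datetime.fromisoformat(entry.findtext(ATOM_NS + "updated"))
        if updated > since:
            fresh.append(entry.findtext(ATOM_NS + "id"))
    return fresh

# Tiny hand-written feed used only for this example.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>tag:example.org,2004:1</id><updated>2004-05-01T10:00:00+00:00</updated></entry>
  <entry><id>tag:example.org,2004:2</id><updated>2004-05-03T10:00:00+00:00</updated></entry>
</feed>"""

cutoff = datetime.fromisoformat("2004-05-02T00:00:00+00:00")
print(entries_updated_since(FEED, cutoff))
```

Today a client has to download the whole feed and do this filtering itself. The question raised above is whether the server could do it on the client's behalf and return only the fresh entries.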
* As Atom and blogging become more popular, there will be some sites that
generate large quantities of Atom formatted data and others that will consume
large quantities of it. This raises the issue of binary or compressed formats
for Atom data.
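
To give a feel for what a compressed representation might buy, here is a minimal sketch that gzips a synthetic feed. The feed content is made up and highly repetitive, so the ratio it shows is not representative of real feeds; gzip is used only as a familiar general-purpose compressor, not as a suggested Atom-specific format.

```python
import gzip

def compress_feed(feed_xml):
    # Encode to UTF-8 and apply general-purpose gzip compression.
    return gzip.compress(feed_xml.encode("utf-8"))

def decompress_feed(blob):
    # Reverse of compress_feed: gunzip and decode back to text.
    return gzip.decompress(blob).decode("utf-8")

# Synthetic, deliberately repetitive feed used only for this example.
entry = "<entry><title>Post</title><summary>Some repeated markup</summary></entry>"
feed = "<feed>" + entry * 200 + "</feed>"

blob = compress_feed(feed)
print(len(feed.encode("utf-8")), "bytes raw,", len(blob), "bytes gzipped")
```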
* Many commercial publishers have expressed an interest in syndication but also
express great concern about the various issues related to IPR and usage rights.
Is there anything we can do to provide a means to support digital rights
management in Atom feeds or systems?
* PICS (Platform for Internet Content Selection) provides a means to tag content
to facilitate content selection, rating services, filtering, signing and
privacy. We need to consider how PICS relates to the world of blogging.
* P3P (Platform for Privacy Preferences) was designed with web sites in
mind. Does P3P work for the kind of systems that would be developed
using Atom, the Atom API, and other blogging components? Does P3P need
to be extended to address blogging related issues?
* Various efforts have been made to support "categories" as
meta-data in blogs and blog entries. Most have failed. It seems like these
efforts should be formally linked to similar efforts such as the ISO
Topic Map or the XTM standards. Ideally, any solution for blogs would also work
well in non-blog environments. Do we need standardization of things like
"subject indicators" or do the existing standards do the job?
These are just some of the issues that rise out of the general problem of
blogging. Frankly, I think what is really going on here is that in the blogging
world, we're experimenting with and defining much more than just blogs. What
we're really doing is working out the details of a whole new way of interacting
with the web. Now, as we are passing from the stage of initial experimentation
to a stage of consolidation behind an initial set of formal standards, I think
we need to expand our vision to include the full range of issues that have been
raised. We also need to think about how the web-at-large can benefit from what
we've learned in the laboratory of blogging.
As I said before, blogging is much more than just a file format and an API.
Atom and its API should only be considered the first of many steps in defining
how blogging, and this style of web use, are to be pursued in the future.
Whatever forum we use to define the first steps on this road should be a forum
which will be appropriate for resolution and discussion of the other issues as
well.
Is the W3C willing and able to deal with more than just the immediate
issue of format and API? Is the W3C willing, able, and the best forum to deal
with the overall problem of blogging?
bob wyman