
Scaling the Web for Syndication [Blogging needs a whole new BAG...]





On May 18, 2004, at 3:42 PM, Werner Vogels wrote:

There is a very big difference between 'the web' and the feed/aggregator
systems. The first is designed for and driven by human interaction.

How do you define "the Web"? The Web that I know definitely isn't just for human interaction.


That said, I agree that there are differences between browser and aggregator traffic that merit attention. In fact, I'd argue that aggregator traffic has some properties that are much easier to scale than browser traffic. From an old blog entry [http://www.mnot.net/blog/2003/05/03/rss_traffic_characterisation]:

For example, Web traffic is well known to be self-similar. That is, it has the same "burstiness" no matter what time scale you look at it on; the highs and lows in your traffic will look roughly the same whether you look at a one-minute snapshot or a one-week snapshot. I suspect this isn't true for RSS; the tendency for aggregators to poll at a preset time (often the top of the hour, or at hourly intervals from startup) means that there's a huge, regular burst of traffic, followed by a long period with sparse hits.

Mark Crovella and Azer Bestavros have written a good paper about self-similarity at:
http://www.cs.bu.edu/faculty/crovella/paper-archive/self-sim/paper.html


Since aggregators poll at preset intervals, their traffic is less bursty at the smaller time scales and more predictable as well; it should be possible to take advantage of that predictability to offset the potentially greater number of requests they make.
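
To make that concrete, here is a rough sketch (a toy model, not a measurement) comparing aggregators that all poll at the top of the hour with the same number of aggregators whose polls are spread across the hour. The client count, the one-minute buckets, and the idea of giving each client a random offset are all assumptions made for illustration; nothing here comes from real traffic data.

```python
import random
from collections import Counter

MINUTES_PER_HOUR = 60

def requests_per_minute(offsets):
    """Count how many polls land in each minute of a single hour.

    `offsets` holds one entry per client: the minute (0-59) at which
    that client polls the feed.
    """
    counts = Counter(offsets)
    return [counts.get(minute, 0) for minute in range(MINUTES_PER_HOUR)]

def top_of_hour_offsets(n_clients):
    """Every client polls at minute 0 (the 'top of the hour' habit)."""
    return [0] * n_clients

def spread_offsets(n_clients, seed=0):
    """Hypothetical alternative: each client picks a random minute."""
    rng = random.Random(seed)
    return [rng.randrange(MINUTES_PER_HOUR) for _ in range(n_clients)]

n_clients = 6000  # assumed population, purely illustrative
aligned = requests_per_minute(top_of_hour_offsets(n_clients))
spread = requests_per_minute(spread_offsets(n_clients))

# Total load is identical; only the peak the server must absorb differs.
print("aligned peak:", max(aligned), "of", sum(aligned), "requests")
print("spread peak: ", max(spread), "of", sum(spread), "requests")
```

The point of the sketch is only that the total number of requests is the same in both cases while the peak is very different; because the polling schedule is known in advance, the peak is something one can plan for or flatten, which is not true of a flash crowd.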

Indeed, services like Akamai get the bulk of their business not from customers who lack the bandwidth for their typical load, but from those who lack the spare capacity to handle the extremes of the burstiness they see; i.e., the dreaded "flash crowd" phenomenon (a topic which has sustained many through graduate school and launched more than one start-up). If the Web can be taught to cope with that, why is it not amenable to the needs of a comparatively simple and methodical beast, the regularly polling aggregator?


With large groups of new types of users coming online, it is not obvious (at
least not to me) that it will scale 'just fine'.

I agree that there may be a need for additional mechanisms and refinement, especially in the caching infrastructure. This is exciting work that I look forward to.


However, some people think of this as an excuse to rip out URIs, HTTP and the rest so they can have a fresh start. I wish them the best of luck in getting their new infrastructure deployed, adopted and mature, but won't have any part in developing or promoting it; doing so is too wasteful for my taste, and has a distinct feeling of change for change's sake.

Cheers,

--
Mark Nottingham     http://www.mnot.net/