Scaling the Web for Syndication [Blogging needs a whole new BAG...]
On May 18, 2004, at 3:42 PM, Werner Vogels wrote:
> There is a very big difference between 'the web' and the feed/aggregator
> systems. The first is designed for and driven by human interaction.
How do you define "the Web"? The Web that I know definitely isn't just
for human interaction.
That said, I agree that there are differences between browser and
aggregator traffic that merit attention. In fact, I'd argue that
aggregator traffic has some properties that are much easier to scale
than browser traffic. From an old blog entry
[http://www.mnot.net/blog/2003/05/03/rss_traffic_characterisation]:
For example, Web traffic is well-known to be self-similar. That is, it
has the same "burstiness" no matter what time scale you look at it on;
the highs and lows in your traffic will look roughly the same no
matter if you look at a one minute snapshot vs. a one week snapshot. I
suspect this isn't true for RSS; the tendency for aggregators to poll
at a preset time (often, the top of the hour, or at hourly intervals
from startup) means that there's a huge, regular burst of traffic,
followed by a long period with sparse hits.
Azer Bestavros has written a good paper about self-similarity at:
http://www.cs.bu.edu/faculty/crovella/paper-archive/self-sim/paper.html
Since aggregators poll at pre-set intervals, their traffic is less
bursty on the smaller scales and more predictable as well; it should be
possible to take advantage of this to mitigate the potentially greater
number of requests made.
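To make the polling pattern concrete, here is a rough sketch of the two schedules mentioned in the blog entry: every aggregator polling at the top of the hour, versus each polling at hourly intervals from its own startup time. The client count, the one-day window, and the function names are illustrative assumptions, not measurements from any real feed.

```python
import random
from collections import Counter

def poll_minutes(n_clients, hours, align_to_hour, seed=0):
    """Return the minute (from the start of the window) of every poll.

    Hypothetical model: each client polls once an hour. If align_to_hour
    is True, every client polls at minute 0 of each hour; otherwise each
    client polls at hourly intervals from a random startup minute.
    """
    rng = random.Random(seed)
    arrivals = []
    for _ in range(n_clients):
        start = 0 if align_to_hour else rng.randrange(60)
        for h in range(hours):
            arrivals.append(start + 60 * h)
    return arrivals

def peak_and_mean(arrivals, total_minutes):
    """Return (busiest-minute request count, mean requests per minute)."""
    per_minute = Counter(arrivals)
    peak = max(per_minute.values())
    mean = len(arrivals) / total_minutes
    return peak, mean

HOURS = 24
CLIENTS = 600  # assumed subscriber count, for illustration only

aligned_peak, aligned_mean = peak_and_mean(
    poll_minutes(CLIENTS, HOURS, align_to_hour=True), HOURS * 60)
spread_peak, spread_mean = peak_and_mean(
    poll_minutes(CLIENTS, HOURS, align_to_hour=False), HOURS * 60)

print("top of the hour:", aligned_peak, "peak,", aligned_mean, "mean per minute")
print("from startup:   ", spread_peak, "peak,", spread_mean, "mean per minute")
```

Under these assumptions both schedules average 10 requests per minute, but the top-of-the-hour schedule delivers all 600 in a single minute, while the startup-relative one stays far closer to the mean. In both cases the load repeats identically every hour, which is the predictability a cache or server could plan around.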
Indeed, services like Akamai get the bulk of their business not from
customers who don't have enough bandwidth for their typical load, but
rather those that don't have enough extra capacity to handle the
extremities of the burstiness they see; i.e., the dreaded "flash crowd"
phenomenon (a topic which has sustained many through graduate school
and launched more than one start-up). If the Web can be taught to cope
with that, why is it not amenable to the needs of a comparatively
simple and methodical beast, the regularly polling aggregator?
> With large groups of new types of users coming online, it is not obvious
> (at least not to me) that it will scale 'just fine'
I agree that there may be the need for additional mechanisms and
refinement, especially in the caching infrastructure. This is exciting
work that I look forward to.
However, some people think of this as an excuse to rip out URIs, HTTP
and the rest so they can have a fresh start. I wish them the best of
luck in getting their new infrastructure deployed, adopted and mature,
but won't have any part in developing or promoting it; doing so is too
wasteful for my taste, and has a distinct feeling of change for
change's sake.
Cheers,
--
Mark Nottingham http://www.mnot.net/