
RE: High volume feeds - one possible solution



S. Mike Dierken wrote:
> I like the approach of a set of resources that model a 'change list'
> rather than a single resource for 'the mondo feed'.
	Change lists, either monolithic or linked, are an appealing
concept and could be made to work. Unfortunately, I don't think they buy
us very much.
	We need to recognize that feeds range from the inactive, which
might be updated once a week or maybe only once every few months, to the
very active. Very active feeds include blogs.msdn.com, which is
updated on average every five minutes, and the LiveJournal consolidated
feed, which has as many as 400 updates every minute...
	1. If the change list is monolithic, i.e. a file with nothing
but pointers to other files, then eventually the file becomes so large
that you have to start thinking about things like using RFC3229 to
retrieve deltas to the change list, or else you need to implement the
change list as a linked list. Given 400 updates per minute, the
LiveJournal "changelist" would quickly become difficult to handle.
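
	To put a rough number on that growth, here is a back-of-the-envelope
sketch. The 400 updates per minute figure is the peak rate cited above,
treated here as if it were sustained, so the result is an upper bound.
The 100 bytes per pointer is purely an assumption for illustration (a
URL plus a timestamp), and the function name is hypothetical.

```python
# Rough growth estimate for a monolithic change list.
UPDATES_PER_MINUTE = 400   # peak LiveJournal rate cited above (upper bound)
BYTES_PER_POINTER = 100    # assumption: one entry URL plus a timestamp


def changelist_growth(minutes, rate=UPDATES_PER_MINUTE, size=BYTES_PER_POINTER):
    """Return (entries, bytes) appended to the change list over `minutes`."""
    entries = rate * minutes
    return entries, entries * size


entries, nbytes = changelist_growth(24 * 60)  # one full day
print(f"{entries:,} entries, about {nbytes / 1e6:.1f} MB per day")
```

	Under those assumptions the list gains 576,000 pointers a day, so a
client that re-fetches the whole file on every poll is soon moving far
more pointer data than entry data.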
	2. If a chained or linked approach is used, then on active feeds you
would end up having to request potentially dozens or hundreds of
additional entries every time you polled the feed. Even with HTTP
persistent connections, this will result in a great deal of wasted
bandwidth -- if only in passing the names of the entry files to fetch.
Also, such an approach requires quite a few file "opens" which, on many
systems, can be expensive. (Yes, I realize that a variety of in-memory
cache daemons exist and that their use would reduce file I/O at the cost
of expensive main memory.)
	While I see that "change list" and "linked file" approaches can
work, I don't see them as superior to implementing delta feeds à la
RFC3229. In any case, these approaches increase client complexity and do
not deliver benefits until clients actually adopt them. On the other
hand, Sam Ruby's "RFC3229 instance-manipulation on by default" approach
would result in immediate bandwidth savings whether or not clients
were aware of the mechanism.
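
	For comparison, the server side of a delta feed can be sketched in a
few lines. This is only an illustration under stated assumptions, not
anyone's actual implementation: the `A-IM` and `IM` headers and the 226
(IM Used) status come from RFC3229, but the "feed" instance-manipulation
name is a convention used in these discussions rather than something
RFC3229 itself defines, and the function name, the per-entry ETags, and
the in-memory entry list are all hypothetical.

```python
def delta_response(entries, request_headers):
    """Pick the response for a feed poll.

    entries: non-empty list of (etag, body) pairs, oldest first, where each
    etag identifies the feed state right after that entry was added
    (an assumption made for this sketch).
    Returns (status, response_headers, bodies_to_send).
    """
    current_etag = entries[-1][0]
    client_etag = request_headers.get("If-None-Match")
    wants_delta = "feed" in request_headers.get("A-IM", "")

    # Client already has the latest state: nothing to send.
    if client_etag == current_etag:
        return 304, {"ETag": current_etag}, []

    # Client asked for deltas and its state is one we still remember:
    # send only the entries added since then.
    known = [etag for etag, _ in entries]
    if wants_delta and client_etag in known:
        newer = entries[known.index(client_etag) + 1:]
        headers = {"ETag": current_etag, "IM": "feed"}
        return 226, headers, [body for _, body in newer]

    # Otherwise fall back to the full feed.
    return 200, {"ETag": current_etag}, [body for _, body in entries]
```

	A client that knows nothing of the mechanism simply gets the 200
branch, so nothing breaks for existing software; the "on by default"
variant mentioned above would differ in when the server chooses to
send the reduced response.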

		bob wyman