Re: One reason we have duplicates entries is that we have duplicate feeds...
On Apr 7, 2005, at 10:34 AM, Bob Wyman wrote:
> Tim suggests that aggregators should be able to rely simply on
> atom:id to detect duplicates. However, as has often been pointed out,
> applying this rule in an intermediary like PubSub would simply make
> PubSub a marvelously efficient tool for denial of service attacks.
> I.e. if I didn't like something you published, I would simply publish
> something in my blog that had the same atom:id as something you had
> published. PubSub and other synthetic feed producers would then flush
> your post from the system and replace it with my post... Not good --
> and not avoidable given the current loose rules for defining
> instances of atom:id.
Yes. Mea culpa; somehow I'd missed this. Would the following work:
1. A new feed-level element <atom:alt-uri-prefix>, any number allowed.
E.g.
<feed>
  <link>http://www.tbray.org/ongoing/</link>
  <alt-uri-prefix>http://www.tbray.org/</alt-uri-prefix>
  <alt-uri-prefix>http://tbray.org/</alt-uri-prefix>
  <alt-uri-prefix>http://www.textuality.com</alt-uri-prefix>
  ...
</feed>
It says: an atom:entry can't be a duplicate of an atom:entry in this
feed unless it comes from a feed whose URI begins with one of these
prefixes.
In conjunction with atom:id and atom:updated, it would solve most of
PubSub's problem, but note that it only solves the problem for one
level of aggregation. Which I suspect is a useful 80/20 point.
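
To make the rule concrete, here is a rough sketch in Python of how an
aggregator might apply it. This is only one reading of the proposal,
not a spec: the Entry class, the may_replace function, the example
atom:id and the attacker URI are all made up for illustration, and
treating an entry from the very same feed URI as trusted is my own
assumption rather than something stated above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Sequence


@dataclass(frozen=True)
class Entry:
    """Hypothetical minimal view of an entry as an aggregator sees it."""
    atom_id: str        # value of atom:id
    updated: datetime   # value of atom:updated
    feed_uri: str       # URI of the feed the entry was fetched from


def may_replace(stored: Entry, stored_prefixes: Sequence[str],
                candidate: Entry) -> bool:
    """Return True if candidate may supersede stored as a duplicate.

    stored_prefixes holds the alt-uri-prefix values declared by the
    feed that stored came from.
    """
    if candidate.atom_id != stored.atom_id:
        return False  # different ids: not a duplicate at all
    # Assumption: the originating feed itself is always trusted.
    trusted = candidate.feed_uri == stored.feed_uri or any(
        candidate.feed_uri.startswith(prefix) for prefix in stored_prefixes
    )
    if not trusted:
        return False  # same id from an undeclared feed: ignore it
    # Among trusted copies, only a newer atom:updated wins.
    return candidate.updated > stored.updated


prefixes = [
    "http://www.tbray.org/",
    "http://tbray.org/",
    "http://www.textuality.com",
]
older = datetime(2005, 4, 7, 10, 0, tzinfo=timezone.utc)
newer = datetime(2005, 4, 7, 11, 0, tzinfo=timezone.utc)

stored = Entry("tag:example.org,2005:post-1", older,
               "http://www.tbray.org/ongoing/")
mirror = Entry("tag:example.org,2005:post-1", newer,
               "http://tbray.org/ongoing/")
attacker = Entry("tag:example.org,2005:post-1", newer,
                 "http://attacker.example/blog/")

print(may_replace(stored, prefixes, mirror))    # True
print(may_replace(stored, prefixes, attacker))  # False
```

A plain string-prefix test like this is only as tight as the prefixes
themselves: a prefix with no trailing slash, such as the third one,
would also match a host name that merely starts with the same
characters, so a real implementation would probably want to compare
parsed URIs instead.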
By the way... bear in mind that virtually all duplicates are coming via
things like Technorati and PubSub, so if we can figure out a fix, the
number of players who need to implement it is small, and the motivation
for them to do it is high.
 -Tim