
Re: Accidental and intentional atom:id collisions





On 18 May 2005, at 17:30, Antone Roundy wrote:
On Wednesday, May 18, 2005, at 09:12 AM, Henry Story wrote:

I suppose the surest way to make it impossible to fake the id is to specify that
by dereferencing the id and doing a GET (whatever the correct method of doing that for the
protocol happens to be) one should be able to retrieve the entry.


The unlikelihood of anyone actually retrieving each entry to verify that it's accurate would make this method only trivially effective.

Perhaps not. There was a long thread a few months ago about how to deal with the growing
size of feeds. Using a RESTful method, one simple solution to the problem would have been to have
minimal feeds of the following form:


<feed>
    <id>http://a.com/feeda</id>
    <entry>
        <id>http://a.com/feeda/entry1</id>
        <updated>2005-01-01T05:00+00:00</updated>
    </entry>
    <entry>
        <id>http://a.com/feeda/entry2</id>
        <updated>2005-01-02T06:00+00:00</updated>
    </entry>
    <entry>
        <id>http://a.com/feeda/entry3</id>
        <updated>2005-01-03T07:00+00:00</updated>
    </entry>
</feed>

Clearly feeds like the above would only be useful if one dereferenced the entry ids.
A consumer of an atom feed would thus read a feed, then work out which of the
entries it had not yet seen or which had been modified, and then go and fetch the
content of those entries by dereferencing the id. In the above case this could be done
by doing a GET on <http://a.com/feeda/entry1> for example and retrieving


<entry>
<id>http://a.com/feeda/entry1</id>
<updated>2005-01-01T05:00+00:00</updated>
<title>I am the legitimate entry</title>
<content type="html">&lt;a href="https://a.com/feedback.php"&gt;Use this form to submit your credit card number&lt;/a&gt;</content>
</entry>


Of course the above saves a lot more bandwidth the larger the entry becomes.
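
As a rough sketch of the consumer side described above, the following Python reads a minimal feed and works out which entry ids would need to be dereferenced. The function name entries_to_fetch and the `seen` dictionary are made up for illustration, and the elements are left un-namespaced to match the simplified examples in this mail; a real Atom document would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

def entries_to_fetch(feed_xml, seen):
    """Return ids of entries that are new or whose <updated> value changed.

    `seen` (hypothetical) maps entry id -> the <updated> string stored
    the last time the consumer fetched that entry.
    """
    stale = []
    for entry in ET.fromstring(feed_xml).iter("entry"):
        entry_id = entry.findtext("id")
        updated = entry.findtext("updated")
        if seen.get(entry_id) != updated:
            stale.append(entry_id)
    return stale

MINIMAL_FEED = """<feed>
    <id>http://a.com/feeda</id>
    <entry>
        <id>http://a.com/feeda/entry1</id>
        <updated>2005-01-01T05:00+00:00</updated>
    </entry>
    <entry>
        <id>http://a.com/feeda/entry2</id>
        <updated>2005-01-02T06:00+00:00</updated>
    </entry>
</feed>"""

# The consumer has already stored entry1 with this timestamp.
seen = {"http://a.com/feeda/entry1": "2005-01-01T05:00+00:00"}
print(entries_to_fetch(MINIMAL_FEED, seen))
```

Here only entry2 is reported, so only that id would be fetched with a GET; entry1 costs nothing beyond its two lines in the feed.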

An oddity this language introduces is that if the entries don't appear in the same Atom Feed Document, they apparently don't have to be treated as the same entry, but if they're aggregated together, they do. Going a little further, if I'm subscribed to both original feeds and an aggregation that includes both, then I have potentially four <entry>s with the same id. Which are the same entry and which aren't? Only the two in the aggregation, which really aren't, are required to be treated as the same.


I always thought that two entries with the same id should be treated as the same entry.
What makes you think otherwise?



Consider the following three simplified feed documents (or any two of them, or just the last one):

// this is the legitimate one

Yes, but now imagine the same thing written in the minimal format mentioned above:


<feed>
<id>http://a.com/feeda</id>
<entry>
<id>http://a.com/feeda/entry1</id>
<updated>2005-01-01T05:00+00:00</updated>
<title>I am the legitimate entry</title>
<content type="html">&lt;a href="https://a.com/feedback.php"&gt;Use this form to submit your credit card number&lt;/a&gt;</content>
</entry>
</feed>

<!-- the original feed -->
<feed>
    <id>http://a.com/feeda</id>
    <entry>
        <id>http://a.com/feeda/entry1</id>
        <updated>2005-01-01T05:00+00:00</updated>
    </entry>
</feed>

// this one is phishing
<feed>
<id>http://b.com/feedb</id>
<entry>
<id>http://a.com/feeda/entry1</id>
<updated>2005-01-01T10:00+00:00</updated>
<title>I am the legitimate entry</title>
<content type="html">&lt;a href="https://b.com/feedback.php"&gt;Use this form to submit your credit card number&lt;/a&gt;</content>
</entry>
</feed>

<!-- the phishing feed -->
<feed>
    <id>http://a.com/feedb</id>
    <entry>
        <id>http://a.com/feeda/entry1</id>
        <updated>2005-01-01T05:00+00:00</updated>
    </entry>
</feed>


// this is an aggregation
<feed>
<id>http://c.com/feedc</id>
<entry>
<source>
<id>http://a.com/feeda</id>
</source>
<id>http://a.com/feeda/entry1</id>
<updated>2005-01-01T05:00+00:00</updated>
<title>I am the legitimate entry</title>
<content type="html">&lt;a href="https://a.com/feedback.php"&gt;Use this form to submit your credit card number&lt;/a&gt;</content>
</entry>
<entry>
<source>
<id>http://b.com/feedb</id>
</source>
<id>http://a.com/feeda/entry1</id>
<updated>2005-01-01T10:00+00:00</updated>
<title>I am the legitimate entry</title>
<content type="html">&lt;a href="https://b.com/feedback.php"&gt;Use this form to submit your credit card number&lt;/a&gt;</content>
</entry>
</feed>


<!-- the aggregation feed -->
<feed>
    <id>http://a.com/feedc</id>
    <entry>
        <source>
            <id>http://a.com/feeda</id>
        </source>
        <id>http://a.com/feeda/entry1</id>
        <updated>2005-01-01T05:00+00:00</updated>
    </entry>
    <entry>
        <source>
            <id>http://a.com/feedb</id>
        </source>
        <id>http://a.com/feeda/entry1</id>
        <updated>2005-01-01T05:00+00:00</updated>
    </entry>
</feed>

Notice that the source of an entry is much less important in this situation - it
does not carry the heavy role of conveying trust. Trust is carried simply by the
structure of the http protocol.


Also it is clear that there is not much room for phishing here. The content of the
entry is retrieved in both cases by GETing <http://a.com/feeda/entry1>.
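
To make that concrete, here is a small sketch of an aggregator resolving the entries of the minimal aggregation feed. The names resolve_entries and FAKE_WEB are invented for this example, and the dictionary lookup only stands in for an HTTP GET; it is an illustration under those assumptions, not a real client.

```python
import xml.etree.ElementTree as ET

def resolve_entries(feed_xml, fetch):
    """Map each distinct entry id in a minimal feed to whatever `fetch` returns for it.

    `fetch` is any callable taking a URL; here it stands in for an HTTP GET.
    """
    resolved = {}
    for entry in ET.fromstring(feed_xml).iter("entry"):
        entry_id = entry.findtext("id")  # direct child only, so <source><id> is ignored
        if entry_id not in resolved:
            resolved[entry_id] = fetch(entry_id)
    return resolved

# Hypothetical stand-in for the web: what a GET on each URL would return.
FAKE_WEB = {
    "http://a.com/feeda/entry1": "<entry>...content served by a.com...</entry>",
}

AGGREGATION = """<feed>
    <id>http://a.com/feedc</id>
    <entry>
        <source><id>http://a.com/feeda</id></source>
        <id>http://a.com/feeda/entry1</id>
        <updated>2005-01-01T05:00+00:00</updated>
    </entry>
    <entry>
        <source><id>http://a.com/feedb</id></source>
        <id>http://a.com/feeda/entry1</id>
        <updated>2005-01-01T05:00+00:00</updated>
    </entry>
</feed>"""

resolved = resolve_entries(AGGREGATION, FAKE_WEB.__getitem__)
print(resolved)
```

Both references collapse to one id, and the content comes from whoever controls that URL, whatever the <source> element claims.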


So here we have a solution that:

This is already bad enough. Now how about if the phishing feed claims that its atom:id is http://a.com/feeda? Worse still. With the current spec text, an Atom consumer that does a little extra work, somehow figures out that the phishing version of the entry is not the same as the legitimate version, and tells the user so would be violating the spec.

In the above case changing the id of a feed would not be very interesting. The content is always
fetched by dereferencing the atom id. So a phishing feed that faked its id would not achieve
very much.


But just to deal with this issue, let us go a little further then and make the id of a feed
also dereferenceable. If the id of a feed is the url at which its (head) can be fetched, then
there is no way to do much harm by faking the id of one's feed.
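
A consumer could check this with something as simple as the sketch below. The function name feed_id_matches_location is hypothetical, and the comparison is a plain string match; any URL normalization or redirect handling is deliberately left out.

```python
import xml.etree.ElementTree as ET

def feed_id_matches_location(feed_xml, fetched_from):
    """True if the feed's own <id> equals the URL the document was fetched from.

    Assumption: exact string comparison, no URL normalization.
    """
    return ET.fromstring(feed_xml).findtext("id") == fetched_from

# A feed served from b.com that claims a.com's feed id.
FAKED = "<feed><id>http://a.com/feeda</id></feed>"

print(feed_id_matches_location(FAKED, "http://b.com/feedb"))  # False
print(feed_id_matches_location(FAKED, "http://a.com/feeda"))  # True
```

A feed that claims an id it is not served from fails the check, which is all the consumer needs in order to distrust it.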



In conclusion a minimal feed format has the following advantages:
- it saves heavily on bandwidth (because of its RESTful nature)
- it removes the whole phishing problem
- it makes it very easy to understand how to generate ids

So those are just a few thoughts on the topic. It seems that if one works with
the web, these phishing problems disappear.


Henry Story