
Semantics and Belief contexts - was: PaceDuplicateIdsEntryOrigin posted




On 25 May 2005, at 21:06, Antone Roundy wrote:

* The accepted language does not speak of the origin feed of the entries. Ideally, an atom:id should be universally unique to one entry resource, and we rightly require publishers to mint them with that goal. However, in reality, malicious or undereducated publishers might duplicate the IDs of others. Therefore, it is proposed to modify the specification to state that the atom:entry elements describe the same entry (resource) if they originate in the same feed.
* Aggregators wishing to protect against DOS attacks are not unlikely to perform some sort of safety checks to detect malicious atom:id duplication, regardless of whether the specification "authorizes" them to or not.


I understand your motivation, but I think it is misguided. I only recently understood why myself [1].

Let me explain a little how I came to this conclusion. An easy way to understand semantics is to think of it as being about the objects out there. Take the sentences:

    (a) Superman can fly
    (b) Superman is Clark Kent

we can immediately, and correctly, deduce that

    (c) Clark Kent can fly.

Since the referents of "Superman" and "Clark Kent" are the same, what is true of the one,
is true of the other. When speaking directly about the world, we can replace any occurrence
of Superman with Clark Kent, and still say something true.
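The substitution principle can be sketched as a toy model in Python. Everything here is an illustrative assumption rather than RDF machinery: facts are plain (subject, predicate, object) tuples, and the hypothetical `DENOTES` table maps each name to the individual it refers to (the identifier `kal-el` is made up). Because `holds` compares referents rather than names, a co-referring name can be substituted without changing the truth value.

```python
# Toy model (an assumption, not RDF): direct assertions about the world.
FACTS = {("Superman", "can", "fly")}

# Hypothetical co-reference table: each name maps to the individual it denotes.
DENOTES = {"Superman": "kal-el", "Clark Kent": "kal-el"}


def referent(term):
    """Return the individual a name denotes (unknown terms denote themselves)."""
    return DENOTES.get(term, term)


def holds(subject, predicate, obj):
    """In a direct context, truth depends only on referents, not on names."""
    return any(
        referent(s) == referent(subject)
        and p == predicate
        and referent(o) == referent(obj)
        for s, p, o in FACTS
    )


print(holds("Superman", "can", "fly"))    # True
print(holds("Clark Kent", "can", "fly"))  # True: substitution preserves truth
print(holds("Lois Lane", "can", "fly"))   # False
```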


When we are speaking about what others believe, this is no longer true. Lois Lane may believe (a) without believing (c). She may think Superman is a hero, but not think that Clark Kent is one. There
is in logic therefore a fundamental distinction between sentences used in a direct semantic way, and
sentences used in this indirect way, when the sentence is in a belief context. This distinction is
so fundamental that there is a well-known condition associated with difficulty in making it: autism.
Autistic children often have great difficulty understanding the difference between what is and how
people perceive things to be.


In RDF this distinction shows up when moving from triples to 4-tuples (quads). RDF/XML is a language
that works best in the semantic realm. With triples we can describe objects and their
relationships. If we want to speak about consistent ways of seeing the world, we need to group statements into formulae, as is done in N3 [2] and TriX, for example. This then allows us to name
consistent sets of statements. It also allows one to refer simultaneously to sets that are inconsistent with each other. For example, I can consistently hold the following:


Lois Lane believes that Superman is different from Clark Kent
Clark Kent believes that he is Superman.

without contradiction.
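The quad idea can be sketched the same way. In this toy Python model (an assumption; it is not an RDF or N3 library), each quad carries the name of the graph, or belief context, it belongs to, and `consistent` is a deliberately crude check that only flags a pair of names asserted both `sameAs` and `differentFrom` (predicates named after, but not implementing, `owl:sameAs` and `owl:differentFrom`). The graph names `lois` and `clark` are hypothetical. Each graph is consistent on its own, and so are the top-level statements that merely name the graphs, while merging both graphs into one set of direct assertions is not.

```python
from collections import defaultdict

# Each quad is (graph, subject, predicate, object); the graph names a
# belief context. Graph names and predicates are illustrative only.
QUADS = [
    ("lois", "Superman", "differentFrom", "Clark Kent"),
    ("clark", "Clark Kent", "sameAs", "Superman"),
]

# Top-level (direct) statements that only refer to the graphs by name.
BELIEFS = {
    ("Lois Lane", "believes", "lois"),
    ("Clark Kent", "believes", "clark"),
}


def graphs(quads):
    """Group quads into named sets of triples."""
    named = defaultdict(set)
    for graph, s, p, o in quads:
        named[graph].add((s, p, o))
    return named


def consistent(triples):
    """Crude check: no pair of names is asserted both sameAs and differentFrom."""
    same = {frozenset((s, o)) for s, p, o in triples if p == "sameAs"}
    different = {frozenset((s, o)) for s, p, o in triples if p == "differentFrom"}
    return not (same & different)


g = graphs(QUADS)
print(consistent(g["lois"]), consistent(g["clark"]))  # True True
print(consistent(BELIEFS))                            # True
print(consistent(g["lois"] | g["clark"]))             # False: merging contradicts
```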

So how does this relate to Atom? Well, we need to be clear that semantically an entryId and
a feedId each point to one thing and one thing only. But this does not mean that there cannot
be erroneous, false, corrupted, ... feeds out there. Aggregators wishing to protect against
DoS attacks should simply do what we humans do in such circumstances, namely quote what others
are saying rather than assert it themselves. This is why the proposal by Roy
Fielding to allow feeds inside of feeds is probably the best way to do things (I only came
to this conclusion yesterday; before that I had no idea what he was going on about).


So to protect against a DoS attack, it is best to have aggregator feeds such as:

<feed>
    <!-- aggregator feed -->
    <feed src="http://true.org">
       <id>tag:true.org,2005:feed1</id>
       <entry>
          <title>Enter your credit card number here</title>
          ...
       </entry>
    </feed>
    <feed src="http://false.org">
       <id>tag:true.org,2005:feed1</id>
       <entry>
          <title>Enter your credit card number here</title>
          ...
       </entry>
    </feed>
</feed>

Here all the aggregator feed is claiming is that it has seen entries inside other
feeds. It need never claim to agree with any of their content. And so the content
of the first internal feed and the second internal feed can be contradictory. They
can, for example, have the same id with the same updated timestamp and different
content.


It will be up to the consumer of such aggregated feeds to decide which to trust.
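A minimal Python sketch of a consumer of such an aggregator feed follows. It assumes the nested `<feed src="...">` form from the example above (part of a proposal, not of the Atom format), ignores XML namespaces, and uses a made-up sample document plus hypothetical function names `quoted_entries` and `believe`. Claims are indexed by entry id and then by source, so each internal feed acts as its own belief context and contradictory claims sit side by side; deciding which to trust is left to a separate, caller-supplied policy.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical aggregator document. The nested <feed src="..."> form is the
# proposal under discussion, not standard Atom; namespaces are omitted.
AGGREGATE = """
<feed>
  <feed src="http://true.org">
    <id>tag:true.org,2005:feed1</id>
    <entry>
      <id>tag:true.org,2005:entry1</id>
      <updated>2005-05-25T12:00:00Z</updated>
      <title>Original post</title>
    </entry>
  </feed>
  <feed src="http://false.org">
    <id>tag:true.org,2005:feed1</id>
    <entry>
      <id>tag:true.org,2005:entry1</id>
      <updated>2005-05-25T12:00:00Z</updated>
      <title>Enter your credit card number here</title>
    </entry>
  </feed>
</feed>
"""


def quoted_entries(xml_text):
    """Record what each internal feed says about each entry id, without merging.

    Returns {entry_id: {source: {"updated": ..., "title": ...}}}, so that
    contradictory claims about one id are kept rather than overwritten.
    """
    root = ET.fromstring(xml_text.strip())
    claims = defaultdict(dict)
    for inner in root.findall("feed"):
        source = inner.get("src")
        for entry in inner.findall("entry"):
            claims[entry.findtext("id")][source] = {
                "updated": entry.findtext("updated"),
                "title": entry.findtext("title"),
            }
    return dict(claims)


def believe(claims, trusted_sources):
    """One possible consumer policy: accept claims only from trusted sources.

    If several trusted sources describe the same id, the first one found wins;
    that tie-break is arbitrary and only for illustration.
    """
    accepted = {}
    for entry_id, by_source in claims.items():
        for source, claim in by_source.items():
            if source in trusted_sources:
                accepted[entry_id] = claim
                break
    return accepted


claims = quoted_entries(AGGREGATE)
print(sorted(claims["tag:true.org,2005:entry1"]))
# ['http://false.org', 'http://true.org']
print(believe(claims, {"http://true.org"})["tag:true.org,2005:entry1"]["title"])
# Original post
```

The aggregator itself never chooses between the two claims; only `believe`, run by the consumer, does, which mirrors quoting versus asserting.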

The good thing about this way of doing things is that one can define a first-level feed
in a simple semantic vocabulary, without needing to create all kinds of exception
clauses all over the place. When dealing with feeds inside a feed, one can then
simply state that this indirection is equivalent to the belief-context indirection.
Statements can be contradictory across such internal feeds.


Taking this into account should help make the spec a lot cleaner, easier to write and
easier to understand. The problems are fundamental, so they cannot be swept under the
carpet. They will keep popping up.


Henry Story

[1] http://www.imc.org/atom-syntax/mail-archive/msg15608.html
[2] http://www.w3.org/DesignIssues/Notation3.html