Letter from Planet Web on identifiers
I disagree with the direction Atom's going on links and identifiers.
This note is
(a) to explain why I think this notion of atom:id for atom:entry is
misguided and unhelpful, and
(b) to acknowledge that it's probably not actively harmful and say
"let's move on", and
(c) to ask for a specific wording change in the description of link and id.
In this discussion, I should start by acknowledging my bias: I'm a Web
guy. The Web has been at the centre of my professional life for ten
years now, and I think that people who are designing protocols for
global-scale information systems should try to understand the Web's
lessons. You don't have to make all the same design choices, but you
really should understand why it works the way it does so you can
understand the cost and benefits of differing choices.
1. How the Web Actually Works
On Feb 28, 2004, at 12:14 AM, Roger B. wrote:
One of those reasons, by the way,
is the trifling fact that substantially all of the existing Web
software that actually works - browsers, caches, servers, spiders, all
that boring stuff - is currently built around that assumption.
an assumption that flies in the face of the dynamic Web. :)
cookies, and sessions effectively nuke any notion that an http: URI
can accurately identify a resource.
I don't have the time to do code walk-throughs on the crawlers and
indexers and caches and middleware that make the Web actually useful;
but the people who wrote them universally believe that Web identity is
what goes with a URI and nothing else. For some gnarly details on how
real software tries to guess whether or not two URIs are actually the
[Oh, with respect to Roger's note: query-strings are a string of
characters that make up part of a URI and are opaque to all the
middleware; cookies do muddy the waters but have the virtue that
middleware can safely ignore them; and the Web doesn't do sessions, it
uses stateless protocols].
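To make the "identity is what goes with a URI and nothing else" point concrete, here is a minimal sketch of how a cache might key its entries. It is an illustration under stated assumptions, not how any particular cache is implemented: the names cache_key and UriCache are hypothetical, the normalization is deliberately minimal (lowercased scheme and host, fragment and userinfo dropped), and real caches also honour things like Vary headers that are left out here. The query string is carried along as an opaque run of characters, and cookies are accepted but ignored.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit


def cache_key(uri: str) -> str:
    """Build a cache key from the URI alone (hypothetical helper).

    Scheme and host are lowercased, the fragment and any userinfo are
    dropped, and the query string is kept byte-for-byte as an opaque string.
    """
    parts = urlsplit(uri)
    host = parts.hostname or ""  # .hostname is already lowercased
    netloc = f"{host}:{parts.port}" if parts.port else host
    return urlunsplit((parts.scheme, netloc, parts.path or "/", parts.query, ""))


class UriCache:
    """A toy cache whose only notion of identity is the URI."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, uri: str, body: bytes) -> None:
        self._store[cache_key(uri)] = body

    def get(self, uri: str, cookies: Optional[Dict[str, str]] = None) -> Optional[bytes]:
        # Cookies are deliberately ignored: they play no part in identity here.
        return self._store.get(cache_key(uri))


cache = UriCache()
cache.put("HTTP://Example.ORG/item?x=1&y=2#top", b"hello")
hit = cache.get("http://example.org/item?x=1&y=2", cookies={"session": "abc"})
miss = cache.get("http://example.org/item?y=2&x=1")  # different string, different resource
```

Note that reordering the query parameters produces a different key: as far as this sketch is concerned, the query string is just more characters in the URI.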
And, by the way, ever since about fifteen minutes after the dawn of the
Web, voices have been raised in horror at this conflation of naming and
addressing, and saying There Must Be A Better Way; so this is not a new
issue. More on that below.
2. Why I Dislike atom:id
I asked on a couple of occasions what the real applications for atom:id
were, and here are the answers I got:
(a) I move my weblog somewhere else; with atom:id, the first time an
aggregator visits the new site it doesn't see old postings as new.
(b) My software decides to change the way it assigns URIs to items;
the first time someone's aggregator visits the changed site it
doesn't see old postings as new.
(c) I publish the same article in two different locations, and decide
two copies with different URIs rather than just point twice to the
With atom:id, the aggregator doesn't see this twice.
(d) Something gets passed through several levels of syndication, e.g. a
Reuters story, and various users decide to make their own URIs for
atom:id, the aggregator sees each once.
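For readers who want to see what is actually being claimed in case (a), here is a rough sketch of an aggregator's "have I seen this before?" check, run once keyed on the link and once keyed on atom:id. Everything in it is assumed for illustration: the Entry class, the new_entries function, and the example.org and example.net URIs are hypothetical, and no real aggregator is being described.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Entry:
    link: str      # the entry's URI
    atom_id: str   # the separate identifier under discussion
    title: str


def new_entries(feed: Iterable[Entry], seen: Set[str],
                key: Callable[[Entry], str]) -> List[Entry]:
    """Return entries whose key has not been seen, recording keys as we go."""
    fresh = []
    for entry in feed:
        k = key(entry)
        if k not in seen:
            seen.add(k)
            fresh.append(entry)
    return fresh


# Case (a): the weblog moves, so the link changes but the atom:id does not.
before_move = [Entry("http://old.example.org/2004/02/post-1",
                     "tag:example.org,2004:post-1", "Post 1")]
after_move = [Entry("http://new.example.net/archive/post-1",
                    "tag:example.org,2004:post-1", "Post 1")]

seen_links: Set[str] = set()
new_entries(before_move, seen_links, key=lambda e: e.link)
shown_again_by_link = new_entries(after_move, seen_links, key=lambda e: e.link)

seen_ids: Set[str] = set()
new_entries(before_move, seen_ids, key=lambda e: e.atom_id)
shown_again_by_id = new_entries(after_move, seen_ids, key=lambda e: e.atom_id)
```

Keyed on the link, the moved posting shows up once more as new; keyed on atom:id, it does not. That one-time difference is the whole of the benefit in this case.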
In my opinion, each of (a), (b), and (c) is a bad, poorly-thought-out,
Web-hostile practice. They decrease the usefulness of search engines
and caches and bookmark facilities, i.e. of the Web as a whole.
Furthermore, in each case the benefit (a one-time
avoidance of the sight of duplicate links) seems pretty minor. I'm
still baffled when people say "I need to have multiple URIs so I can
publish in multiple categories". Er... pointers? Computer programmers
have been doing call-by-reference for some decades now, so I don't get
By the way, I too see dupes in my RSS feed, and in virtually every case
I can trace them to incompetence or stupidity in the CMS or production
system upstream, and I have my doubts that people who already can't
manage to detect dupes are going to be helped by an atom:id in the
Furthermore, I'm nervous about these use-cases where the author seems
to think the item in two different places is the same item. For
example, I asked Norm Walsh why he wanted to do this and he wrote:
Because I want to provide different CSS or different navigation links
depending on which site you read them from. In other words, I want the
essays to "fit in" with the context in which they are presented and I
want to present them in two different contexts.
So Norm, you're changing the styles (which BTW can suppress whole
<div>s) and the links (a defining part of web content) and the context
and you're really sure I should still think these are the same thing?
Think you could leave that choice up to me?
The strongest remaining use-case I see is the
Reuters-article-in-two-places case, and it still doesn't seem very
strong to me. To start with, for general news I'd rather go through
something like Topix or Google that is going to take care of that
anyhow. If I actually did subscribe to a bunch of general-purpose
newsfeeds I'd probably be interested in seeing who picks up which
stories (The same article appearing in the New York Times and the
National Review is just not the same thing unless you're politically
illiterate). Hold on, I *do* subscribe to a bunch of general-purpose
newsfeeds and I just don't have this problem. I *do* have the problem
with idiotic publishers republishing the same content time after time
under different URIs rather than just bumping the modification date.
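The alternative to republishing under a new URI can be sketched in a few lines. This is only an illustration, assuming a hypothetical ingest function and a store keyed on the URI that remembers the last modification date it saw; the URIs and dates are made up.

```python
from datetime import datetime
from typing import Dict, Tuple

# URI -> (last modification date seen, content)
Store = Dict[str, Tuple[datetime, str]]


def ingest(store: Store, uri: str, modified: datetime, body: str) -> str:
    """Classify an incoming item as 'new', 'updated', or 'unchanged' by URI."""
    known = store.get(uri)
    if known is None:
        store[uri] = (modified, body)
        return "new"
    if modified > known[0]:
        store[uri] = (modified, body)
        return "updated"
    return "unchanged"


store: Store = {}
first = ingest(store, "http://example.org/story", datetime(2004, 2, 27), "v1")
same = ingest(store, "http://example.org/story", datetime(2004, 2, 27), "v1")
bumped = ingest(store, "http://example.org/story", datetime(2004, 2, 28), "v2")
# Republishing under a fresh URI is indistinguishable from a brand-new item.
reposted = ingest(store, "http://example.org/story-2", datetime(2004, 2, 28), "v2")
```

With the URI held steady and the date bumped, the reader's software can tell an update from a new item; with a fresh URI it cannot, which is the duplicate problem described above.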
So I'm unconvinced that (a) there's actually a problem here that
requires inventing a whole new level of identity semantics and (b) if
there were, atom:id would help.
So, from where I sit, atom:id is going to provide a very moderate
benefit while encouraging Web-hostile bad publishing practices. So I
don't think it should be in Atom.
3. Why It's OK
As I said, ever since the URL/URI notion came along, lots of people
have objected (as we've seen in this list), saying "mixing up naming
and addressing is just wrong." I have a lot of sympathy with this
point of view. Unfortunately it leads nowhere. Phil Karlton, one of
the smartest programmers I ever met, said "there are only two hard
problems in Computer Science: naming and cache invalidation." URIs are
the first global-scale data naming system that has ever worked in the
slightest. It isn't perfect but it kind of limps along.
So what people do about this is they think "we're smarter than those
Web morons, we'll do identifiers right!" And they go off and found an
IETF working group (lots of those corpses litter the landscape) or they
invent tag: or URNs or doi: or atom:id or whatever, under the
assumption that naming is a technical problem.
History shows that doing names properly is a management problem, not a
technical problem. An organization that is competently managed will
make the effort to ensure that its identifiers are persistent,
stable, and available. An organization that is stupidly managed will
screw this up even if they're using URNs.
And after all these years, the only URL alternative that's getting
reasonably widespread deployment (that I know of) is the WebDAV URI
schemes, which are aimed at solving a different class of problems.
Last time I checked, my well-equipped computer here has no software
that can do anything useful with URNs or tag: or doi: or any of the
So, it's OK to have atom:id. Organizations that publish feeds, if
they're competent, will eventually realize that their stuff will be
more accessible and more popular and more influential if they don't
fuck with their URIs and don't get in the way of Google and Akamai and
all the other middleware doing their work properly. So they'll just
let the URI be the identifier and the identifier be the URI and get on
I'll do this formally on the Wiki and post here again. But I want to
change the wording of the description of link and id as follows:
4.13.2: The "atom:link" element is the URI for this entry, seen as a
Web Resource. An entry must have one and exactly one atom:link.
[End of story. I totally want to get rid of the "alternate" stuff,
which is a "wouldn't this be nice" feature].
4.13.2: The "atom:id" element is an assertion of globally unique
identity. That is to say, if two different entries (in the same feed
or different feeds) have the same atom:id, this constitutes an
assertion that the two entries are the same.
[Recognizing that "identity" is often a matter of opinion and context.
And it worries me that the atom:id isn't attached to some provider so I
can ask "who said so?"].
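One way to read the proposed atom:id wording, together with the "who said so?" worry, is sketched below. It assumes a hypothetical identity_claims function and represents each feed as a list of (atom:id, link) pairs; the feed URIs and identifiers are invented. The point is only that a consumer can keep track of which feed made each identity assertion and decide for itself whether to believe it.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# feed URI -> list of (atom:id, link) pairs found in that feed
Feeds = Dict[str, List[Tuple[str, str]]]
# atom:id -> list of (feed URI, link) pairs that carried it
Claims = Dict[str, List[Tuple[str, str]]]


def identity_claims(feeds: Feeds) -> Claims:
    """Group entries by atom:id, remembering which feed made each claim."""
    claims: Claims = defaultdict(list)
    for feed_uri, entries in feeds.items():
        for atom_id, link in entries:
            claims[atom_id].append((feed_uri, link))
    return dict(claims)


def asserted_same(claims: Claims) -> Claims:
    """Keep only identifiers that more than one entry lays claim to."""
    return {atom_id: who for atom_id, who in claims.items() if len(who) > 1}


feeds: Feeds = {
    "http://a.example.org/feed": [("tag:wire.example,2004:story-7",
                                   "http://a.example.org/news/7")],
    "http://b.example.net/feed": [("tag:wire.example,2004:story-7",
                                   "http://b.example.net/world/story7"),
                                  ("tag:b.example.net,2004:editorial-1",
                                   "http://b.example.net/opinion/1")],
}
shared = asserted_same(identity_claims(feeds))
```

Here two feeds assert that their entries are the same thing; the sketch records both claimants alongside their links, and what to do with that assertion is left to whoever is consuming the feeds.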