
Re: A simpler proposal: PaceAtomIDAsString




At 12:18 AM +0100 8/4/04, Graham Parks wrote:
I agree 100% that the id should always be treated as a string. My problem is that with no rules on format, a publisher can only be guessing that their identifier is unique, which isn't a problem with, say, tag URIs.

As has been said on this list before, publishers are "only guessing" that for any ID, including one that is a tag: URI.


At 1:30 AM +0200 8/4/04, Bjoern Hoehrmann wrote:
Hmm, it would make more sense to me to specify that the content must
be for example the uc(md5_hex()) of some reasonably unique data

That's good if you want opaque identifiers, but I haven't heard any need (or even desire!) for those so far.
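For concreteness, a rough Python sketch of what the opaque form would look like follows. The uc(md5_hex()) in the quoted text reads like Perl; the function name opaque_id and the sample input are made up for illustration and are not part of any proposal.

```python
import hashlib

def opaque_id(reasonably_unique_data: str) -> str:
    """Upper-cased hex MD5 digest of the input (hypothetical helper)."""
    # Encode to bytes first: MD5 is defined over bytes, not characters.
    digest = hashlib.md5(reasonably_unique_data.encode("utf-8")).hexdigest()
    return digest.upper()

# The input string here is an invented example of "reasonably unique data".
print(opaque_id("example.org/2004-08-04/entry-1"))
```

The result is always 32 hex characters and tells the reader nothing about where the entry came from, which is the sense in which it is opaque.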


 and
that implementations must not change the id without the user's consent,
specifically not if [common cases].

That wording would be true for any atom:id proposal. But it's kind of obvious, isn't it? Do we really need to state it?


Your proposal just obfuscates the issue as common
practice will likely be to use URIs anyway, which would likely result
in all the problems we have when the spec says it is a more special
kind of string.

I'm not sure why you think it will be common practice. It's easy to do the right thing here and munge together a DNS name, a time, and a random string.
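A minimal sketch of that kind of munging, in Python. The function name make_atom_id, the ":" separator, the timestamp format, and the length of the random part are all assumptions made for illustration; the pace does not prescribe any of them.

```python
import secrets
from datetime import datetime, timezone
from typing import Optional

def make_atom_id(dns_name: str, now: Optional[datetime] = None) -> str:
    """Join a DNS name, a timestamp, and a random string (hypothetical format)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Compact timestamp; the exact layout is an arbitrary choice.
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    # 8 random bytes rendered as 16 hex characters.
    nonce = secrets.token_hex(8)
    return f"{dns_name}:{stamp}:{nonce}"

print(make_atom_id("example.org"))
```

The DNS name scopes the identifier to a publisher, while the time and the random string make a collision within that publisher's own output unlikely. It is still just a string to whoever reads it.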


At 1:11 AM +0100 8/4/04, Bill de hÓra wrote:
According to the pace, it's not plain text for the rest of us, it's Unicode for the rest of us.

Those are *identical*.


If the comparison rules are to be character based over Unicode, that would suggest the Unicode encoding must also be specified so we know how many bytes are being used per character and so on.

Nope, exactly wrong. It is character-by-character, not byte-by-byte.
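A small Python illustration of the difference, offered as a sketch only. The helper name same_atom_id and the sample identifier are invented; Python's str type compares by Unicode code point, which is what stands in for "character-by-character" here.

```python
def same_atom_id(a: str, b: str) -> bool:
    """Compare two already-decoded ids character by character."""
    # str equality in Python is code point by code point.
    return a == b

sample = "example.org:caf\u00e9:1"  # invented id containing a non-ASCII character

# The same characters serialized in two different document encodings.
as_utf8 = sample.encode("utf-8")
as_utf16 = sample.encode("utf-16-le")

# The byte sequences differ in content and in length...
assert as_utf8 != as_utf16
assert len(as_utf8) != len(as_utf16)

# ...but once each is decoded, the characters are identical.
assert same_atom_id(as_utf8.decode("utf-8"), as_utf16.decode("utf-16-le"))
```

The comparison happens after the parser has decoded the document, so the number of bytes per character in any particular encoding never enters into it.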


But:

"Even if a particular atom:id instance looks like a URI, it SHOULD NOT be treated as one."

doesn't seem consistent with the idea that we can have URIs if we want. The pace is saying (to me) something along these lines: "if it looks like an HTTP URL you shouldn't run GET on it; if it looks like a NewsML URI you shouldn't infer from the versioning bits".

Correct.


We have experience that suggests people will not be able to oblige themselves to this kind of constraint. IMO it should be struck.

How do others feel about this? I thought the discussion was trending towards "do not dereference atom:ids".


May I suggest this wording for the time being:

" The "atom:id" element's content conveys a permanent, globally unique identifier for the feed. It MUST NOT change over time, even if the feed is relocated. An atom:head element MAY contain an atom:id element, but MUST NOT contain more than one. The content of this element, when present, MUST be a string of Unicode characters encoded as UTF-8. When atom:id elements are compared, they MUST be compared on a character-by-character basis.

I don't think that would work in a document whose encoding is anything other than UTF-8.


It is not a goal that atom:id be usable for retrieval of information.

I'm OK with that wording.


--Paul Hoffman, Director
--Internet Mail Consortium