
Re: PaceFeedEquivalence

On Jul 30, 2004, at 10:21 AM, Mark Pilgrim wrote:


On Fri, 30 Jul 2004 11:45:49 -0400, Joe Gregorio <joe.gregorio@xxxxxxxxx> wrote:
What kind of URI normalizations should be used when comparing atom:id URIs for equivalence?

All of them except protocol normalization, which requires resource retrieval to verify equivalence.

Er, by "should" do you mean SHOULD? I think it's OK for software, particularly resource-challenged implementations, to do simple string comparison, and I think it's also OK to do all the other things outlined. I think that we should document the fact that those who mint and copy URIs can't count on other software doing anything more than a string comparison, so, to use the vernacular,
(a) Always use the same code when you're generating your own URIs, and
(b) don't fuck with the URIs you're given, store and pass them on the way you got them.


Given that, shit happens, which is why (in particular) robots will go to great lengths to determine when two different-looking URIs are actually the same.

And if so, should we augment that canonicalization to include a Unicode normalization form?

An excellent question. Tim?

As above. Don't require it, but don't try to forbid it either. -Tim
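The "don't require it, but don't forbid it" position can be sketched the same way: plain string comparison is the default, and an optional Unicode normalization pass using Python's standard `unicodedata` module is permitted but never assumed. The function name `ids_equal` and the `unicode_nfc` flag are hypothetical, and NFC is an assumed choice of form; the thread does not pick one.

```python
import unicodedata


def ids_equal(a: str, b: str, unicode_nfc: bool = False) -> bool:
    """Compare two identifiers; optionally NFC-normalize both first.

    The default is plain string comparison, which every consumer can do.
    """
    if unicode_nfc:
        a = unicodedata.normalize("NFC", a)
        b = unicodedata.normalize("NFC", b)
    return a == b


composed = "http://example.org/caf\u00e9"     # e-acute as one code point
decomposed = "http://example.org/cafe\u0301"  # "e" plus combining acute accent

print(ids_equal(composed, decomposed))                    # False
print(ids_equal(composed, decomposed, unicode_nfc=True))  # True
```

Both strings render identically, so a publisher that re-encodes an id it was given can silently break plain comparison, which is the same argument for passing ids on exactly as received.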