Re: PaceFeedEquivalence



On Fri, 30 Jul 2004 11:45:49 -0400, Joe Gregorio <joe.gregorio@xxxxxxxxx> wrote:
> What kind of
> URI normalizations should be used when comparing
> atom:id URIs for equivalence?

All of them except protocol-based normalization, which requires
retrieving the resource to verify equivalence.

> "Case Normalization"

See http://feedvalidator.org/testcases/atom/must/entry_id_duplicate_value_5.xml

> "Percent-Encoding Normalization"

See http://feedvalidator.org/testcases/atom/must/entry_id_duplicate_value_6.xml
and http://feedvalidator.org/testcases/atom/must/entry_id_duplicate_value_7.xml

> "Path Segment Normalization"

See http://feedvalidator.org/testcases/atom/must/entry_id_duplicate_value_2.xml

> "Scheme-based Normalization"

See http://feedvalidator.org/testcases/atom/must/entry_id_duplicate_value_3.xml
and http://feedvalidator.org/testcases/atom/must/entry_id_duplicate_value_4.xml

> "Protocol-based Normalization"

No, because this requires retrieval.
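
As a rough illustration of the retrieval-free normalizations listed
above (case, percent-encoding, path segment, scheme-based), here is a
minimal Python sketch. The function names, the default-port table, and
the sample URIs are assumptions made for illustration; they are not
taken from the feed validator or from the test cases.

```python
import re
from urllib.parse import urlsplit, urlunsplit

UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
DEFAULT_PORTS = {"http": 80, "https": 443}  # assumed scheme defaults


def _fix_escape(match):
    # Decode escapes of unreserved characters; uppercase the hex otherwise.
    char = chr(int(match.group(1), 16))
    if char in UNRESERVED:
        return char
    return "%" + match.group(1).upper()


def percent_normalize(text):
    """Percent-encoding normalization of one URI component."""
    return re.sub(r"%([0-9A-Fa-f]{2})", _fix_escape, text)


def remove_dot_segments(path):
    """Path segment normalization for an absolute path."""
    segments = path.split("/")
    output = []
    for segment in segments:
        if segment == ".":
            continue
        if segment == "..":
            if len(output) > 1:  # never pop the leading empty segment
                output.pop()
            continue
        output.append(segment)
    if segments[-1] in (".", ".."):
        output.append("")  # keep the trailing slash
    return "/".join(output)


def normalize_uri(uri):
    """Apply the syntax-only normalizations; nothing is retrieved."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()   # case normalization
    host = parts.hostname or ""     # urlsplit lowercases the host
    if ":" in host:                 # IPv6 literal: restore brackets
        host = "[" + host + "]"
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += ":" + str(parts.port)  # scheme-based: drop default port
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = userinfo + "@" + netloc
    path = percent_normalize(parts.path)
    if path.startswith("/"):
        path = remove_dot_segments(path)
    if scheme in DEFAULT_PORTS and netloc and not path:
        path = "/"                  # scheme-based: empty path becomes "/"
    # Limitation: urlunsplit drops a bare "?" or "#" with an empty component.
    return urlunsplit((scheme, netloc, path,
                       percent_normalize(parts.query),
                       percent_normalize(parts.fragment)))


def same_id(a, b):
    """Compare two atom:id values after normalization."""
    return normalize_uri(a) == normalize_uri(b)
```

Under these assumptions, same_id("http://example.com",
"http://EXAMPLE.com:80/") is true, while "http://example.com/a" and
"http://example.com/A" remain different because path case is
significant.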

Also, I wonder about
http://feedvalidator.org/testcases/atom/must/entry_id_duplicate_value_8.xml
where two URIs differ only by Unicode normalization form after being
un-percent-encoded.  Are these two URIs the same or different?
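
To make the question concrete, here is a small Python sketch of how two
such URIs can arise. The example URIs are assumed for illustration and
are not the ones in the test case; the sketch only shows the
difference and does not settle whether the two should be treated as
the same.

```python
import unicodedata
from urllib.parse import unquote


def decoded(uri):
    """Un-percent-encode a URI, interpreting the octets as UTF-8."""
    return unquote(uri, encoding="utf-8", errors="strict")


def equal_after_nfc(a, b):
    """Compare the decoded URIs after Unicode normalization to NFC."""
    return (unicodedata.normalize("NFC", decoded(a))
            == unicodedata.normalize("NFC", decoded(b)))


# Assumed examples: the same visible text, encoded two ways.
composed = "http://example.com/caf%C3%A9"     # U+00E9, precomposed e-acute
decomposed = "http://example.com/cafe%CC%81"  # "e" followed by U+0301

# The decoded strings differ code point by code point...
differs_raw = decoded(composed) != decoded(decomposed)
# ...but are identical once both are normalized to NFC.
same_nfc = equal_after_nfc(composed, decomposed)
```

Here differs_raw and same_nfc both come out true, which is exactly the
ambiguity: a comparison without Unicode normalization sees two ids, a
comparison with it sees one.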

> That is for the *consumer* side of
> atom:id. On the producer side should
> we suggest/require the canonical form
> of URIs[1]?

I can definitely see us adding informational messages to the feed
validator that flag atom:id URIs that are not in canonical form.

> And if so should we
> augment that canonicalization
> to include a unicode normalization form?

An excellent question.  Tim?

-- 
Cheers,
-Mark