Re: URI canonicalization




At 14:27 05/02/01, Roy T. Fielding wrote:
>
>On Jan 31, 2005, at 7:10 PM, Martin Duerst wrote:
>> 5) Add a note saying something like "Comparison functions
>> provided by many URI classes/implementations make additional
>> assumptions about equality that are not true for Identity
>> Constructs. Atom processors therefore should use simple
>> string functions for comparing Identity Constructs."
>> I think such a note could be a good balance to the normalization
>> advice.
>
>That would be a falsehood. Identifiers are not subject to
>"simplification" -- they are either equivalent or not. We can
>add all of the implementation requirements we like to prevent
>software from detecting false negatives, but that doesn't change
>the fact that equivalent identifiers always identify the same
>resource. It is the author's responsibility to use URIs
>(or IRIs) that are actually different, not the responsibility
>of the protocol or implementation.

It's okay for some network-oriented usage to work that way.
The network can fail at any moment, anyway. But it's not helpful
at all in some contexts that are not network-oriented. A very
good case in point would be XML Namespaces. In that case,
unless there is a single rule for comparison that can be used
by all implementations, it would lead to some XML being
declared namespace valid by one XML processor, and not
namespace valid by some other XML processor. That was
clearly not acceptable.
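
To make the disagreement concrete, here is a minimal sketch in Python
of two processors applying different comparison rules to the same pair
of namespace names. The function names and the example.org URIs are
made up for illustration, and the second function is only one possible
rung of a comparison ladder (case normalization of the scheme and
authority), not anything a particular spec or library mandates.

```python
from urllib.parse import urlsplit


def simple_string_equal(a: str, b: str) -> bool:
    """Character-by-character comparison, with no normalization at all."""
    return a == b


def case_normalized_equal(a: str, b: str) -> bool:
    """Hypothetical 'smarter' comparison: lowercases scheme and authority.

    Simplification: lowercasing the whole netloc would also lowercase any
    userinfo part, so this is only safe for URIs without userinfo.
    """
    def norm(uri: str):
        parts = urlsplit(uri)
        return (
            parts.scheme.lower(),
            parts.netloc.lower(),
            parts.path,
            parts.query,
            parts.fragment,
        )

    return norm(a) == norm(b)


# Two namespace names that differ only in the case of scheme and host.
ns_a = "http://example.org/ns"
ns_b = "HTTP://EXAMPLE.ORG/ns"

# Processor 1 treats them as different namespaces; processor 2 treats
# them as the same one. The same document can then be judged differently.
processor_1_same = simple_string_equal(ns_a, ns_b)
processor_2_same = case_normalized_equal(ns_a, ns_b)
print(processor_1_same, processor_2_same)
```

If every processor is required to use the first function, the two
always agree; once each may pick its own rung, they need not.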

>I am disappointed that a MUST requirement was added to IRI in the
>last draft without working group review.  This part
>
>   Applications using IRIs as identity tokens with no relationship to a
>   protocol MUST use the Simple String Comparison (see section 5.3.1).
>   All other applications MUST select one of the comparison practices
>   from the Comparison Ladder (see section 5.3 or, after IRI-to-URI
>   conversion, select one of the comparison practices from the URI
>   comparison ladder in [RFC3986], section 6.2)
>
>is completely missing the point of the ladder.

It was added due to a security review initiated by the IESG.
I think I wrote the actual text, but the suggestion for
the MUSTs didn't come from the authors, but from the security
expert.

>The identifiers may
>or may not be equivalent and there is absolutely no reason for
>protocols to require inaccurate comparisons.  The reason for
>simplification of comparison is ONLY that false negatives are
>an acceptable fact of life and their elimination is an
>implementation-specific decision that has no impact on
>interoperable use of identifiers.  That is why there is no such
>requirement for URIs.

The difference from the URI spec is not a good thing. But I disagree
with your explanation. Both XML Namespaces and RDF (which is
based on XML Namespaces) *require* character-by-character
comparison, for good reasons. Otherwise, concepts such as
conformance to XML Namespaces or equivalence of RDF statements
would just hang in the air. Security protocols are of course
another area of application where consistent behavior is
extremely important. "Sometimes it may match, sometimes not"
or "if you know all the schemes and protocols involved, the
server used on the other side, and the intent of the creator/
maintainer of the resource for what to do about it in the future,
you'll get consistent results" are just sometimes not good enough.


Regards, Martin.