Re: URI canonicalization

On Jan 31, 2005, at 11:56 PM, Martin Duerst wrote:
At 14:27 05/02/01, Roy T. Fielding wrote:
>That would be a falsehood.  Identifiers are not subject to
>"simplification" -- they are either equivalent or not.  We can
>add all of the implementation requirements we like to prevent
>software from detecting false negatives, but that doesn't change
>the fact that equivalent identifiers always identify the same
>resource.  It is the author's responsibility to use URIs
>(or IRIs) that are actually different, not the responsibility
>of the protocol or implementation.

It's okay for some network-oriented usage to work that way.
The network can fail any moment, anyway. But it's not helpful
at all in some contexts that are not network-oriented. A very
good case in point would be XML Namespaces. In that case,
unless there is a single rule for comparison that can be used
by all implementations, it would lead to some XML being
declared namespace valid by one XML processor, and not
namespace valid by some other XML processor.

Where is this myth coming from? If a single document uses two different URIs to name the same namespace (or multiple documents using different URIs are merged), then an XML processor will consider those qualified names to be different names. That does not change their validity whatsoever. It doesn't even change their data model, since to be equivalent the two URIs must refer to the same namespace and therefore the same data model. Validity is a property of DTDs and Schema, not XML namespace processing, and neither DTDs nor Schema are able to redefine the meaning of URIs. The only purpose of the comparison algorithm is to allow those technologies to take the shortcut of evaluating structures based solely on the strcmp URI comparison, leaving false negatives to be resolved by the author.
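To make the false-negative point concrete, here is a minimal Python sketch. It is an illustration only, not anything defined by the IRI or URI specs: the function names `simple_string_equal` and `normalize_case` and the example URIs are assumptions, and the normalization covers just the case-folding part of syntax-based normalization (scheme, host, and percent-encoding hex digits).

```python
import re
from urllib.parse import urlsplit, urlunsplit


def simple_string_equal(a: str, b: str) -> bool:
    """strcmp-style comparison: cheap, but may report false negatives."""
    return a == b


def normalize_case(uri: str) -> str:
    """Hypothetical helper: case normalization only (a sketch, not a full
    implementation of syntax-based normalization)."""
    parts = urlsplit(uri)  # urlsplit already lowercases the scheme
    # Lowercase the host (and port), but leave any userinfo untouched.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    rebuilt = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    # Uppercase the hex digits of percent-encoded octets.
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), rebuilt)


# Two spellings an author might use for the same namespace (example values).
a = "http://example.com/ns/%7euser"
b = "HTTP://EXAMPLE.COM/ns/%7Euser"

print(simple_string_equal(a, b))                  # names treated as different
print(normalize_case(a) == normalize_case(b))     # equivalence detected
```

Either comparison is consistent with the identifiers being equivalent. The strcmp version just fails to notice it, which is the false negative the author is expected to avoid by spelling the URI one way.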

>I am disappointed that a MUST requirement was added to IRI in the
>last draft without working group review. This part
> Applications using IRIs as identity tokens with no relationship to a
> protocol MUST use the Simple String Comparison (see section 5.3.1).
> All other applications MUST select one of the comparison practices
> from the Comparison Ladder (see section 5.3 or, after IRI-to-URI
> conversion, select one of the comparison practices from the URI
> comparison ladder in [RFC3986], section 6.2)
>is completely missing the point of the ladder.

It was added due to a security review initiated by the IESG.
I think I wrote the actual text, but the suggestion for
the MUSTs didn't come from the authors, but from the security

*grumble* If I had a dime for every time an IETF "security expert" screwed up an application protocol, I'd at least have a grande caramel macchiato to numb this headache. It is never a good idea to add requirements that make an application non-compliant for performing additional security checks.

The difference to the URI spec is not a good thing. But I disagree
with your explanation. Both XML Namespaces and RDF (which is
based on XML Namespaces) *require* character-by-character
comparison, for good reasons.

No, they require strcmp comparison for certain procedures related to matching names. They cannot require implementations to ignore the equivalence of some URIs because that would change the meaning of the statements being made, particularly for RDF.

Otherwise, concepts such as
conformance to XML Namespaces or equivalence of RDF statements
would just hang in the air.

No, they don't -- the processing algorithm does not change the meaning of the identifiers. It only defines minimal conformance criteria for implementations. RDF statements do not know the meaning of the URIs used to create those statements, nor do they know which URIs are equivalent. But not knowing that two URIs are equivalent does not change the fact that any statement made about the resource of one URI must also be valid for the resource of the other, equivalent URI, since URIs cannot be equivalent unless they both identify the same resource. In other words, there are no closed-world theories of equivalence that override the universality of URIs. The same applies to IRIs.
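A rough sketch of that distinction, in Python, under stated assumptions: the triple store, the property names, and the `canon` helper are all hypothetical, and `canon` only lowercases the host, which is enough to show one kind of equivalence. The strcmp lookup is the minimal behavior; the second lookup is a processor that happens to know more, and it returns a superset rather than a contradictory answer.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical statements (subject, property, value); the two subjects
# differ only in host case, so they identify the same resource.
triples = {
    ("http://example.org/doc", "creator", "Alice"),
    ("http://EXAMPLE.org/doc", "license", "CC-BY"),
}


def canon(uri: str) -> str:
    """Hypothetical canonical form: lowercase scheme and host only."""
    p = urlsplit(uri)  # urlsplit already lowercases the scheme
    return urlunsplit((p.scheme, p.netloc.lower(), p.path, p.query, p.fragment))


def statements_strcmp(subject: str, store: set) -> set:
    """Minimal conformance: match subjects character by character."""
    return {(p, o) for s, p, o in store if s == subject}


def statements_equivalent(subject: str, store: set) -> set:
    """A processor that also recognizes equivalent spellings."""
    key = canon(subject)
    return {(p, o) for s, p, o in store if canon(s) == key}


print(statements_strcmp("http://example.org/doc", triples))
print(statements_equivalent("http://example.org/doc", triples))
```

The second result contains everything in the first. Recognizing the equivalence adds statements that were already true of the resource; it does not change what any identifier means.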

Security protocols are of course
another area of application where consistent behavior is
extremely important. "Sometimes it may match, sometimes not"
or "if you know all the schemes and protocols involved, the
server used on the other side, and the intent of the creator/
maintainer of the resource for what to do about it in the future,
you'll get consistent results" are just sometimes not good enough.

Security protocols make comparisons based on what is being secured, not based on some abstract theory. They are capable of defining that for themselves, consistently, and with respect to the resources being secured rather than one identifier used to access those resources. The IRI spec doesn't know enough about an application's needs to declare one form of comparison to be better than others.