Re: URI canonicalization
On Jan 31, 2005, at 11:56 PM, Martin Duerst wrote:
At 14:27 05/02/01, Roy T. Fielding wrote:
>That would be a falsehood. Identifiers are not subject to
>"simplification" -- they are either equivalent or not. We can
>add all of the implementation requirements we like to prevent
>software from detecting false negatives, but that doesn't change
>the fact that equivalent identifiers always identify the same
>resource. It is the author's responsibility to use URIs
>(or IRIs) that are actually different, not the responsibility
>of the protocol or implementation.
It's okay for some network-oriented usage to work that way.
The network can fail any moment, anyway. But it's not helpful
at all in some contexts that are not network-oriented. A very
good case in point would be XML Namespaces. In that case,
unless there is a single rule for comparison that can be used
by all implementations, it would lead to some XML being
declared namespace valid by one XML processor, and not
namespace valid by some other XML processor.
Where is this myth coming from? If a single document uses two
different URIs to name the same namespace (or multiple documents
using different URIs are merged) then an XML processor will consider
those qualified names as different names. That does not change their
validity whatsoever. It doesn't even change their data model, since
to be equivalent the two URIs must refer to the same namespace and
therefore the same data model. Validity is a property of DTDs and
Schema, not XML namespace processing, and neither DTDs nor Schema
are able to redefine the meaning of URIs. The only purpose of the
comparison algorithm is to allow those technologies to take the
shortcut of evaluating structures based solely on the strcmp URI
comparison, leaving false-negatives to be resolved by the author.
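To make the false-negative point concrete, here is a minimal Python sketch of the lowest rung of the ladder (simple string comparison) next to a partial syntax-based normalization in the spirit of RFC 3986, section 6.2.2. The function names and the example URIs are hypothetical, chosen only for illustration, and the normalizer covers just case normalization of scheme and host plus percent-encoding normalization; it does not remove dot-segments or apply any scheme-specific rules.

```python
import re

# Generic URI component split, after the regular expression in
# RFC 3986, Appendix B (scheme, authority, path, query, fragment).
_URI_RE = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$",
    re.S,
)

_UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def simple_string_equal(a: str, b: str) -> bool:
    """strcmp-style comparison: character-by-character, nothing else."""
    return a == b


def _normalize_pct(s: str) -> str:
    """Decode percent-encoded unreserved characters; uppercase other escapes."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in _UNRESERVED else "%" + m.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, s)


def syntax_normalize(uri: str) -> str:
    """Partial syntax-based normalization (illustrative, not complete)."""
    scheme, authority, path, query, fragment = _URI_RE.match(uri).groups()
    out = ""
    if scheme is not None:
        out += scheme.lower() + ":"
    if authority is not None:
        # Lowercase host (and port digits, harmlessly); leave userinfo alone.
        userinfo, at, hostport = authority.rpartition("@")
        out += "//" + userinfo + at + hostport.lower()
    out += path
    if query is not None:
        out += "?" + query
    if fragment is not None:
        out += "#" + fragment
    return _normalize_pct(out)


a = "http://example.com/%7Euser"
b = "HTTP://EXAMPLE.com/~user"

print(simple_string_equal(a, b))                      # False: a false negative
print(syntax_normalize(a) == syntax_normalize(b))     # True
```

The two strings differ under strcmp yet normalize to the same URI, so a processor using only the simple comparison treats them as two names. That is a false negative of the comparison, not a statement about what the identifiers mean.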
>I am disappointed that a MUST requirement was added to IRI in the
>last draft without working group review. This part
> Applications using IRIs as identity tokens with no relationship to
> protocol MUST use the Simple String Comparison (see section 5.3.1).
> All other applications MUST select one of the comparison practices
> from the Comparison Ladder (see section 5.3 or, after IRI-to-URI
> conversion, select one of the comparison practices from the URI
> comparison ladder in [RFC3986], section 6.2)
>is completely missing the point of the ladder.
It was added due to a security review initiated by the IESG.
I think I wrote the actual text, but the suggestion for
the MUSTs didn't come from the authors, but from the security
*grumble* If I had a dime for every time an IETF "security expert"
screwed up an application protocol, I'd at least have a grande
caramel macchiato to numb this headache. It is never a good idea
to add requirements that make an application non-compliant for
performing additional security checks.
The difference to the URI spec is not a good thing. But I disagree
with your explanation. Both XML Namespaces and RDF (which is
based on XML Namespaces) *require* character-by-character
comparison, for good reasons.
No, they require strcmp comparison for certain procedures related
to matching names. They cannot require implementations to ignore
the equivalence of some URIs because that would change the meaning
of the statements being made, particularly for RDF.
Otherwise, concepts such as
conformance to XML Namespaces or equivalence of RDF statements
would just hang in the air.
No, they don't -- the processing algorithm does not change the
meaning of the identifiers. It only defines minimal conformance
criteria for implementations. RDF statements do not know the
meaning of the URIs used to create those statements, nor do they
know what URIs are equivalent, but not knowing that two URIs are
equivalent does not change the fact that any statements made about
the resource of one URI must also be valid for the resource of the
other equivalent URI, since URIs cannot be equivalent if they
do not both identify the same resource. In other words, there
are no closed-world theories of equivalence that override the
universality of URIs. The same applies to IRIs.
Security protocols are of course
another area of application where consistent behavior is
extremely important. "Sometimes it may match, sometimes not"
or "if you know all the schemes and protocols involved, the
server used on the other side, and the intent of the creator/
maintainer of the resource for what to do about it in the future,
you'll get consistent results" are just sometimes not good enough.
Security protocols make comparisons based on what is being secured,
not based on some abstract theory. They are capable of defining
that for themselves, consistently, and with respect to the resources
being secured rather than one identifier used to access those
resources. The IRI spec doesn't know enough about an application's
needs to declare one form of comparison to be better than others.