
Re: charset use of UTF-16 vs. utf-8 - Canonical XML



On Thursday, June 6, 2002, 8:07:52 AM, Larry wrote:


LM> Consider that "Canonical XML" (RFC 3076)
LM> http://www.w3.org/TR/2001/REC-xml-c14n-20010315

LM> Do you think it legitimate that protocols might require
LM> that the XML entity already be in canonical form?

The primary value of Canonical XML is to determine whether two
documents/messages/whatever in XML are identical except for
variations (use of entities, attribute order, encoding, placement of
namespace declarations, choice of namespace prefixes) allowed by XML
and Namespaces in XML.
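A minimal sketch of "compare by canonical form", assuming Python 3.8+ and its standard-library `xml.etree.ElementTree.canonicalize`. Note that this function implements Canonical XML 2.0, not RFC 3076 (C14N 1.0), and by default it keeps prefixes as written rather than normalizing them. The helper name `same_after_c14n` is hypothetical.

```python
import xml.etree.ElementTree as ET


def same_after_c14n(a: str, b: str) -> bool:
    """Return True if two XML texts have identical canonical forms.

    ET.canonicalize implements C14N 2.0, used here as a stand-in
    for RFC 3076 (C14N 1.0); the two differ in some details.
    """
    return ET.canonicalize(a) == ET.canonicalize(b)


# Same element, but with an XML declaration, reordered attributes,
# different quote characters, and a character reference for 'h'.
doc1 = '<?xml version="1.0"?><msg b="2" a="1">hi</msg>'
doc2 = "<msg a='1' b='2'>&#104;i</msg>"

print(ET.canonicalize(doc1))        # <msg a="1" b="2">hi</msg>
print(same_after_c14n(doc1, doc2))  # True
```

The raw strings differ, but every difference is one of the variations XML allows, so the canonical forms match.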

A secondary use is to sign the canonicalized form, so that any changes
outside the set of allowed variations can be detected.
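To illustrate the idea, here is a sketch that uses an HMAC over the UTF-8 bytes of the canonical form as a stand-in for a real signature. XML Signature (XML-DSig) is considerably more involved; the shared key `KEY` and the helpers `sign_c14n`/`verify_c14n` are hypothetical, and `ET.canonicalize` is C14N 2.0 rather than RFC 3076.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Hypothetical shared secret; real XML-DSig would use proper signing keys.
KEY = b"example-shared-secret"


def sign_c14n(xml_text: str) -> str:
    """HMAC-SHA256 over the UTF-8 bytes of the canonical form."""
    canonical = ET.canonicalize(xml_text).encode("utf-8")
    return hmac.new(KEY, canonical, hashlib.sha256).hexdigest()


def verify_c14n(xml_text: str, tag: str) -> bool:
    """Constant-time comparison of the recomputed tag with the given one."""
    return hmac.compare_digest(sign_c14n(xml_text), tag)


tag = sign_c14n('<order id="7" qty="3"/>')
print(verify_c14n("<order qty='3' id='7'></order>", tag))  # True: allowed variation
print(verify_c14n('<order id="7" qty="30"/>', tag))        # False: real change
```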

If a protocol were comparing messages, canonicalization would be the
way to go. However, canonicalization should be done as close as
possible to the point of comparison; otherwise, subsequent stages might
undo some or all of it.
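A small demonstration of why canonicalizing early is fragile, assuming Python's standard `xml.etree.ElementTree`. The hypothetical `relay_stage` parses and re-serializes a message, as many intermediaries do; ElementTree writes empty elements as `<ack ... />`, which is not the canonical `<ack ...></ack>` form.

```python
import xml.etree.ElementTree as ET


def relay_stage(xml_text: str) -> str:
    """Hypothetical intermediate stage that parses and re-serializes."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


sent = ET.canonicalize("<ack ref='42'/>")  # '<ack ref="42"></ack>'
received = relay_stage(sent)               # '<ack ref="42" />'

print(sent == received)                   # False: the raw text drifted
print(ET.canonicalize(received) == sent)  # True: canonicalize at comparison
```

Comparing the raw text fails even though nothing meaningful changed; canonicalizing at the point of comparison recovers the match.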

Requiring messages to be pre-canonicalized would impose an additional
requirement on all subsequent stages of processing: both to check
canonicalization and to emit only the canonicalized form (and
presumably to burst into flames if the content is not canonicalized).
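What each stage would then have to do can be sketched as a gate that rejects anything not already canonical. This assumes Python 3.8+ and again uses C14N 2.0 (`ET.canonicalize`) as a stand-in for RFC 3076; `NotCanonicalError` and `require_canonical` are hypothetical names. Input that is not well-formed makes `ET.canonicalize` raise `xml.etree.ElementTree.ParseError`.

```python
import xml.etree.ElementTree as ET


class NotCanonicalError(ValueError):
    """Hypothetical error for a stage that insists on canonical input."""


def require_canonical(raw: bytes) -> str:
    """Accept the message only if it is already in canonical form."""
    try:
        text = raw.decode("utf-8")  # canonical output is always UTF-8
    except UnicodeDecodeError as exc:
        raise NotCanonicalError("message is not UTF-8") from exc
    # Canonical input must survive re-canonicalization unchanged.
    if ET.canonicalize(text) != text:
        raise NotCanonicalError("message is not in canonical form")
    return text
```

Every stage paying this cost on input, and then re-emitting only canonical output, is the burden described above.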

LM> It would restrict them not only to be in UTF-8
LM> but also various other restrictions
LM> (e.g., the XML declaration is removed).

LM> Is it reasonable to allow a protocol to require
LM> a subset of the canonical restrictions without
LM> requiring all of them?

Such as just the UTF-8 part?
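For comparison, checking only "the UTF-8 part" is much lighter. A sketch, assuming the rule is "bytes decode as UTF-8, and any encoding declaration says UTF-8"; whether to tolerate a UTF-8 BOM is a policy choice assumed here (canonical output never has one). The helper `is_utf8_xml` and the regex are hypothetical and ignore every other canonical-form rule.

```python
import re

# Hypothetical pattern for the encoding pseudo-attribute of an XML declaration.
_ENC_DECL = re.compile(
    rb"^<\?xml[^>]*?\bencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def is_utf8_xml(raw: bytes) -> bool:
    """Check only the UTF-8 subset of the canonical-form restrictions."""
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]  # assumption: tolerate a UTF-8 BOM
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    m = _ENC_DECL.match(raw)
    # No declaration means UTF-8 by default; otherwise it must say UTF-8.
    return m is None or m.group(1).lower() == b"utf-8"
```

An XML declaration, attribute order, quoting, and empty-element syntax are all left alone, which is exactly the "subset" being asked about.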


-- 
 Chris                            mailto:chris@xxxxxx