From: Jean-Marc Desperrier (jean-marc.desperrier@certplus.com)
Date: Wed Dec 11 2002 - 07:09:37 CST
Charles Lindsey wrote:
> The syntax for UTF8-xtra-char excludes those redundant sequences of
> octets which cannot occur in UTF-8, as defined by [RFC 2279], either
> because they would not be the shortest possible encodings of some UCS
> character [ISO/IEC 10646], or they would represent one of the
> characters D800 through DFFF, disallowed in UCS because of their
> surrogate use in the UTF-16 encoding. These sequences MUST NOT be
> generated by posting agents. Where they occur inadvertently, they
> SHOULD be passed on untouched by other agents, but attempts to
> interpret them as malformed UTF-8 MUST NOT be made. However, if there
> is reason to suppose they are representations of some other character
> set they MAY, as suggested in section 4.4.1, be interpreted as such.
> The syntax also includes, for completeness, the cases UTF8-5 and
> UTF8-6 which cannot, in fact, arise in [UNICODE 3.2] (though they
> might conceivably arise in some future extension).
>
> Is that acceptable to everybody?
It's OK with me.
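
For anyone who wants to see what the quoted paragraph excludes, here is a minimal sketch in Python. It is not taken from the draft: the names classify, sequence_length and MIN_FOR_LENGTH and the returned labels are hypothetical, and it assumes the RFC 2279 layout of 1 to 6 octets per character. It only sorts a single octet sequence into the cases the paragraph mentions (non-shortest encoding, surrogate D800 through DFFF, UTF8-5/UTF8-6).

```python
# Sketch only: classifies ONE candidate sequence under RFC 2279 rules.
# All names and labels here are illustrative, not from the draft.

# Smallest code point that genuinely needs a sequence of each length.
MIN_FOR_LENGTH = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000,
                  5: 0x200000, 6: 0x4000000}


def sequence_length(lead: int) -> int:
    """Return the sequence length announced by a lead octet, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    if 0xF8 <= lead <= 0xFB:
        return 5
    if 0xFC <= lead <= 0xFD:
        return 6
    return 0  # continuation octet (0x80-0xBF) or 0xFE/0xFF


def classify(seq: bytes) -> str:
    """Return 'ok', 'overlong', 'surrogate', 'utf8-5/6' or 'malformed'."""
    if not seq:
        return "malformed"
    n = sequence_length(seq[0])
    if n == 0 or len(seq) != n:
        return "malformed"
    if n == 1:
        return "ok"
    # Payload bits of the lead octet: 5 bits for n=2, 4 for n=3, and so on.
    cp = seq[0] & (0xFF >> (n + 1))
    for octet in seq[1:]:
        if octet & 0xC0 != 0x80:  # every trailing octet must be 10xxxxxx
            return "malformed"
        cp = (cp << 6) | (octet & 0x3F)
    if cp < MIN_FOR_LENGTH[n]:
        return "overlong"      # not the shortest possible encoding
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"     # reserved for UTF-16 surrogate use
    if n >= 5:
        return "utf8-5/6"      # well-formed per RFC 2279, beyond Unicode 3.2
    # Assumption: 4-octet values above 10FFFF are not singled out here.
    return "ok"
```

For example, classify(b"\xC0\x80") gives "overlong" and classify(b"\xED\xA0\x80") gives "surrogate", while a lone b"\xE9" (an ISO-8859-1 e-acute) comes back "malformed", which is the kind of input the paragraph suggests may belong to some other character set.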