Re: [IETF-PKIX] Subject/Issuer Name Population



At 09:20 AM 4/23/98 -0400, John_Wray@iris.com wrote:
>i)  This may invalidate certificates that are currently deployed.
>Since printableString is such a limited character set (a..z,
>A..Z, 0..9 plus a few punctuation characters, not including
>"@", "!" or "_"), many deployed certificates will currently
>use either teletexString or universalString, but since X.520
>matching rules are supposed to be in effect, there is no
>reason to suppose that equivalent names in existing
>certificates will match under a binary comparison.

I haven't looked at the X.520 matching rules, but I wonder how they could
cover universalString. The character set for universalString is ISO 10646
in UCS-4 mode (4 bytes per character). I don't believe that ISO 10646
defines case-matching rules, nor does it have a list of spacing characters,
so case folding and space stripping would have to rely on local rules,
which means a lack of interoperability. Note that case mapping and spaces
are defined in Unicode, but only in non-normative tables, so we can't even
look there for guidance.
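
To make the concern concrete, here is a minimal Python sketch of what one
implementation's "local rules" might look like. It is not the X.520
algorithm: the helper names normalize_name and names_match are hypothetical,
and the folding and whitespace behavior are simply whatever Python's
str.casefold() and str.split() provide. Another implementation could
reasonably choose different tables and get a different answer for the same
pair of names.

```python
def normalize_name(value: str) -> str:
    """Collapse whitespace and fold case (one possible local rule set)."""
    # split() with no argument splits on runs of whitespace and drops
    # leading and trailing whitespace; join() puts single spaces back.
    collapsed = " ".join(value.split())
    # casefold() applies Python's built-in case folding.
    return collapsed.casefold()


def names_match(a: str, b: str) -> bool:
    """Compare two attribute values after local normalization."""
    return normalize_name(a) == normalize_name(b)


a = "Internet  Mail Consortium "
b = "internet mail consortium"

# A binary comparison of the UCS-4 encodings treats the names as different.
binary_equal = a.encode("utf-32-be") == b.encode("utf-32-be")

# The normalized comparison treats them as the same name.
folded_equal = names_match(a, b)
```

With these two values, binary_equal is False while folded_equal is True,
which is the mismatch John describes for certificates that were issued
under the assumption of matching rules rather than binary comparison.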

>ii)  The preferred ordering of printableString, bmpString and
>utf8String is problematic.  Since almost anything that might
>ever appear in a name attribute is going to reside within the
>BMP character-set, utf8String will hardly ever be used, so this
>really comes down to a choice between printableString and
>bmpString; utf8String will never actually be used if this
>ordering is used.  This is a pity, since utf8String is
>actually a very implementation-friendly codeset, in that
>it's identical to 7-bit ASCII for characters that can be
>represented in 7-bit ASCII, it doesn't use embedded null
>octets, so code can use regular null-terminated C-strings to
>store and manipulate it, it is designed so that it won't
>break 8-bit clean code that scans for 7-bit ASCII values,
>and it's efficient in terms of the number of octets used
>to encode a given string.

I agree fully, except for that last part. UTF8String is not efficient for
all characters; specifically, it takes 3 bytes per character to encode the
Indic and Asian scripts, which take only 2 bytes per character in
BMPString. On the whole, however, I also support UTF8String over BMPString
for the other factors you list.
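
The size trade-off can be checked with a short Python sketch. It assumes
that Python's "utf-16-be" codec is a fair stand-in for the BMPString content
octets (the two agree for characters inside the BMP) and that "utf-32-be"
stands in for universalString. The function name encoded_sizes and the
sample strings are illustrative only, and ASN.1 tag and length octets are
not counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of content octets for each candidate encoding."""
    return {
        "utf8": len(text.encode("utf-8")),
        "bmp": len(text.encode("utf-16-be")),        # 2 octets per BMP character
        "universal": len(text.encode("utf-32-be")),  # 4 octets per character
    }


ascii_name = "Paul"                       # 4 ASCII characters
devanagari_name = "\u0928\u093e\u092e"    # 3 Devanagari characters
cjk_name = "\u540d\u524d"                 # 2 CJK characters

ascii_sizes = encoded_sizes(ascii_name)
devanagari_sizes = encoded_sizes(devanagari_name)
cjk_sizes = encoded_sizes(cjk_name)

# UTF-8 output for ASCII input is the same octets as the ASCII encoding
# and contains no null octets; the 2-octet encoding does contain them.
utf8_is_ascii = ascii_name.encode("utf-8") == ascii_name.encode("ascii")
utf8_has_null = b"\x00" in ascii_name.encode("utf-8")
bmp_has_null = b"\x00" in ascii_name.encode("utf-16-be")
```

For the ASCII sample, UTF-8 needs 4 octets against 8 for the 2-octet form.
For the Devanagari and CJK samples the ratio reverses, with 9 against 6
and 6 against 4 octets respectively.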


--Paul Hoffman, Director
--Internet Mail Consortium