Re: Questions about character sets and their encodings
Borka,
It sounds to me like you are referring (in the quoted text at the end
of this message) to the unrestricted string type, rather than the
UniversalString type. X.690 describes a rather complicated structure
(in 8.21.6) for that type that includes the object identifiers and
such. But X.690 says:
8.20.5 For restricted character string apart from UniversalString and
BMPString the octet string shall contain the octets specified in ISO
2022 for encodings in an 8-bit environment, using the escape sequence
and character codings registered in accordance with ISO 2375.
This would seem to imply that UniversalString and BMPString don't use
escape sequences, since escape sequences aren't mentioned anywhere
else in reference to UniversalString. Which would make sense, because
it seems to me that the whole point of UniversalString is to be a
"flat" character set, where you don't have to worry about escape
sequences and modes. This seems to be confirmed by the fact that
UniversalString doesn't appear in "Table 3 - Use of escape sequences",
and also by
8.20.7 For the "UniversalString" type, the octet string shall contain
the octets specified in ISO/IEC 10646-1, using the 4-octet canonical
form (see 14.2 of ISO/IEC 10646-1). Control functions and signatures
shall not be used.
Since all values in the character set can be reproduced in 4 octets,
there is no reason for escape codes.
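
To make that reading of 8.20.7 concrete, here is a minimal Python sketch
of a DER encoder and decoder for a bare UniversalString value. It rests on
two assumptions that are not in the text quoted above: that the tag is
UNIVERSAL 28 (0x1C, the number X.680 assigns to UniversalString), and that
Python's "utf-32-be" codec (4 octets per character, big-endian, no
byte-order signature) matches the 4-octet canonical form. The function
names are made up for illustration.

```python
UNIVERSAL_STRING_TAG = 0x1C  # assumption: UNIVERSAL 28, primitive form


def der_length(n: int) -> bytes:
    """Encode a definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_universal_string(text: str) -> bytes:
    """Tag, length, then 4 big-endian octets per character, no escapes."""
    content = text.encode("utf-32-be")  # this codec writes no signature (BOM)
    return bytes([UNIVERSAL_STRING_TAG]) + der_length(len(content)) + content


def decode_universal_string(der: bytes) -> str:
    """Inverse of encode_universal_string for a single primitive value."""
    if der[0] != UNIVERSAL_STRING_TAG:
        raise ValueError("not a UniversalString")
    first = der[1]
    if first < 0x80:
        length, offset = first, 2
    else:
        count = first & 0x7F
        length = int.from_bytes(der[2:2 + count], "big")
        offset = 2 + count
    content = der[offset:offset + length]
    if len(content) != length or length % 4:
        raise ValueError("malformed UniversalString contents")
    return content.decode("utf-32-be")
```

Under those assumptions, encode_universal_string("Test").hex() gives
1c1000000054000000650000007300000074: the tag, a length of 16, and then
one 4-octet group per character, with no escape sequences anywhere.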
I think my current encoding is correct (nobody said my example was
wrong) but I'd like to see other examples of UniversalString encodings
(not necessarily even in certificates!), both because I'm paranoid :)
and because nobody has indicated that they believe my encoding to be
correct. Knowing this list, I wouldn't be surprised if the few people
who actually know (at this point in time it's kind of an obscure
issue) were just too busy to respond. But up until now,
I haven't been able to find a single example of a UniversalString DER
or BER encoding, in or out of a certificate. Arrrggghhh! Isn't
*anybody* using UniversalString? Is anybody reading this thread?
- Mark Bartel
> > UniversalString: X.690 says that this is encoded in 4-octet
> >canonical form, which I would imagine is 4 bytes, big-endian. Also, I
> >believe that the subset of the 32-bit space with the high 16 bits set
> >to 0 is equivalent to Unicode. Am I correct in these assumptions?
> >X.690 says a bunch of things that I don't really understand (probably
> >because I don't have X.680) about character string types and their
> >encodings. Also, what is the tag for UniversalString? Does anybody
> >have examples of certificates with UniversalString values in the
> >Issuer or Subject? I've heard that Microsoft certificates do, but
> >when I queried Verisign for Microsoft certificates they didn't. I
> >couldn't get ahold of any Microsoft products that actually issue
> >certificates to check that possibility.
>
> UniversalString allows all methods of coding, e.g. with the ISO 2022
> standard, which is the basic standard for code extension techniques in
> 7-bit and 8-bit environments, you can use any of the registered code tables
> in the International Register of Coded Character Sets. The invocation
> is done with shift functions and there are 4 of them: single shift and
> locked single shift for one character from the code table and
> 2 others for invocation of the whole coded character set table. Every code
> table has its registered escape sequence for invocation, thus ISO
> 10646, whose first multilingual plane is equal to Unicode, has its
> own escape sequence and thus can be invoked in the UniversalString.
>
> The other way of using
> different coded character sets in UniversalString is
> with the announcement mechanisms which
> are specified in ASN.1, i.e. the ISO 8824 and ISO 8825 standards. Every
> coded character set used has its OID.
>
>
> >I know I could probably get the answers by ordering ISO/IEC 10646-1
> >and X.680, but from what I can see I'd have to do it snail mail. I'd
> >rather not.
>
> My recommendation is to use ISO 10646 as this is the only remedy for the
> mess in the character set world.
>
>
> I am very interested in the solutions used by Microsoft. The
> new version of the X.500 standard allows use of "context" for different
> attributes, which means specification of the coded character set used.
> Without proper spelling of people's names and addresses the
> certificates will have no value, as the names will differ from the
> names on the documents, e.g. on the credit cards.
>
>
> Regards,
>
>
> Borka Jerman-blazic
> Chair, WG-I18N@TERENA.NL
>
> TERENA is European Association of the Research and Academic Networks