Re: Questions about character sets and their encodings
Hoyt,
In the version of X.520 that I have, it says (at the start of section 2, just before 5.1)
DirectoryString { INTEGER : maxSize } ::= CHOICE {
    teletexString    TeletexString (SIZE (1..maxSize)),
    printableString  PrintableString (SIZE (1..maxSize)),
    universalString  UniversalString (SIZE (1..maxSize)) }
This is the 1993 version, however. Has it been revised to include BMPString? I'd be quite happy to use BMPString if I could. As far as I've heard, BMP and Unicode are identical, and my products use Unicode internally.
And I'm aware of the ITU site, and have gotten several recommendations from there. However, X.680 and X.681 are not among the recommendations that are available electronically. I really hate the idea of ordering a paper copy, and of having to use snail mail, although I suspect I may break down soon.
I don't think I really need ISO/IEC 10646-1, because all I need to know is what "4-octet canonical form" is (or if I use BMPString, what "2-octet BMP form" is). I can't imagine that those forms are anything other than the obvious (four octets, big endian format, or in the case of BMPString, two octets, big endian) but nobody seems to be able to confirm this.
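To make the reading above concrete, here is a minimal Python sketch of the two forms as described: four octets per character, big-endian, for UniversalString, and two octets per character, big-endian, for BMPString. The function names are hypothetical, the tag numbers (28 and 30) are the ones quoted later in this thread, and the TLV wrapper handles only short-form lengths, so treat it as an illustration under those assumptions rather than a complete encoder.

```python
# Tag numbers as quoted in this thread; both are universal-class, primitive.
UNIVERSAL_STRING_TAG = 28  # 0x1C
BMP_STRING_TAG = 30        # 0x1E


def universal_string_octets(text: str) -> bytes:
    """Four octets per character, most significant octet first."""
    return b"".join(ord(ch).to_bytes(4, "big") for ch in text)


def bmp_string_octets(text: str) -> bytes:
    """Two octets per character, most significant octet first (BMP only)."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"U+{code:04X} is outside the Basic Multilingual Plane")
        out += code.to_bytes(2, "big")
    return bytes(out)


def der_tlv(tag: int, content: bytes) -> bytes:
    """Wrap content octets in tag + short-form length (sketch: up to 127 octets)."""
    if len(content) > 127:
        raise ValueError("this sketch handles only short-form lengths")
    return bytes([tag, len(content)]) + content


name = "A\u00e9"  # 'A' followed by e-acute
print(universal_string_octets(name).hex())                     # 00000041000000e9
print(der_tlv(BMP_STRING_TAG, bmp_string_octets(name)).hex())  # 1e04004100e9
```

For characters in the BMP, the two content encodings differ only in the two leading zero octets per character, which is why BMPString is attractive for a product that already holds 16-bit Unicode internally.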
- Mark Bartel
>-----Original Message-----
>From: KESTERSON.H [SMTP:H.Kesterson@az05.bull.com]
>Sent: Monday, January 20, 1997 5:18 PM
>To: Borka Jerman-Blazic; chandras@loc201.tandem.com
>Cc: ietf-pkix@tandem.com; wg-i18n@terena.nl
>Subject: Re: Questions about character sets and their encodings
>
>asn.1 also defines the type BMPString which is the Basic Multilingual Plane
>character set. This is the set of 16 bit character encodings that is
>"supposedly" equivalent to the unicode set (the intent was to make unicode and
>BMP identical - i believe they succeeded but i never verified it by checking the
>final version of the standard). the advantage of this asn.1 type is that it uses
>the asn.1 tag to indicate the selection from the rather large UniversalString
>set of character encodings. x.500 permits the use of BMPString in a
>distinguished name. therefore, the standard permits this character set in a
>certificate, subject to profiles of course.
>although 646 provides greater interoperability, one should consider national language
>and localization requirements for your product. the BMP set (Unicode) from 10646
>is the best character set to address such requirements.
>the tag value for UniversalString is 28; BMPString is 30. The asn.1 standard does
>not recommend the use of UniversalString without constraints.
>you can order and retrieve itu recommendations electronically. see http://www.itu.ch/POD/itut-123.html.
>this can help you get the asn.1 documents. iso does not offer such a service; however,
>i suspect that you would want a paper copy of 10646 since display of the glyphs
>from an electronic text would be a challenge for the font capability of most systems.
>hoyt
>_______________________________________________________________________________
>Subject: Questions about character sets and their encodings
>From: Borka Jerman-Blazic <borka@e5.ijs.si> at az05-smtp
>Date: 1/20/97 11:32
>
>
>
>>I've been searching for resources on the web on character sets for
>>ASN.1 types, but haven't been able to find what I need. Does anybody
>>know of such resources?
>
>>More specifically, my questions are:
>
>> IA5String: I've heard this is just 7-bit ASCII, but is that
>>true?
>
>
>Yes, IA5String is the same as 7-bit ASCII, or the ISO 646 IRV, with one exception:
>in the old IRV, the currency sign could appear in place of the dollar sign.
>
>> TeletexString: I understand that one can select different
>>character sets with escape codes; does anybody have the set of escape
>>codes and character sets they select?
>
>
>TeletexString is mainly T.61 or T.50 (one of them is currently used in Teletex) or
>its international ISO version, ISO 6937. This is an 8-bit coding with a variable
>code length per character; e.g. accented letters of the Latin alphabet are coded
>with 2 bytes, one for the so-called "non-spacing" character (the accent or diacritic
>sign) and the other for the base letter.
>
>> UniversalString: X.690 says that this is encoded in 4-octet
>>canonical form, which I would imagine is 4 bytes, big-endian. Also, I
>>believe that the subset of the 32-bit space with the high 16 bits set
>>to 0 is equivalent to Unicode. Am I correct in these assumptions?
>>X.690 says a bunch of things that I don't really understand (probably
>>because I don't have X.680) about character string types and their
>>encodings. Also, what is the tag for UniversalString? Does anybody
>>have examples of certificates with UniversalString values in the
>>Issuer or Subject? I've heard that Microsoft certificates do, but
>>when I queried Verisign for Microsoft certificates they didn't. I
>>couldn't get ahold of any Microsoft products that actually issue
>>certificates to check that possibility.
>
>UniversalString allows all methods of coding. For example, with the ISO 2022 standard,
>which is the basic standard for code extension techniques in 7-bit and 8-bit
>environments, you can use any of the registered code tables in the International
>Register of Coded Character Sets. The invocation is done with shift functions, and
>there are 4 of them: single shift and locked single shift for one character from the
>code table, and 2 others for invocation of the whole coded character set table. Every
>code table has its own registered escape sequence for invocation; thus ISO 10646, whose
>first multilingual plane is equal to Unicode, has its own escape sequence and can be
>invoked in the UniversalString.
>The other way to use different coded character sets in UniversalString is with
>the announcement mechanisms that are specified in ASN.1, i.e. the ISO 8824 and
>ISO 8825 standards. Every coded character set in use has its own OID.
>
>>I know I could probably get the answers by ordering ISO/IEC 10646-1
>>and X.680, but from what I can see I'd have to do it snail mail. I'd
>>rather not.
>
>My recommendation is to use ISO 10646, as this is the only remedy for the mess in
>the character set world.
>
>I am very interested in the solutions used by Microsoft. The new version of the
>X.500 standard allows the use of "context" for different attributes, which means
>specifying the coded character set in use. Without proper spelling of people's names
>and addresses, the certificates will have no value, as the names will differ from the
>names on documents, e.g. on credit cards.
>
>Regards,
>
>
>Borka Jerman-Blazic
>Chair, WG-I18N@TERENA.NL
>
>TERENA is the European association of research and academic networks