
Re: Questions about character sets and their encodings



asn.1 also defines the type BMPString, which is the Basic Multilingual Plane
character set. This is the set of 16-bit character encodings that is
"supposedly" equivalent to the unicode set (the intent was to make unicode and
BMP identical - i believe they succeeded, but i never verified it by checking the
final version of the standard). the advantage of this asn.1 type is that it uses
the asn.1 tag to indicate the selection from the rather large UniversalString
set of character encodings. x.500 permits the use of BMPString in a
distinguished name. therefore, the standard permits this character set in a
certificate, subject to profiles of course.

although 646 provides greater interoperability, one should consider national
language and localization requirements for your product. the BMP set (Unicode)
from 10646 is the best character set to address such requirements.

the tag value for UniversalString is 28; BMPString is 30. The asn.1 standard
does not recommend the use of UniversalString without constraints.
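
as a rough illustration of the two tags above, here is a minimal python sketch
that wraps a string as a DER-style tag-length-value with tag 30 (BMPString, two
octets per character) or tag 28 (UniversalString, four octets per character).
the function names are made up for this example, and the big-endian octet order
is an assumption taken from the usual reading of X.690.

```python
UNIVERSAL_STRING_TAG = 28  # 0x1C
BMP_STRING_TAG = 30        # 0x1E


def der_length(n: int) -> bytes:
    """Encode a definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_bmpstring(text: str) -> bytes:
    """Hypothetical helper: tag 30, two big-endian octets per character."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the Basic Multilingual Plane")
    content = text.encode("utf-16-be")  # no byte-order mark is emitted
    return bytes([BMP_STRING_TAG]) + der_length(len(content)) + content


def encode_universalstring(text: str) -> bytes:
    """Hypothetical helper: tag 28, four big-endian octets per character."""
    content = text.encode("utf-32-be")  # no byte-order mark is emitted
    return bytes([UNIVERSAL_STRING_TAG]) + der_length(len(content)) + content


if __name__ == "__main__":
    print(encode_bmpstring("A\u00e9").hex())
    print(encode_universalstring("A").hex())
```

the same text costs twice as many content octets as a UniversalString, which is
one practical reason to prefer BMPString when the BMP is enough.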

you can order and retrieve itu recommendations electronically. see
http://www.itu.ch/POD/itut-123.html. this can help you get the asn.1 documents. iso
does not offer such a service; however, i suspect that you would want a paper
copy of 10646 since display of the glyphs from an electronic text would be a
challenge for the font capability of most systems.

   hoyt
_______________________________________________________________________________
Subject: Questions about character sets and their encodings
From:    Borka Jerman-Blazic <borka@e5.ijs.si> at az05-smtp
Date:    1/20/97  11:32



>I've been searching for resources on the web on character sets for
>ASN.1 types, but haven't been able to find what I need.  Does anybody
>know of such resources?

>More specifically, my questions are:

>        IA5String: I've heard this is just 7-bit ASCII, but is that
>true?


Yes, IA5String is the same as 7-bit ASCII or the ISO 646 IRV, with one
exception (in the old version): the position of the dollar sign could
instead hold the currency sign.
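
A small Python sketch of what "7-bit" means in practice: the helper below (a
made-up name, not from any library) refuses anything above code 127 and wraps
the rest with the IA5String universal tag, 22. It ignores the old
dollar/currency-sign variation and simply treats the input as ASCII.

```python
IA5_STRING_TAG = 22  # 0x16


def encode_ia5string(text: str) -> bytes:
    """Hypothetical helper: tag 22 plus a short-form length and 7-bit content."""
    content = text.encode("ascii")  # raises UnicodeEncodeError above code 127
    if len(content) >= 0x80:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([IA5_STRING_TAG, len(content)]) + content


if __name__ == "__main__":
    print(encode_ia5string("abc").hex())
```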
 
>        TeletexString: I understand that one can select different
>character sets with escape codes; does anybody have the set of escape
>codes and character sets they select?


TeletexString is mainly T.61 or T.50 (one of them is currently
used in Teletex) or its international ISO version, ISO 6937.
This is an 8-bit coding with a variable code length per
character: e.g. the accented letters of the Latin alphabet are coded
with 2 bytes, one for the so-called "non-spacing" character (the
accent or diacritic sign) and the other for the base letter.
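
The two-byte scheme for accented letters can be sketched in Python as below.
This is only a partial illustration: the four diacritic code values in the
table are given as I remember them from ISO 6937 / T.61 and should be checked
against the standard, the function name is invented, and plain bytes below 128
are treated as ASCII, which glosses over the positions where T.61 differs.

```python
import unicodedata

# Assumed subset of non-spacing diacritic codes (verify against ISO 6937 / T.61).
DIACRITICS = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
}


def decode_t61_subset(data: bytes) -> str:
    """Decode ASCII plus 'diacritic byte, then base letter' pairs."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in DIACRITICS:
            # The non-spacing character comes FIRST, the base letter second.
            if i + 1 >= len(data) or data[i + 1] >= 0x80:
                raise ValueError("diacritic byte without a base letter")
            # Unicode puts the combining mark AFTER the base, so swap the order.
            out.append(chr(data[i + 1]) + DIACRITICS[b])
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is outside this sketch's subset")
    # Compose base + combining mark into a single precomposed character.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(decode_t61_subset(b"caf\xc2e"))
```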

>        UniversalString: X.690 says that this is encoded in 4-octet
>canonical form, which I would imagine is 4 bytes, big-endian.  Also, I
>believe that the subset of the 32-bit space with the high 16 bits set
>to 0 is equivalent to Unicode.  Am I correct in these assumptions?
>X.690 says a bunch of things that I don't really understand (probably
>because I don't have X.680) about character string types and their
>encodings.  Also, what is the tag for UniversalString?  Does anybody
>have examples of certificates with UniversalString values in the
>Issuer or Subject?  I've heard that Microsoft certificates do, but
>when I queried Verisign for Microsoft certificates they didn't.  I
>couldn't get ahold of any Microsoft products that actually issue
>certificates to check that possibility.

UniversalString allows all methods of coding, e.g. with the ISO 2022
standard, which is the basic standard for code extension techniques in
7-bit and 8-bit environments, you can use any of the registered code tables
in the International Register of Coded Character Sets. The invocation
is done with shift functions, and there are 4 of them: single shift and
locked single shift for one character from the code table, and
2 others for invocation of the whole coded character set table. Every code
table has its registered escape sequence for invocation; thus ISO
10646, whose first multilingual plane is equal to Unicode, has its
own escape sequence and thus can be invoked in the UniversalString.
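
To show what a registered escape sequence looks like on the wire, here is a
small Python sketch using the standard library's iso2022_jp codec. It is not an
ASN.1 example; it only illustrates the ISO 2022 idea of switching code tables
with escape sequences. The helper name is invented, and it assumes every escape
sequence in this particular encoding is three bytes long.

```python
ESC = 0x1B


def find_escapes(data: bytes) -> list:
    """Return the 3-byte escape sequences found in ISO-2022-JP data."""
    found = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            found.append(data[i:i + 3])  # ESC plus two designator bytes
            i += 3
        else:
            i += 1
    return found


if __name__ == "__main__":
    encoded = "A\u3042".encode("iso2022_jp")  # Latin A, then hiragana A
    print(encoded)
    print(find_escapes(encoded))
```

The first escape sequence switches to a two-byte Japanese code table and the
second switches back to the 7-bit Latin set.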

The other way to use different coded character sets in UniversalString
is with the announcement mechanisms which are specified in ASN.1, i.e.
the ISO 8824 and ISO 8825 standards. Every coded character set used has
its own OID.


>I know I could probably get the answers by ordering ISO/IEC 10646-1
>and X.680, but from what I can see I'd have to do it snail mail.  I'd
>rather not.

My recommendation is to use ISO 10646, as this is the only remedy for the
mess in the character set world.


I am very interested in the solutions used by Microsoft. The
new version of the X.500 standard allows the use of "context" for different
attributes, which means specification of the coded character set used.
Without proper spelling of people's names and addresses the certificates
will have no value, as the names will differ from the
names on the documents, e.g. on the credit cards.


Regards,


Borka Jerman-blazic
Chair, WG-I18N@TERENA.NL

TERENA is European Association of the Research and Academic Networks