Questions about character sets and their encodings
>I've been searching for resources on the web on character sets for
>ASN.1 types, but haven't been able to find what I need. Does anybody
>know of such resources?
>More specifically, my questions are:
> IA5String: I've heard this is just 7-bit ASCII, but is that
>true?
Yes, IA5String is the same as 7-bit ASCII, i.e. the ISO 646 IRV, with
one exception: in the old version of the IRV, the dollar sign position
could hold the currency sign instead.
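
As a minimal sketch of what "7-bit" means in practice, the check below
accepts a byte string as IA5String content only if every octet has its
high bit clear. The function name is made up for illustration, and the
note on position 0x24 only restates the dollar/currency sign exception
mentioned above.

```python
def is_ia5(data: bytes) -> bool:
    """Return True if every octet fits in 7 bits (0x00-0x7F)."""
    return all(b < 0x80 for b in data)


# Position 0x24 is the one that differs between the old IRV
# (currency sign) and ASCII (dollar sign); the octet value is the same.
DOLLAR_OR_CURRENCY_POSITION = 0x24

print(is_ia5(b"US$ 10"))               # 7-bit only
print(is_ia5("é".encode("latin-1")))   # contains an 8-bit octet
```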
> TeletexString: I understand that one can select different
>character sets with escape codes; does anybody have the set of escape
>codes and character sets they select?
TeletexString is mainly T.61 or T.50 (one of them is currently
used in Teletex) or its international ISO version, ISO 6937.
This is an 8-bit coding with a variable code length per
character, e.g. the accented letters of the Latin alphabet are coded
with 2 bytes, one for the so-called "non-spacing" character (the
accent or diacritic sign) and the other for the base letter.
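
To illustrate the 2-byte coding of accented letters, here is a rough
sketch that converts such pairs to Unicode. It assumes the non-spacing
character comes first and the base letter second, and it uses only a
small subset of the diacritic code positions as I recall them from
ISO 6937; the table and the function name are illustrative, not a
complete T.61 decoder.

```python
import unicodedata

# Assumed subset of non-spacing diacritics (ISO 6937 positions)
# mapped to the corresponding Unicode combining marks.
NON_SPACING = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
}


def decode_accented(data: bytes) -> str:
    """Decode 7-bit letters and <diacritic><base letter> pairs."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in NON_SPACING:
            if i + 1 >= len(data):
                raise ValueError("diacritic without a base letter")
            base = chr(data[i + 1])
            # Unicode puts the combining mark after the base letter;
            # NFC then composes the pair into one character if possible.
            out.append(unicodedata.normalize("NFC", base + NON_SPACING[b]))
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"octet 0x{b:02X} is not handled by this sketch")
    return "".join(out)


print(decode_accented(b"caf\xc2e"))
```

Anything outside the small table raises an error rather than guessing,
since the full code table is not reproduced here.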
> UniversalString: X.690 says that this is encoded in 4-octet
>canonical form, which I would imagine is 4 bytes, big-endian. Also, I
>believe that the subset of the 32-bit space with the high 16 bits set
>to 0 is equivalent to Unicode. Am I correct in these assumptions?
>X.690 says a bunch of things that I don't really understand (probably
>because I don't have X.680) about character string types and their
>encodings. Also, what is the tag for UniversalString? Does anybody
>have examples of certificates with UniversalString values in the
>Issuer or Subject? I've heard that Microsoft certificates do, but
>when I queried Verisign for Microsoft certificates they didn't. I
>couldn't get ahold of any Microsoft products that actually issue
>certificates to check that possibility.
UniversalString allows all methods of coding. For example, with the
ISO 2022 standard, which is the basic standard for code extension
techniques in 7-bit and 8-bit environments, you can use any of the
registered code tables in the International Register of Coded
Character Sets. The invocation is done with shift functions, and there
are 4 of them: single shift and locked single shift for one character
from the code table, and 2 others for invocation of the whole coded
character set table. Every code table has its registered escape
sequence for invocation; thus ISO 10646, whose first multilingual
plane is equal to Unicode, has its own escape sequence and can
therefore be invoked in the UniversalString.
The other way to use different coded character sets in
UniversalString is with the announcement mechanisms which are
specified in ASN.1, i.e. the ISO 8824 and ISO 8825 standards. Every
coded character set used has its OID.
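
On the 4-octet form asked about above, a minimal sketch may help. It
assumes the fixed-width encoding described in the question (each
character as four octets, most significant octet first) and, as far as
I know from X.680, universal tag 28 (identifier octet 0x1C) for
UniversalString. The function names are made up for illustration, and
only short-form lengths are handled.

```python
UNIVERSAL_STRING_TAG = 0x1C  # universal class, primitive, tag number 28


def encode_universal_string(text: str) -> bytes:
    """Encode text as tag, short-form length, then 4 octets per character."""
    content = text.encode("utf-32-be")  # big-endian, no byte order mark
    if len(content) >= 128:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([UNIVERSAL_STRING_TAG, len(content)]) + content


def fits_first_plane(text: str) -> bool:
    """True if every character has its high 16 bits set to zero."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(encode_universal_string("A").hex())
print(fits_first_plane("A"))
```

A string for which fits_first_plane returns True uses only the subset
of the 32-bit space that the question equates with Unicode.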
>I know I could probably get the answers by ordering ISO/IEC 10646-1
>and X.680, but from what I can see I'd have to do it snail mail. I'd
>rather not.
My recommendation is to use ISO 10646, as this is the only remedy for the
mess in the character set world.
I am very interested in the solutions used by Microsoft. The
new version of the X.500 standard allows the use of a "context" for
different attributes, which means specifying the coded character set used.
Without proper spelling of people's names and addresses, the
certificates will have no value, as the names will differ from the
names on the documents, e.g. on credit cards.
Regards,
Borka Jerman-blazic
Chair, WG-I18N@TERENA.NL
TERENA is European Association of the Research and Academic Networks