
Re: Questions about character sets and their encodings



Mark,

You are right, I made a mistake in taking UniversalString for GeneralString
(which allows all ISO 2022 codings and announcements).

Obviously UniversalString is reserved for the BMP of ISO 10646, in canonical
form, 4 bytes per character. What is not clear here is the need for 4 bytes,
which are used when you specify the group, the plane, the row and the cell
of every character. If it is just the BMP of ISO 10646 1.1 then you do not need
the first two bytes. Using them just takes more space and bandwidth.
It will be interesting to see who else has done the same implementation
as you. However, it is true that very soon we will have another plane
which will code all ideographic scripts. That was the solution recently
endorsed by WG 2 of ISO JTC1 SC2.
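
To make the group/plane/row/cell point concrete, here is a small sketch of
how one character splits into the four octets of the canonical form. The
helper names ucs4_octets and is_bmp are made up for illustration and come
from no standard or library.

```python
def ucs4_octets(ch: str) -> tuple[int, int, int, int]:
    """Split one character into (group, plane, row, cell) octets.

    Hypothetical helper: it simply takes the four bytes of the
    code point, most significant first.
    """
    cp = ord(ch)
    return (cp >> 24) & 0xFF, (cp >> 16) & 0xFF, (cp >> 8) & 0xFF, cp & 0xFF


def is_bmp(ch: str) -> bool:
    """True when the group and plane octets are both zero."""
    group, plane, _row, _cell = ucs4_octets(ch)
    return group == 0 and plane == 0


if __name__ == "__main__":
    for ch in ("A", "\u20ac"):
        print(repr(ch), ucs4_octets(ch), is_bmp(ch))
```

For any BMP character the first two octets come out as zero, which is the
overhead discussed above.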

Regards,

Borka

> Borka, 
> 
> It sounds to me like you are referring (in the quoted text at the end
> of this message) to the unrestricted string type, rather than the
> UniversalString type.  X.690 describes a rather complicated structure
> (in 8.21.6) for that type that includes the object identifiers and
> such.  But X.690 says:
> 
>  8.20.5 For restricted character string apart from UniversalString and
>  BMPString the octet string shall contain the octets specified in ISO
>  2022 for encodings in an 8-bit environment, using the escape sequence
>  and character codings registered in accordance with ISO 2375.
> 
> This would seem to imply that UniversalString and BMPString don't use
> escape sequences, since escape sequences aren't mentioned anywhere
> else in reference to UniversalString.  Which would make sense, because
> it seems to me that the whole point of UniversalString is to be a
> "flat" character set, where you don't have to worry about escape
> sequences and modes.  This seems to be confirmed by the fact that
> UniversalString doesn't appear in "Table 3 - Use of escape sequences",
> and also by
> 
>  8.20.7 For the "UniversalString" type, the octet string shall contain
>  the octets specified in ISO/IEC 10646-1, using the 4-octet canonical
>  form (see 14.2 of ISO/IEC 10646-1).  Control functions and signatures
>  shall not be used.
> 
> Since all values in the character set can be reproduced in 4 octets,
> there is no reason for escape codes.
> 
> I think my current encoding is correct (nobody said my example was
> wrong) but I'd like to see other examples of UniversalString encodings
> (not necessarily even in certificates!), both because I'm paranoid :)
> and because nobody has indicated that they believe my encoding to be
> correct.  Knowing this list, I wouldn't be surprised if it was just
> that the few people who actually know (at this point in time it's kind
> of an obscure issue) just were too busy to respond.  But up until now,
> I haven't been able to find a single example of a UniversalString DER
> or BER encoding, in or out of a certificate.  Arrrggghhh!  Isn't
> *anybody* using UniversalString?  Is anybody reading this thread?
> 
> - Mark Bartel
>
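
For anyone wanting something to compare against, below is a minimal sketch
of a DER encoding of a UniversalString value as read from the 8.20.7 text
quoted above: universal tag 28 (0x1C), a definite length, then four octets
per character, most significant first, with no signature. The function
names der_length and der_universal_string are hypothetical, and this is one
reading of the standard, not a verified reference implementation.

```python
def der_length(n: int) -> bytes:
    """Encode a definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_universal_string(text: str) -> bytes:
    """Encode text as tag 0x1C + length + 4 octets per character.

    Assumption: "utf-32-be" gives the 4-octet form without a leading
    signature (byte order mark), which matches the "signatures shall
    not be used" wording.
    """
    content = text.encode("utf-32-be")
    return b"\x1c" + der_length(len(content)) + content


if __name__ == "__main__":
    print(der_universal_string("Hi").hex())
```

Under these assumptions the string "Hi" comes out as 1c 08 00 00 00 48
00 00 00 69: one tag octet, one length octet, and eight content octets.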