
Re: restrictions when defining charsets



Steve Summit writes:

> For two immediate examples, I would still think that "the
> interpretation of each octet cannot be questioned" was intended
> to rule out things like ISO-2022-JP, except for having read
> suggestions on this list that it's really supposed to avoid
> things like undifferentiated ISO-646.  I still have no idea what
> "the number of representable characters is limited" is supposed
> to accomplish.  (It probably rules out mnemonic encoding schemes,
> but for what seem to me to be the wrong reasons.)

Ruling out ISO-2022-JP would indeed be most unfortunate, as it is
the predominant encoding for email in Japan.
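To make concrete why ISO-2022-JP sits uneasily with "the interpretation
of each octet cannot be questioned": the encoding is stateful, so the
same octets mean different characters depending on which escape
sequence is in effect. A minimal sketch, assuming Python 3 and its
standard "iso2022_jp" codec:

```python
# The same two octets, 0x24 0x22, decoded under two ISO-2022-JP states.
OCTETS = b"\x24\x22"

# The initial state is ASCII, so the octets are '$' and '"'.
as_ascii = OCTETS.decode("iso2022_jp")

# ESC $ B designates JIS X 0208; ESC ( B switches back to ASCII.
# In JIS X 0208, 0x24 0x22 is HIRAGANA LETTER A (U+3042).
as_jis = (b"\x1b$B" + OCTETS + b"\x1b(B").decode("iso2022_jp")

print(repr(as_ascii))  # '$"'
print(repr(as_jis))    # 'あ'
```

Nothing in the two octets themselves says which reading applies; only
the preceding escape sequence does.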

I agree that the national ISO 646 versions should be properly
labelled, just as the different parts of ISO 8859 should be.
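Why the labelling matters for ISO 646: the national variants reuse the
same octets for different characters. A small sketch with a partial,
illustrative table covering only octets 0x5B to 0x5D (everything else
is treated as US-ASCII); the function name decode_646 is hypothetical:

```python
# Partial tables: only 0x5B-0x5D, which these national variants redefine.
VARIANTS = {
    "US": {0x5B: "[", 0x5C: "\\", 0x5D: "]"},
    "DE": {0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü"},  # German, DIN 66003
    "NO": {0x5B: "Æ", 0x5C: "Ø", 0x5D: "Å"},  # Norwegian, NS 4551-1
}


def decode_646(data: bytes, variant: str) -> str:
    """Hypothetical decoder: octets not in the partial table fall back to ASCII."""
    table = VARIANTS[variant]
    return "".join(table.get(b, chr(b)) for b in data)


octets = b"[\\]"
for v in VARIANTS:
    print(v, decode_646(octets, v))
# US [\]
# DE ÄÖÜ
# NO ÆØÅ
```

Without a label, a receiver has no way to tell which of these three
strings the sender meant.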

Mnemonic encodings as per RFC 1345 have a limited set of characters:
some 24,000, but limited nonetheless!

As for "the number of representable characters is limited", I believe
it is meant to say that the combining sequences of Unicode cannot be
used to generate characters.
This is subtle with respect to ISO 10646 anyway, as the repertoire of
10646 is fixed: you cannot use combining characters to generate new
characters. You can use them to generate combining sequences, but
those are not characters! It is a matter of definition.
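The distinction between a combining sequence and a character can be
seen with Python's unicodedata module. This is a sketch only: it uses
Unicode normalization (NFC) as a convenient way to show which
sequences have a precomposed counterpart in the fixed repertoire.

```python
import unicodedata

# A combining sequence: LATIN SMALL LETTER E + COMBINING ACUTE ACCENT.
seq = "e\u0301"
print(len(seq), [unicodedata.name(c) for c in seq])
# 2 ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# This pair has a precomposed counterpart in the repertoire (U+00E9),
# which NFC substitutes for the two-character sequence.
nfc = unicodedata.normalize("NFC", seq)
print(len(nfc), unicodedata.name(nfc))
# 1 LATIN SMALL LETTER E WITH ACUTE

# q + COMBINING ACUTE ACCENT has no precomposed counterpart, so it stays
# a two-character sequence: no new character is added to the repertoire.
other = unicodedata.normalize("NFC", "q\u0301")
print(len(other))
# 2
```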

I believe the sentence on "not a set of characters" alludes to the
ISO terminology, where a "character set" is a repertoire (with no
encoding attached). This could be explained further for people not
acquainted with ISO character terminology.

Keld