Re: restrictions when defining charsets
Steve Summit writes:
> For two immediate examples, I would still think that "the
> interpretation of each octet cannot be questioned" was intended
> to rule out things like ISO-2022-JP, except for having read
> suggestions on this list that it's really supposed to avoid
> things like undifferentiated ISO-646. I still have no idea what
> "the number of representable characters is limited" is supposed
> to accomplish. (It probably rules out mnemonic encoding schemes,
> but for what seem to me to be the wrong reasons.)
To rule out iso-2022-jp would indeed be a most unfortunate move,
as this is the predominant encoding in Japan for email.
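To make the point about octet interpretation concrete, here is a small
sketch in Python (my choice of language for illustration, using its
standard "iso2022_jp" codec; the sample text is arbitrary). It shows
that in ISO-2022-JP the meaning of an octet depends on the escape
sequence that precedes it:

```python
# Sketch: in ISO-2022-JP the same octets mean different things
# depending on the shift state set by the preceding escape sequence.
text = "日本語"  # arbitrary sample text
encoded = text.encode("iso2022_jp")

# The encoder wraps the text in ESC $ B (switch to JIS X 0208)
# and ESC ( B (switch back to ASCII); strip both 3-octet sequences.
payload = encoded[3:-3]

print(encoded)                       # the full stateful byte stream
print(payload.decode("ascii"))       # same octets read as plain ASCII
print(encoded.decode("iso2022_jp"))  # read with the escapes honoured
```

The payload consists only of 7-bit octets that are also valid ASCII, so
without the escape sequences their interpretation is indeed open to
question.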
I agree national ISO 646 versions should be properly labelled
(as should the different parts of ISO 8859).
Mnemonic encodings as per RFC 1345 have a limited set of characters:
some 24,000, but they are limited!
"The number of representable characters is limited" is, I believe,
meant to say that the combining sequences of Unicode cannot be used
to generate characters.
Anyway, this is subtle with respect to ISO 10646, as the repertoire of
10646 is fixed: you cannot use combining characters to generate new
characters. You can use them to generate combining sequences, but those
are not characters! A matter of definition.
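As an illustration of that distinction, a minimal sketch in Python
using its standard unicodedata module (the choice of "q" plus a
combining acute accent is just an example of a pair with no precomposed
form). The sequence displays as one accented letter, yet it remains two
coded characters from the fixed repertoire:

```python
import unicodedata

# "q" followed by COMBINING ACUTE ACCENT: a combining sequence.
seq = "q\u0301"

# List the coded characters that make up the sequence.
for ch in seq:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          "combining class", unicodedata.combining(ch))

# Normalizing to composed form cannot merge the pair, because the
# repertoire has no single character for it; no new character appears.
composed = unicodedata.normalize("NFC", seq)
print(len(seq), len(composed))
```

Both lengths stay at two: the sequence is built from existing
characters and does not add anything to the repertoire.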
I believe the sentence on "not a set of characters" alludes
to the ISO terminology, where a "character set" is a repertoire
(without an applicable encoding). This could be explained
further for people not acquainted with ISO character terminology.
Keld