Re: restrictions when defining charsets
Dave:
> Certainly, additional character sets can and should be registered with
> IANA for use with MIME. But this is not a matter that needs discussion
> among the Mime list. IANA registration is a simple procedure.
As long as we are registering a character set, the registration should
be simple.
The problem here is the question: what is a character set, and what
is not?
> charset=iso-10646-sanskrit-japanese-utf2
>
> Well, perhaps IANA registration WON'T be simple, since you are raising some
> issues to debate. But again, I don't think that the topic is of concern
> to the MIME list, though it certainly is important to make the charset
> name be appropriate.
As "charset" is a concept introduced by MIME, its meaning should be
precisely defined by the MIME group, I think.
According to the following definition:
    The Working Group specified the definition of a character set
    for the purposes of quad-x to be a unique mapping of a byte
    stream to glyphs, a mapping which does not require external
    profiling information.
"charset" should provide all the profiling information to uniquely
map a byte stream to glyphs.
Thus, bare Unicode, which can't map some Devanagari and some Han correctly,
can't be a "charset".
The term "correctly" here means that native users of the languages
covered by the "charset" will have no difficulty reading the resulting
glyph representation.
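As a rough illustration of why the "charset" label has to carry all of
the profiling information, the same byte decodes to different
characters under different labels. This is only a sketch in Python;
the helper name decode_with is hypothetical, and the two ISO 8859
parts are chosen purely as examples.

```python
# Hypothetical helper: interpret the same bytes under a given charset label.
def decode_with(label: str, data: bytes) -> str:
    return data.decode(label)

raw = bytes([0xE1])  # one byte, no other context

# With no label the byte is ambiguous; the label alone selects the mapping.
latin = decode_with("iso-8859-1", raw)  # LATIN SMALL LETTER A WITH ACUTE
greek = decode_with("iso-8859-7", raw)  # GREEK SMALL LETTER ALPHA

print(repr(latin), repr(greek))
```

Under the definition above, a label that still left the choice between
two renderings open would not qualify as a "charset".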
From: ayers@mv.us.adobe.com
> But, assuming that the only language dependence of Unicode
> is to Devanagari and to Han, we might be able to register ...
>
>I recall a poster noting the language dependence of accented vowels,
>e.g. between English and German ...
Some have said that the English diaeresis and the German umlaut should
be distinguished because they have different MEANINGS.
But, if the requirement is "a unique mapping of a byte stream to glyphs"
and if the diaeresis and the umlaut share exactly the same glyph (I don't
have enough cultural background to judge that), it is not necessary for
a "charset" to distinguish them.
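For what it is worth, the point can be checked against Unicode as
implemented in Python's unicodedata module. This is a minimal sketch;
the helper name marks and the two sample words are my own choices, and
it shows only that the coded characters are the same, not that every
typographer would draw the two marks identically.

```python
import unicodedata

# Hypothetical helper: names of the combining marks in a word after
# canonical decomposition (NFD).
def marks(word: str) -> list[str]:
    decomposed = unicodedata.normalize("NFD", word)
    return [unicodedata.name(ch) for ch in decomposed
            if unicodedata.combining(ch)]

english = marks("co\u00f6perate")  # English diaeresis on the second "o"
german = marks("sch\u00f6n")       # German umlaut on the "o"

print(english, german)
```

Both words yield the same single mark, COMBINING DIAERESIS, so a
glyph-oriented "charset" has nothing to tell them apart by.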
Masataka Ohta