
Re: restrictions when defining charsets



Dave:

> Certainly, additional character sets can and should be registered with
> IANA for use with MIME. But this is not a matter that needs discussion
> among the Mime list.  IANA registration is a simple procedure.

As long as we are registering a character set, the registration should
be simple.

The problem here is the question: what is a character set, and what is not?

>     	charset=iso-10646-sanskrit-japanese-utf2
> 
> Well, perhaps IANA registration WON'T be simple, since you are raising some
> issues to debate.  But again, I don't think that the topic is of concern
> to the MIME list, though it certainly is important to make the charset
> name be appropriate.

As "charset" is a MIME concept, its meaning should be precisely
defined by the MIME group, I think.

According to the following definition:

    The Working Group specified the definition of a character set
    for the purposes of quad-x to be a unique mapping of a byte
    stream to glyphs, a mapping which does not require external
    profiling information.

"charset" should provide all the profiling information to uniquely
map a byte stream to glyphs.

Thus, bare Unicode, which can't map some Devanagari and some Han correctly,
can't be a "charset".

The term "correctly" here means that native users of the languages covered
by the "charset" will have no difficulty in reading the resulting glyph
representation.

From: ayers@mv.us.adobe.com

>    But, assuming that the only language dependence of Unicode 
>    is to Devanagari and to Han, we might be able to register ...
>
>I recall a poster noting the language dependence of accented vowels,
>e.g. between English and German ...

Some said that English diaeresis and German umlaut should be
distinguished because they have different MEANING.

But, if the requirement is "a unique mapping of a byte stream to glyphs"
and if diaeresis and umlaut share exactly the same glyph (I don't have
enough cultural background to judge that), it is not necessary for a
"charset" to distinguish them.

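The point about a shared glyph can be illustrated with a small sketch.
It is only an illustration: the use of Python and its standard
unicodedata module, the helper name marks(), and the two sample words
are assumptions made for this example, not anything proposed in the
thread. It decomposes an English word written with a diaeresis and a
German word written with an umlaut, and lists the combining marks each
one contains.

```python
import unicodedata


def marks(word):
    """Return the names of the combining marks in the NFD form of word.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return [unicodedata.name(ch) for ch in decomposed if unicodedata.combining(ch)]


# English diaeresis: the second "o" begins a new syllable.
english = "co\u00f6perate"
# German umlaut: the "o" is a different vowel.
german = "sch\u00f6n"

# Both words decompose to the same combining mark, so a coded
# character set of this kind does not tell the two uses apart.
print(marks(english))
print(marks(german))
print(marks(english) == marks(german))
```

In this sketch both words yield the same single mark, so the
distinction in MEANING is carried by the language of the text and not
by the coded characters.
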
						Masataka Ohta