Re: 16/32-bit charsets and MIME-encoding
>People probably have differing opinions about this. Combining
>character purists would probably want even the Latin-1 precomposed
>characters to be represented as 2 characters each (i.e. base and
#That is OK if it was a precomposed character, BUT the "a with ring above",
#"a with diaeresis" and "o with diaeresis" are in Swedish NOT (observe NOT)
#a composed character! They may NOT be represented as two characters.
I don't know any "combining character purists" who want things to be,
as much as possible, represented as minimal characters with additional
material added on. I do know lots of pre-combined character purists:
most of the "I really want every character represented by the same
number of bits as every other character, so that I can estimate the
number of print positions in a representation from the octet length of
a string divided by a fixed divisor" camp are effectively pre-combined
character purists.
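To make the "octet length divided by a divisor" estimate concrete, here
is a small sketch. It assumes a 16-bit encoding (UTF-16BE stands in for
a 2-octet form, so the divisor is 2) and uses Python's unicodedata
module; octet_estimate and print_positions are illustrative names, and
print_positions is deliberately rough (it only skips combining marks).
The estimate holds for the precomposed "a with diaeresis" and breaks as
soon as the same letter is written as base plus combining mark.

```python
import unicodedata

def octet_estimate(s: str, divisor: int = 2) -> int:
    """Fixed-width guess at print positions: octet length / divisor.
    Assumes a 2-octet-per-character encoding (UTF-16BE here)."""
    return len(s.encode("utf-16-be")) // divisor

def print_positions(s: str) -> int:
    """Rough count of print positions: combining marks stack on the
    preceding base character, so they are not counted."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # 'a' followed by COMBINING DIAERESIS

for label, s in [("precomposed", precomposed), ("decomposed", decomposed)]:
    # Both occupy one print position, but the octet estimate differs.
    print(label, octet_estimate(s), print_positions(s))
```

Only the precomposed form keeps the octet estimate and the print
position count in agreement, which is why the fixed-width camp ends up
wanting precomposed characters.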
The people who would be combining character purists would mostly be
in the type foundry business, trying to minimize the number of discrete
things that have to be in their "how to draw it" tables or on their
typeballs; that isn't communication, it is printing technology.
If you believe what I quote above, then your position moves very
close to Ohta-san's and, from the perspective of that position, 10646 is
a dismal failure. 10646's logic is that, if two things look the same on
the printed page, they are the same and that, moreover, one can be
pretty sloppy about what that printed page looks like. The distinction
between "a with diaeresis" as a Swedish character and "a with diaeresis"
as a character with a phoneme-changing diacritical (as in English) is
simply not made and, indeed, is fairly explicitly denied in the DIS (at
The claims about European bias in the present 10646 arise mostly
from the fact that it is fairly easy to profile our way out of your
problem: a rule that, if the letter is meant, the single-character form
must be used and, if an accented letter is meant, the base-character
plus combining character(s) form must be used, would do it.
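A minimal sketch of such a profile rule, assuming present-day Unicode
and Python's unicodedata module. The function encode_by_intent and its
is_letter flag are hypothetical: the flag stands in for whatever tells
the writer that a letter in its own right is meant (Swedish "a with
diaeresis") rather than an accented base letter.

```python
import unicodedata

def encode_by_intent(base: str, mark: str, is_letter: bool) -> str:
    """Hypothetical profile rule: a letter in its own right uses the
    single precomposed code point; an accented letter uses base plus
    combining mark."""
    decomposed = base + mark
    if is_letter:
        composed = unicodedata.normalize("NFC", decomposed)
        if len(composed) != 1:
            raise ValueError(f"no single-character form for {decomposed!r}")
        return composed
    return decomposed

swedish_a = encode_by_intent("a", "\u0308", is_letter=True)    # one code point
accented_a = encode_by_intent("a", "\u0308", is_letter=False)  # two code points

print(swedish_a == accented_a)                                 # False
# Caveat: Unicode normalization treats the two as canonically
# equivalent, so NFC erases the distinction the profile relies on.
print(unicodedata.normalize("NFC", accented_a) == swedish_a)   # True
```

The second comparison is the weak point: the rule survives only as long
as nothing along the way normalizes the text, since present-day Unicode
normalization folds both spellings together.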
Ohta-san's problem requires that he say "if you see character nnnn,
it is Chinese, and that means that, if you want to write the Japanese
character that would normally be represented by nnnn, you are out of
luck" (or need a state-changing escape ritual). That is a much more
unpleasant thing to have to say (hence the occasional claims of bias),
but it really depends on the same sort of assertion you are making, which
is that character-symbols, representations, and codings really do have
semantics that go beyond the equivalency of things that look alike when
printed in appropriate (or appropriately-fuzzy) fonts.
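For a concrete picture of a "state-changing escape", ISO-2022-JP is one
familiar example (an illustration chosen here, not necessarily what the
message has in mind). The sketch uses Python's built-in iso2022_jp
codec: the encoded bytes open with ESC $ B, which designates the
Japanese JIS X 0208 set, and close with ESC ( B, which returns to
ASCII. The same characters in the unified repertoire are just the code
points U+65E5 and U+672C, with nothing marking them as Japanese rather
than Chinese.

```python
text = "\u65e5\u672c"  # "Nihon" (Japan), written with two Han characters

# ISO-2022-JP: the character set is announced in-band by escapes.
encoded = text.encode("iso2022_jp")
print(encoded)
print(encoded.startswith(b"\x1b$B"))  # ESC $ B: switch to JIS X 0208
print(encoded.endswith(b"\x1b(B"))    # ESC ( B: switch back to ASCII

# Unified repertoire: bare code points, no language or set marker.
print([f"U+{ord(c):04X}" for c in text])
```

The escape carries the "this is Japanese" information that a bare
unified code point does not, which is exactly the kind of extra state
the paragraph above describes.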
Again, just trying to clarify our problem here.