Re: restrictions when defining charsets
> >"charset" should provide all the profiling information to uniquely
> >map a byte stream to glyphs.
> >
> >Thus, bare Unicode, which can't map some Devanagari and some Han correctly,
> >can't be a "charset".
>
> Actually, this is true of a wide variety of things that you might think
> are charsets.
>
> One of them is ASCII!
I know that overriding of functionality was once allowed in character
encodings. So, if there is any confusion, it might be necessary to define
an instance of ASCII for the Internet.
> The result is a standard which is vague enough
> about the appearance of the glyphs that it is legitimate to print code 41
> (exclamation point) as or-bar and code 94 (circumflex) as not-sign.
I'm afraid that RFC 1345 defines ASCII code point 41 (octal) as an
exclamation mark and 94 (decimal) as a circumflex accent.
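The two code points are given in different bases, which is easy to misread. As a quick sanity check, here is a minimal sketch that decodes both values with Python's built-in ASCII codec. It assumes that codec matches the usual ASCII table; it does not consult RFC 1345 itself, and the helper name `ascii_char` is only illustrative.

```python
# Minimal check using Python's built-in "ascii" codec (an assumption:
# this is the conventional ASCII table, not a reading of RFC 1345).

def ascii_char(code: int) -> str:
    """Return the ASCII character assigned to the given code point."""
    return bytes([code]).decode("ascii")

exclamation = ascii_char(0o41)  # octal 41 is decimal 33, hex 21
circumflex = ascii_char(94)     # decimal 94 is hex 5E

print(repr(exclamation), repr(circumflex))
```

This only shows which characters the codec assigns to those positions. It says nothing about how a given printer or font draws them, which is the point under dispute.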
> It is madness to interpret the definition of "charset" so narrowly that
> the well-understood ASCII character set would not qualify.
It is well understood by most people, though that understanding might
originally have been a misunderstanding, that code point 41 of ASCII is
an exclamation mark and definitely not an or-bar.
> I'm not up on the subtleties of the Devanagari/Han
> problems with Unicode, but I strongly suspect that they qualify as bugs,
> which we can legitimately overlook, rather than gross differences of
> intent, which we can't.
Why do you think Japan voted NO on DIS 10646-1.2?
The "BUG" was pointed out repeatedly and still was not fixed, because
the committee did not consider it a bug.
To the committee, it is a specification.
> Unicode is *meant* to be a unique mapping, and
> comes very close to being one.
I'm afraid your statement is based on observations of European characters
only.
I must repeat my warning. Don't be confused by the fact that Unicode
comes very close to being a unique mapping between code points and
European characters.
If you can't understand the Devanagari/Han problem, see the Unicode
documents. As I quoted in an earlier posting, it is explicitly stated
there that Unicode is not meant to be a unique mapping.
Masataka Ohta