
Re: 10646, and all that



> I believe that many of us recognize the problems you are pointing out
> and are trying to work with you in solving them.  At least some of us
> believe that the problems are complex and that there is not a single
> simple solution that meets all needs.

I don't think the problem is complex. My solution is simple enough,
isn't it?

I'm afraid that some of you are making the simple problem complex by
overgeneralization.

>    The cited definition is very old.

But, ISO 646 is disambiguated with it, even now.

> >> No strong preference.  What I was really arguing for was separating the
> >> language info from the char set name.
> >
> >Why do you think the separation is necessary?
> 
> Because I'm seeking a position that is technically reasonable, symmetric
> across languages, and that people can deal with.

What?

	symmetric across languages?

What is that? Isn't it a totally new concept?

> And it is interesting that the ability to designate
> language even when it is not needed to clarify a character set may
> leverage other useful things.

That's overgeneralization.

> As I have said before, we would not need to do any of this if 10646 were
> really adequate to the role to which we would like to assign it.  It
> isn't.  If you don't like that, take it up with ISO.

It is the intended design goal of Unicode/ISO10646 to provide language
information outside of Unicode/ISO10646. So, why don't we do so?
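
To illustrate the point above, here is a minimal sketch, not anything from 10646 or from this thread. It assumes a hypothetical `TaggedText` record that carries a language label next to the text, and it picks U+76F4 as the sample character because it is commonly cited as a unified Han character whose usual glyph differs between Japanese and Chinese typography. The code points alone cannot tell the two texts apart; only the external label can.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Plain text plus a language label carried outside the character data.

    Hypothetical structure for illustration only.
    """
    lang: str  # assumed label such as "ja" or "zh"
    text: str  # unified code points, with no language information inside


def code_points(s):
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


# Sample unified Han character (assumption: chosen as a typical example).
HAN = "\u76f4"

japanese = TaggedText(lang="ja", text=HAN)
chinese = TaggedText(lang="zh", text=HAN)

# The character data is identical in both cases.
print(unicodedata.name(HAN))
print(code_points(japanese.text) == code_points(chinese.text))

# Only the out-of-band label distinguishes the two texts.
print(japanese == chinese)
```

The comparison of code points succeeds while the comparison of the tagged records fails, which is the whole argument in small: the distinction has to be profiled outside the character encoding.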

> And, as we have 
> discussed in private, while I understand that Japan voted against 10646
> DIS-2 at the JTC1 level, I also understand that, had Japan felt very
> strongly about this and been able to find a single additional JTC1
> P-member to agree, 10646 could easily have been buried in ISO
> procedures, probably into the next century.    It is consequently
> rational to deduce the absence of a strong majority in the Japanese
> standards community that the unification issue is *that* important, all
> of the time.

You mean, there is no strong majority in Japan because Japan failed
to force another non-Japanese P-member to agree?

While your reasoning might be convincing in the political world of
ISO, it has, at least, nothing to do with the current issue of
profiling 10646. So, could you raise it outside of the IETF?

> Conversely,
> there are clearly situations in which unified Han are interpretable from
> context, however un-aesthetic or un-linguistic that might be.

Just as unified variants of ISO 646 are interpretable from context,
unified Han are interpretable.

> The
> reality is that the issue isn't a binary construction like "need", but a
> scale from "harmless but probably not worth the trouble" to "required
> for proper interpretation by many users".

The issue is "correctness".

No Japanese script is written in Chinese Han. Thus, it is incorrect to
write Japanese script in Chinese Han.

> >In the case of ISO 646, we have assigned different charset names to
> >each national variant.
>     And deprecated their use.   But these are national variants
> recognized by ISO, and national variants in which the character
> descriptions and names drawn from the repertoire are different.

CJK variation in 10646 is also recognized by ISO. See DIS 10646-1.2
and you will find that each variant of the CJK Han characters is
listed; that's why 10646 became so voluminous.

ISO recognizes and has documented that CJK characters are different.

>     As others have pointed out, one often benefits from language
> information even if there are no structural ambiguities about the
> character encoding.

That is an entirely separate issue.

> >Moreover, the truly multilingual character encoding won't need:
> >	Content-language:
> >header at all. So, I object to introduce the to-be-obsoleted header.

> Tuples as complex as {country,language, character
> set, character encoding} are common in linguistic and textual analysis
> work.

That's not common for plain text.

>     You have convinced me (not hard, I was convinced by mid-1991) and
> much of the rest of the WG that 10646 isn't a "truely multilingual
> character encoding".   But the choices are to provide sufficient
> supplemental information, or to just decide to not use 10646 because it
> is inadequate and wait for something better to come along.

So, let's use it with enough profiling information. But, please don't
try to make the issue unnecessarily complex.

						Masataka Ohta