
Re: Comparison of hoffman-idn-reg and jseng-idn-admin

At 9:56 PM +0430 3/31/03, Roozbeh Pournader wrote:
   A registry MUST NOT blindly combine multiple tables which have
   overlapping equivalences. Instead, the registry MUST carefully analyze
   every instance in the combined table where a base character has one or
   more different variants and select the desired set of variants for the
   base character.

(But unfortunately doesn't suggest any guidelines when doing so.)

Correct. I think that if I give a few suggestions for guidelines, it will lead readers to think that the problem is simple, which it is not. Either we list lots and lots, or none.

I will add a note explaining why I have given none; see below.

Unfortunately, the list ends here. Specifically, there are features that
are *required* for Arabic but are missing in the language of the tables.

And therefore they will be added. (Note to the list: I doubt that Arabic is the only language that I missed. If you know of others, please speak up!)

1. Mandatory equivalences as opposed to secondary/variant equivalences. This feature is necessary for defining equivalences between European and
Arabic-Indic digit shapes in Arabic labels, for example.

Very good point! This is a registry-specific early mapping step that must be done. I think it should be done before the variants are checked in the table; do folks here agree?
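
To make the ordering concrete, here is a minimal sketch of such an early mapping step, run before any variant-table lookup. It assumes the registry folds Arabic-Indic digits (U+0660..U+0669) onto European digits; the direction of the fold and the names DIGIT_MAP, early_map, and same_after_early_mapping are illustrative assumptions, not something either draft specifies.

```python
# Hypothetical registry-specific early mapping (an assumption for
# illustration): fold Arabic-Indic digits U+0660..U+0669 onto the
# European digits U+0030..U+0039 before any variant-table lookup.
DIGIT_MAP = {0x0660 + i: 0x0030 + i for i in range(10)}


def early_map(label: str) -> str:
    """Apply the mandatory digit mapping to a label."""
    return label.translate(DIGIT_MAP)


def same_after_early_mapping(a: str, b: str) -> bool:
    """Two labels count as the same name if they are equal once mapped."""
    return early_map(a) == early_map(b)
```

With this in place, a label written with U+0661 U+0662 U+0663 and one written with "123" map to the same string, so the variant tables only ever see one digit form.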

2. Clear language about conflict resolution. There need to be clear
guidelines or recommendations for cases where the variant labels
associated with two registered labels intersect.
This will happen with almost any multi-language Arabic-script zone
(e.g. U+0649 vs U+064A vs U+06CC).

I am unclear on how this differs from point #1. If any of those three characters are supposed to only be represented by one of them in names, then the registry-specific early mapping step will take care of them. Or is that not what you are referring to? Please be more specific.

3. Clear language with specific guidelines and real-life examples for
merging tables for different languages/locales.

Currently, I believe that there are three possibilities:

- the merging is trivially easy because there is no overlap

- the merging is a policy decision by the registry at the time of table-making as to which language "wins" for the overlapping characters

- it is impossible to register without knowing the supposed language of the registration

I can add more discussion of that, but the third option is not "merging", it is forcing the problem on the registrant (who might be sly and use it as a way to make the bundle contain things that the registry might not have intended). From my reading of the JET document, they call the third option "merging" when in fact it is just the opposite: it prevents merging by pointing at one table.
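
The first two possibilities can be sketched roughly as follows, assuming a table is simply a mapping from a base character to its set of variants. The Table type, the function names, and the primary_wins flag are hypothetical, and the variant assignments in the usage example are invented for illustration; they say nothing about any real language's table.

```python
from typing import Dict, Set

# Assumed shape of a variant table: base character -> set of variants.
Table = Dict[str, Set[str]]


def conflicts(a: Table, b: Table) -> Set[str]:
    """Base characters present in both tables with different variant sets."""
    return {base for base in a.keys() & b.keys() if a[base] != b[base]}


def merge(primary: Table, secondary: Table, primary_wins: bool = False) -> Table:
    """Merge two tables.

    With no overlapping equivalences the merge is a plain union. With
    overlap, the caller must state the policy explicitly; here the only
    policy offered is that the primary table "wins".
    """
    clash = conflicts(primary, secondary)
    if clash and not primary_wins:
        names = ", ".join(f"U+{ord(c):04X}" for c in sorted(clash))
        raise ValueError(f"overlapping equivalences need a policy decision: {names}")
    merged = {base: set(variants) for base, variants in secondary.items()}
    merged.update({base: set(variants) for base, variants in primary.items()})
    return merged


# Invented example data: both tables list U+064A with different variants.
table_x: Table = {"\u064A": {"\u0649"}}
table_y: Table = {"\u064A": {"\u06CC"}, "\u0643": {"\u06A9"}}
combined = merge(table_x, table_y, primary_wins=True)
```

Calling merge(table_x, table_y) without the flag raises, which is the point: the blind combination is refused, and the registry has to make the decision at table-making time.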

4. Better syntax for the table. Don't you agree that a U+ABCDU+BCDAU+CDAB
syntax is unreadable? Why can't one use a space?

Spaces as separators in tables cause problems going through gateway programs. I'm happy to add an inter-character separator of "-".
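
As a rough illustration of what a "-"-separated sequence would look like to a program reading the table, here is a small sketch. The exact field syntax (four to six hex digits after "U+", tokens joined by "-") is my assumption about the proposed format, and parse_sequence and format_sequence are hypothetical helper names.

```python
import re

# Assumed token syntax: "U+" followed by four to six hexadecimal digits.
_CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_sequence(field: str) -> str:
    """Turn 'U+0649-U+064A' into the corresponding string of characters."""
    chars = []
    for token in field.split("-"):
        match = _CODE_POINT.fullmatch(token)
        if match is None:
            raise ValueError(f"bad code point token: {token!r}")
        chars.append(chr(int(match.group(1), 16)))
    return "".join(chars)


def format_sequence(text: str) -> str:
    """Inverse of parse_sequence: write a string as 'U+XXXX-U+XXXX-...'."""
    return "-".join(f"U+{ord(ch):04X}" for ch in text)
```

Under this sketch, the unseparated form U+0649U+064A is rejected as a single malformed token rather than silently split.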

--Paul Hoffman, Director
--Internet Mail Consortium