
RE: Comparison of hoffman-idn-reg and jseng-idn-admin

At 07:59 03/04/02 -0800, Paul Hoffman / IMC wrote:

At 4:52 PM +0200 4/2/03, Benny Lipsicas wrote:
The language in question is Hebrew. One feature that may be of
importance to us is the ability to prevent certain characters from
appearing anywhere but at the end of the label (i.e. such a character
can only be the last char of a label). We have another issue, which I'm
not certain is in the scope of this list: any label in Hebrew needs to
be written RTL, and if I'm not mistaken, this technically prevents the
mixing of Hebrew and non-Hebrew chars in the same label.

The latter issue is definitely handled by the IDNA standard. Could you explain the reason for the first issue (that a particular character has to be the last character in the label)?

Some Hebrew characters (kaf, mem, nun, peh, tsadi) have different forms when appearing at the end of a word (label). The Greek sigma is another example. In Unicode, this is handled by having separate codepoints for the final forms. This is in contrast to e.g. Arabic, where there are many more contextual forms, shaping is handled at display time, and there is only one codepoint per character.
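To make the codepoint distinction concrete, here is a small Python sketch that prints each regular/final pair with its Unicode name. The table PAIRS and the helper describe() are names made up for this illustration; the codepoints themselves come from the Unicode Hebrew and Greek blocks.

```python
import unicodedata

# Regular form -> final form. PAIRS is an illustrative name, not part of
# any standard; the codepoints are those assigned by Unicode.
PAIRS = {
    "\u05DB": "\u05DA",  # kaf   -> final kaf
    "\u05DE": "\u05DD",  # mem   -> final mem
    "\u05E0": "\u05DF",  # nun   -> final nun
    "\u05E4": "\u05E3",  # pe    -> final pe
    "\u05E6": "\u05E5",  # tsadi -> final tsadi
    "\u03C3": "\u03C2",  # Greek small sigma -> final sigma
}

def describe(ch: str) -> str:
    """Return a 'U+XXXX NAME' description of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

if __name__ == "__main__":
    for regular, final in PAIRS.items():
        print(f"{describe(regular)}  /  {describe(final)}")
```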

So a registry registering Hebrew would want to make sure that e.g.
a kaf in the middle of a label is always U+05DB, but at the end of
the label is always U+05DA. I'm not sure what should happen with
labels that consist of more than one word: whether simple
concatenation would be acceptable (and a final letter could help
show the word boundary) or whether a hyphen or other, similar
character would be used to concatenate words.
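A registry-side check along these lines might look roughly like the Python sketch below. It is only an illustration under the strictest reading of the rule (final forms allowed only as the last character, regular forms never as the last character); the function name misplaced_forms() and the two tables are hypothetical, not taken from any draft or registry implementation.

```python
# Final form -> regular form for the five Hebrew letters discussed above.
# Both tables and the function name are assumptions for this sketch.
FINAL_TO_REGULAR = {
    "\u05DA": "\u05DB",  # final kaf   -> kaf
    "\u05DD": "\u05DE",  # final mem   -> mem
    "\u05DF": "\u05E0",  # final nun   -> nun
    "\u05E3": "\u05E4",  # final pe    -> pe
    "\u05E5": "\u05E6",  # final tsadi -> tsadi
}
REGULAR_TO_FINAL = {reg: fin for fin, reg in FINAL_TO_REGULAR.items()}

def misplaced_forms(label: str) -> list[tuple[int, str, str]]:
    """Return (index, character, reason) for every character that
    violates the strict final-form placement rule in a single label."""
    problems = []
    last = len(label) - 1
    for i, ch in enumerate(label):
        if ch in FINAL_TO_REGULAR and i != last:
            problems.append((i, ch, "final form not at end of label"))
        elif ch in REGULAR_TO_FINAL and i == last:
            problems.append((i, ch, "regular form at end of label"))
    return problems

if __name__ == "__main__":
    # mem, lamed, final kaf: accepted under the strict rule
    print(misplaced_forms("\u05DE\u05DC\u05DA"))
    # mem, lamed, regular kaf: flagged at index 2
    print(misplaced_forms("\u05DE\u05DC\u05DB"))
```

Note that under this strict rule a label formed by simply concatenating two words would be flagged whenever the first word ends in a final form, so a registry that allows concatenation would have to relax the first test.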

Regards, Martin.