
RE: [idn] Preparation of Internationalized Host Names - Hebrew



At 12:43 PM +0300 7/8/00, Jonathan Rosenne wrote:
> Please note that not all punctuation is prohibited. The rules for the
> specific kinds of punctuation that are prohibited are in the document.
> U+05C0, which looks just like the ASCII "vertical bar", is probably
> acceptable (since vertical bar is acceptable). U+05C3 looks just like
> a colon and is therefore not acceptable; thanks for pointing this
> out. (And I have noted it to the Unicode folks for when they update
> the standard).

Its meaning is punctuation, like comma or full stop, never mind its shape.

Exactly my point. At present, we do *not* prohibit all punctuation. The only prohibited punctuation characters are those that are reserved or delimiters in URLs [RFC2396] and [RFC2732]. If this group decides to prohibit all punctuation, certainly we would then prohibit U+05C0. Or, we might prohibit all punctuation other than a certain small group of characters (which would be pretty difficult to choose correctly...). But, for now, we only prohibit a small set.
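
To make the current rule concrete, here is a rough sketch in Python. It is only an illustration, not text from the draft: the names RESERVED, DELIMS, LOOKALIKES and is_prohibited are made up for this message, and the look-alike table holds just the two Hebrew characters discussed above. The reserved and delimiter sets are taken from RFC 2396, with "[" and "]" added per RFC 2732.

```python
# Reserved characters from RFC 2396, plus "[" and "]" from RFC 2732.
RESERVED = set(";/?:@&=+$,[]")
# Delimiters from RFC 2396 section 2.4.3.
DELIMS = set('<>#%"')

# Hypothetical look-alike table (an assumption for this sketch): each
# non-ASCII character is judged as the ASCII character it resembles.
LOOKALIKES = {
    "\u05c0": "|",  # HEBREW PUNCTUATION PASEQ, looks like vertical bar
    "\u05c3": ":",  # HEBREW PUNCTUATION SOF PASUQ, looks like colon
}


def is_prohibited(ch: str) -> bool:
    """Return True if ch is (or looks like) a URL reserved char or delimiter."""
    ascii_form = LOOKALIKES.get(ch, ch)
    return ascii_form in RESERVED or ascii_form in DELIMS
```

Under these assumptions U+05C3 is rejected because it maps to a colon, while U+05C0 is accepted because vertical bar is in neither set.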


> > 2. Cantillation Marks
> > 0591 to 05af
> >
> > These should be either prohibited or ignored since they do not affect
> > pronunciation, similar to ignoring case differences.
> >
> > Personally, I would rather prohibit them since their presence is
> > most likely to be an error.
>
> If they never appear in personal names, company names, or spoken
> phrases, then they can safely be prohibited. Is that true for all of
> them?

They never appear in common use, they are only used in biblical texts.

Thanks, that's what I wanted to hear. I'll prohibit them in the next draft.


> > 2. Points
> > 05b0 to 05c4
> >
> > These should be either prohibited or ignored since they are optional. In
> > modern Hebrew they are seldom used, not all systems support them, and
> > it is valid to omit them.
> >
> > Personally, I would rather ignore them because a user may enter them
> > and why not let him.
>
> This is much more problematic. We do not currently have any "ignored"
> characters. If I understand this correctly, the host name <HEBREW
> LETTER HE><HEBREW POINT SEGOL>.com looks and sounds different than
> <HEBREW LETTER HE><HEBREW POINT TSERE>.com, but could be considered
> the same for a host name. If so, I think we would have to prohibit
> them, not ignore them. Does that sound correct?

They do sound different, but do not necessarily look different because it is not mandatory to display points.

Just like you ignore case in English, in Hebrew you should ignore points.

From my (very limited) understanding of Hebrew, this makes sense. However, it means that we will have to make other such "ignoring" rules for a variety of scripts. I'm happy to do that if the group wants, but it certainly makes the name preparation harder. Doing so would require an extra step, probably between checking for prohibited characters and folding case, that says "look for any characters on this list and throw them away". (Just to be clear: my personal preference would have been not to ignore case, but that decision was made *long* ago and cannot be reversed.)
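
For discussion, the three-step order could look roughly like the Python sketch below. This is an assumption-laden illustration, not the draft's algorithm: the function names are invented, the prohibited list contains only the cantillation range from this thread, and "points" are taken to be the combining marks (Unicode category Mn) within 05b0 to 05c4, so that the punctuation characters in that range are not silently dropped.

```python
import unicodedata

CANTILLATION = range(0x0591, 0x05B0)  # U+0591..U+05AF, prohibited
POINT_RANGE = range(0x05B0, 0x05C5)   # U+05B0..U+05C4, candidates to ignore


def is_ignored(ch: str) -> bool:
    """True for Hebrew points: nonspacing marks inside POINT_RANGE."""
    return ord(ch) in POINT_RANGE and unicodedata.category(ch) == "Mn"


def prepare(label: str) -> str:
    """Prepare one host name label using the order discussed in this thread."""
    # Step 1: reject prohibited characters.
    for ch in label:
        if ord(ch) in CANTILLATION:
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    # Step 2 (the proposed extra step): throw away ignored characters.
    kept = "".join(ch for ch in label if not is_ignored(ch))
    # Step 3: fold case.
    return kept.lower()
```

With this sketch, <HEBREW LETTER HE><HEBREW POINT SEGOL> and <HEBREW LETTER HE><HEBREW POINT TSERE> both prepare to a bare <HEBREW LETTER HE>, so the two names would compare equal.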


How does the group feel about this? What other characters in scripts other than Hebrew would go here?

--Paul Hoffman, Director
--Internet Mail Consortium