Re: Bidi: is stringprep broken?
According to my understanding, and to testing against the Unicode C
reference implementation, you are correct in stating that the 2 strings ("A-123,456B" and "A456,-123B") will give the same display according to the Unicode algorithm for
It proves that you have a more creative mind than the people who proposed
the limitations for Bidi names in IRIs, or at least a more creative mind than mine.
You will admit that your example is more than a little contrived. The
limitations set on IRIs are intended to avoid ambiguity when converting from
the display order to the logical order (which is not achieved in this case,
although the vast majority of users would assume the form A-123,456B, because
the other form, with the comma adjacent to a minus sign, makes little sense
in a domain name). But those limitations were also designed not to restrict
too severely the potential for creating interesting domain names, so a
compromise had to be reached. I can find other examples of names allowed by
the rules which can mislead users trying to infer the logical order from the
display order. All of these examples are quite
By the way, can you give me a reference for "UseSTD13ASCIIRules", for an ignoramus like myself?
Shalom (Regards), Mati
Globalization Center Of Competency - Bidirectional Scripts
Phone: +972 2 5888802 Fax: +972 2 5870333 Mobile: +972 52
Sent by: public-iri-request@xxxxxx
To: ietf-imaa@xxxxxxx, public-iri@xxxxxx
Subject: Bidi: is stringprep broken?
> > Ergo, we need another display model; this one doesn't work
> There are also other real nasties with this display model:
Worse than that, I think the bidi restrictions in stringprep don't
actually achieve their goal of ensuring that you can't have two
different labels that render the same.
Consider the labels:
  A-123,456B
  A456,-123B
Here, A is HEBREW LETTER ALEF, B is HEBREW LETTER BET (or any
characters of bidi class R that you like, but *not* Arabic letters,
which are class AL) and the comma is actually ARABIC COMMA U+060C (or
any character of class CS or ES).
As far as I can tell these both pass nameprep with UseSTD13ASCIIRules
set, and they both render identically under bidi as:
If you don't care about UseSTD13ASCIIRules, you can replace
ARABIC COMMA with COMMA, SOLIDUS or COLON.
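The bidi part of that check can be sketched as follows. This is only a rough illustration, assuming Python's standard stringprep module (which exposes the RFC 3454 tables) and the two label spellings quoted in this thread, with ALEF, BET and ARABIC COMMA substituted as described. The helper name passes_bidi_check is hypothetical, and it covers only the RandALCat/LCat requirements of RFC 3454 section 6, not the rest of nameprep (mapping, normalization, prohibited characters, or the UseSTD13ASCIIRules check).

```python
import stringprep

ALEF = "\u05D0"          # HEBREW LETTER ALEF, the "A" in the labels
BET = "\u05D1"           # HEBREW LETTER BET, the "B" in the labels
ARABIC_COMMA = "\u060C"  # ARABIC COMMA, the "," in the labels


def passes_bidi_check(label: str) -> bool:
    """Hypothetical helper: RFC 3454 section 6 bidi requirements only."""
    # Table D.1 holds characters of bidi class R or AL (RandALCat).
    has_randal = any(stringprep.in_table_d1(ch) for ch in label)
    if not has_randal:
        # The remaining requirements only apply to strings with RandALCat.
        return True
    # Table D.2 holds characters of bidi class L (LCat); none may appear.
    if any(stringprep.in_table_d2(ch) for ch in label):
        return False
    # The first and the last character must both be RandALCat.
    return stringprep.in_table_d1(label[0]) and stringprep.in_table_d1(label[-1])


# The two labels in logical (storage) order.
label_1 = ALEF + "-123" + ARABIC_COMMA + "456" + BET
label_2 = ALEF + "456" + ARABIC_COMMA + "-123" + BET

for label in (label_1, label_2):
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in label)
    print(code_points, "->", passes_bidi_check(label))
```

Both labels satisfy these three requirements while differing in logical order, which is the situation described above. The sketch says nothing about how the labels render; that would need a full implementation of the Unicode bidirectional algorithm.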
I fully expect someone to reply explaining why I'm mistaken, but I've
checked the above as best I can...