
Re: Bidi issues



Adam M. Costello scripsit:

> The motivation behind the bidi restrictions is that there exist very
> different strings that get displayed exactly the same way.  I don't know
> exactly why, but it's a consequence of the bidi algorithm.  You can try
> reading UAX#9 if you like.  Good luck.  :)

Here's an example.  I will use UPPER CASE to represent Arabic letters
and lower case to represent Latin letters, as is usual in examples of
this kind.  If you see, totally out of context, the string

	the arabs = BARA-LA

you cannot tell whether this says "the arabs = AL-ARAB", as would be the
case in an English context, or "AL-ARAB = the arabs", as would be the case
in an Arabic context.  In running text, it's possible to disambiguate,
but not in an identifier that has to work correctly out of context.
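
To make the ambiguity concrete, here is a toy sketch in Python.  It is
not UAX#9: it follows the convention of this example only (upper case is
RTL, lower case is LTR, everything else is neutral) and ignores digits,
weak types, and explicit embeddings.  The name visual_order and the
"L"/"R" base-direction argument are my own inventions for illustration.

```python
def visual_order(logical: str, base: str) -> str:
    """Toy bidi display model. base is "L" or "R" (paragraph direction).

    Convention from the example: UPPER CASE = RTL, lower case = LTR,
    anything else is neutral. This is NOT the full UAX#9 algorithm.
    """
    def strong(ch):
        if ch.isalpha():
            return "R" if ch.isupper() else "L"
        return None  # neutral

    dirs = [strong(ch) for ch in logical]
    n = len(logical)

    # Resolve neutrals: take the direction of the surrounding strong
    # characters if they agree, otherwise the base direction.
    resolved = []
    for i in range(n):
        if dirs[i] is not None:
            resolved.append(dirs[i])
            continue
        prev = next((dirs[j] for j in range(i - 1, -1, -1) if dirs[j]), base)
        nxt = next((dirs[j] for j in range(i + 1, n) if dirs[j]), base)
        resolved.append(prev if prev == nxt else base)

    # Group consecutive characters with the same direction into runs.
    runs = []
    for ch, d in zip(logical, resolved):
        if runs and runs[-1][0] == d:
            runs[-1][1].append(ch)
        else:
            runs.append((d, [ch]))

    # RTL runs are displayed with their characters reversed.
    pieces = ["".join(reversed(chars)) if d == "R" else "".join(chars)
              for d, chars in runs]

    # In an RTL paragraph the runs themselves are laid out right to left.
    if base == "R":
        pieces.reverse()
    return "".join(pieces)


english = visual_order("the arabs = AL-ARAB", "L")
arabic = visual_order("AL-ARAB = the arabs", "R")
print(english)
print(arabic)
```

Under this simplified model, two different logical strings come out
looking the same on the screen, which is the whole problem.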

Consequently, the stringprep rules forbid an identifier that contains
both LTR and RTL characters.  It really has nothing to do with the encoding
of the characters, only with their appearance.
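
As a rough sketch of what such a check looks like, Python's standard
stringprep module exposes the RFC 3454 tables D.1 (RandALCat, i.e. RTL)
and D.2 (LCat, i.e. LTR).  The function name violates_bidi_rule is
hypothetical, and the first/last-character test comes from RFC 3454
section 6 rather than from anything said above.

```python
import stringprep


def violates_bidi_rule(label: str) -> bool:
    """Return True if label breaks the RFC 3454 section 6 bidi checks.

    Sketch only: a real stringprep profile also maps, normalizes, and
    rejects prohibited characters before this step.
    """
    has_rtl = any(stringprep.in_table_d1(ch) for ch in label)  # RandALCat
    has_ltr = any(stringprep.in_table_d2(ch) for ch in label)  # LCat

    if not has_rtl:
        return False  # no RTL characters: nothing to check
    if has_ltr:
        return True   # mixes RTL and LTR characters
    # With RTL characters present, the first and last characters
    # must both be RTL as well.
    return not (stringprep.in_table_d1(label[0])
                and stringprep.in_table_d1(label[-1]))


print(violates_bidi_rule("arabs"))
print(violates_bidi_rule("\u0627\u0644\u0639\u0631\u0628"))
print(violates_bidi_rule("the\u0627\u0644"))
```

Note that the test looks only at the bidi class of each character, never
at how the string happens to be encoded.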

-- 
With techies, I've generally found              John Cowan
If your arguments lose the first round          http://www.reutershealth.com
    Make it rhyme, make it scan                 http://www.ccil.org/~cowan
    Then you generally can                      jcowan@xxxxxxxxxxxxxxxxx
Make the same stupid point seem profound!           --Jonathan Robie