Re: Bidi issues



Roy Badami <roy@xxxxxxxxxxxxx> wrote:

> I'm trying to understand the rationale behind the bidi restrictions in
> stringprep, but unfortunately the IDN list archives don't seem to be
> accessible any longer.

One message at a time, going back to 2000-Jan:

http://www.imc.org/idn/mail-archive/

One month at a time, going back to 2002-Jan:

ftp://ops.ietf.org/pub/lists/idn.*

Hmmm, anyone know why the latter doesn't go back as far as the former?

> Basically, I'm just wondering whether bidi behaves sensibly in the
> context of IMAA.  (Or indeed whether the stringprep restrictions are
> unnecessary in IMAA.)

Good question, I hadn't really thought about that.

The motivation behind the bidi restrictions is that there exist very
different strings that get displayed exactly the same way.  I don't know
exactly why, but it's a consequence of the bidi algorithm.  You can try
reading UAX#9 if you like.  Good luck.  :)

http://www.unicode.org/reports/tr9/

Anyway, having distinct identifiers that get displayed exactly the
same way is undesirable, so Stringprep provides a way to prohibit the
problematic strings, and Nameprep uses it.

The bidi check defined in Stringprep is actually a little more
restrictive than necessary.  It prohibits not only the problematic
strings, but also some strings that would not have caused problems.
This was done for the sake of simplifying the check.  I trust that the
bidi experts who designed the check understood the tradeoffs and made
the right call.
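
For concreteness, here is a rough sketch of the check as I read section 6
of RFC 3454, using the tables in Python's standard stringprep module.
The function name passes_bidi_check is my own invention, and this covers
only the bidi step, not the mapping, normalization, or other prohibited
character steps that a real profile would run first.

```python
import stringprep


def passes_bidi_check(s: str) -> bool:
    """Sketch of the bidi requirements in RFC 3454 section 6 (assumed reading)."""
    # Requirement 1: no characters from table C.8 (bidi control characters etc.).
    if any(stringprep.in_table_c8(ch) for ch in s):
        return False

    # Table D.1 holds the R and AL characters, table D.2 holds the L characters.
    has_randal = any(stringprep.in_table_d1(ch) for ch in s)
    has_l = any(stringprep.in_table_d2(ch) for ch in s)

    # Strings with no right-to-left characters face no further restriction.
    if not has_randal:
        return True

    # Requirement 2: a string with any R/AL character must contain no L character.
    if has_l:
        return False

    # Requirement 3: such a string must both start and end with an R/AL character.
    return stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])
```

Requirement 3 is where I would guess most of the overkill lives: a
right-to-left string that merely ends in a digit is rejected, whether or
not it could actually be confused with anything.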

As for IMAA, I have no doubt that some sort of bidi check is needed for
the local part, for the same reasons it is needed for domain labels.
And I have no doubt that the bidi check in Stringprep is sufficient and
overkill, just as it is for domain labels.  The only difference is how
much overkill.  Consider the address foo.bar@xxxxxxxxxxxx  Stringprep
is applied to "example" and to "net" independently, but it is applied
to "foo.bar" all together.  Therefore there might exist strings that
would be valid domain names but not valid local parts.  I'm not sure--it
depends on how ASCII dots influence the bidi algorithm.
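
Setting aside the question of how the string gets displayed, the check
itself can be tried both ways.  The sketch below is only an illustration
under my own assumptions: bidi_ok is my reading of the bidi step in RFC
3454 section 6, and domain_ok and local_part_ok are hypothetical helpers
that apply it per label and to the whole string respectively.  Nothing
here is taken from the IMAA draft.

```python
import stringprep


def bidi_ok(s: str) -> bool:
    """Bidi step only, as I read RFC 3454 section 6 (an assumption)."""
    if any(stringprep.in_table_c8(ch) for ch in s):
        return False
    has_randal = any(stringprep.in_table_d1(ch) for ch in s)
    has_l = any(stringprep.in_table_d2(ch) for ch in s)
    if not has_randal:
        return True
    if has_l:
        return False
    return stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])


def domain_ok(domain: str) -> bool:
    # Hypothetical helper: the check is applied to each label independently.
    return all(bidi_ok(label) for label in domain.split("."))


def local_part_ok(local: str) -> bool:
    # Hypothetical helper: the check is applied to the whole string, dots included.
    return bidi_ok(local)


# Two Hebrew letters, an ASCII dot, then three Latin letters.
name = "\u05d0\u05d1.abc"
print(domain_ok(name))      # each label is acceptable on its own
print(local_part_ok(name))  # the whole string mixes R and L characters
```

If I have the tables right, the dot belongs to neither D.1 nor D.2, so
it does not trip the check by itself; the whole string fails simply
because it contains both right-to-left and left-to-right letters.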

But that might be a good thing.  The user interface might understand
that example.net is a domain name composed of labels, and would be able
to override the bidi algorithm if necessary to preserve the proper order
of the labels.  But the local part is an opaque string (except possibly
when viewed by the mail exchanger for example.net) and is therefore
fully subject to the bidi algorithm, and therefore needs the protection
of having Stringprep's bidi check applied to the whole thing.

That's probably more than enough speculation from someone who doesn't
even know the bidi algorithm.

AMC