Re: Bidi issues
Adam provides a pretty good explanation. We have also adopted
very much the same rules for IRIs (which contain domain names).
For the current (internal) draft, see
and please also have a look at the examples at
Some more details below.
[I have copied public-iri@xxxxxx because I'm mentioning the IRI solution]
At 02:41 03/08/07 +0000, Adam M. Costello wrote:
Roy Badami <roy@xxxxxxxxxxxxx> wrote:
As for IMAA, I have no doubt that some sort of bidi check is needed for
the local part, for the same reasons it is needed for domain labels.
And I have no doubt that the bidi check in Stringprep is sufficient and
overkill, just as it is for domain labels. The only difference is how
much overkill. Consider the address foo.bar@xxxxxxxxxxxx. Stringprep
is applied to "example" and to "net" independently, but it is applied
to "foo.bar" all together. Therefore there might exist strings that
would be valid domain names but not valid local parts.
Yes, in particular if, e.g., foo is Arabic and bar is Latin.
It sounds like a rather strict restriction to me that there should be
absolutely no way to fit these two scripts into the local part.
But I guess Roy and others can judge better how much pain that
will be in practice.
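
To make the difference concrete, here is a minimal sketch of the bidi
requirement from RFC 3454, section 6, using the tables in Python's
standard stringprep module. The function name passes_bidi_check and
the sample strings are mine, chosen for illustration only. It covers
just the RandALCat/LCat rules, not the rest of the Stringprep profile
(mapping, normalization, prohibited characters).

```python
import stringprep


def passes_bidi_check(s):
    """Check only the RandALCat/LCat rules of RFC 3454, section 6."""
    # Table D.1 holds characters with bidi class R or AL (RandALCat).
    has_randal = any(stringprep.in_table_d1(c) for c in s)
    if not has_randal:
        # A string without RandALCat characters is not restricted by these rules.
        return True
    # Table D.2 holds characters with bidi class L (LCat); mixing is forbidden.
    if any(stringprep.in_table_d2(c) for c in s):
        return False
    # The first and last characters must both be RandALCat.
    return stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])


arabic = "\u0639\u0631\u0628"  # three Arabic letters, standing in for "foo"
latin = "bar"

# Checked one by one, the way domain labels are, both pieces pass.
print(passes_bidi_check(arabic))                # True
print(passes_bidi_check(latin))                 # True

# Checked as one string, the way a local part is, the combination fails.
print(passes_bidi_check(arabic + "." + latin))  # False
```

The ASCII dot is in neither table, so it does not change the outcome.
The combined string fails purely because it mixes RandALCat and LCat
characters.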
I'm not sure--it
depends on how ASCII dots influence the bidi algorithm.
No, it doesn't depend on the bidi algorithm, just on stringprep.
But that might be a good thing. The user interface might understand
that example.net is a domain name composed of labels, and would be able
to override the bidi algorithm if necessary to preserve the proper order
of the labels. But the local part is an opaque string (except possibly
when viewed by the mail exchanger for example.net) and is therefore
fully subject to the bidi algorithm, and therefore needs the protection
of having Stringprep's bidi check applied to the whole thing.
Please note that currently, IDNA does not specify how to display
domain names. IMAA may do that, but I don't think it does.
For IRIs, we have decided that the only context an application
needs to know is that overall, this is an identifier. This
simplifies implementation quite a bit. In many cases, in particular
for all-rtl cases, this knowledge is not even necessary, and even
if an IRI is not embedded in explicit LTR context, the single
missing bit still can be guessed easily.
This is based on feedback we have received both from Israelis
and from Arabs. For example, whenever Arabs have presented
Arabic domain names, they always did it as MOC.BARA.BEW
(inverting not only each component of web.arab.com, but
also the order of the labels) rather than BEW.BARA.MOC
(just inverting the labels internally).
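
The two renderings can be reproduced with a toy sketch. It only
mimics the notation used above (upper case standing for right-to-left
letters shown in visual order) and is not an implementation of the
Unicode bidi algorithm. The function names are made up for
illustration.

```python
def display_whole_rtl(name):
    """Reverse the whole name: labels and their order both run right to left."""
    return name[::-1].upper()


def display_labels_only(name):
    """Reverse each label internally but keep the labels in left-to-right order."""
    return ".".join(label[::-1] for label in name.split(".")).upper()


logical = "web.arab.com"  # logical (typing) order, as in the example above

print(display_whole_rtl(logical))    # MOC.BARA.BEW
print(display_labels_only(logical))  # BEW.BARA.MOC
```

The first function needs no knowledge of where the dots are, which is
the point: the whole string is treated as one right-to-left run.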
The solution makes it easier for people without any specific
knowledge of IRI/mail address/domain name syntax to read
these things in the right order.
So the proposed display algorithm for IRIs doesn't need any
internal knowledge of the IRI (or email address, or domain
name) structure. There is therefore no need to apply the bidi check
to the whole left-hand part if that's not deemed appropriate.