
Re: confusability

tedd <tedd@xxxxxxxxxxxx> wrote:

> > For the moment I'll call the relation "confusability". Given any
> > two labels (in no particular order), they are either confusable or
> > not, and it is possible to compute that boolean value.

> From an earlier post, someone talked about IBM.com vs 1BM.com -- which
> should have been ibm.com vs 1bm.com, but nonetheless this type of
> similar-looking-glyph use can be confusing.  It can be even more
> confusing if one uses a Greek small letter iota with tonos (U+03AF) to
> produce a look-alike of ibm.com.  Is this the type of confusion you are
> talking about?

Could be. A registry would define its confusability relation as it sees fit. It doesn't want to define confusability so narrowly that not enough things are considered confusable, because then it would be swamped by disputes about name ownership. But it doesn't want to define confusability so broadly that it drastically curtails the number of registrations (and hence revenue).

Maybe "confusable" is not the best term.  Maybe "neighboring" would be
better.  It's got some of the right intuition:  If you are my neighbor,
then I am your neighbor (symmetry), but my neighbor's neighbor is
not necessarily my neighbor (intransitivity).  You can speak of the
neighborhood centered around a particular label.  Neighborhoods centered
around different labels can partially overlap.  A bundle would be either
a set of labels that are all neighbors of each other, or a subset of the
neighborhood centered around the bundle's primary label, depending on
which version of property 2 we use.  Property 1 says that neighboring
labels in a zone must not belong to distinct bundles.

I just noticed that I forgot to state an assumption, which we can call
property 0:  Every label in a zone belongs to exactly one bundle.
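
As a rough illustration of the "neighboring" idea, here is a minimal Python sketch. The look-alike character pairs, the function names, and the sample zone are all made up for the example; a real registry would define its own relation. It only tries to show that the relation can be symmetric without being transitive, and how properties 0 and 1 could be checked mechanically.

```python
from itertools import combinations

# Hypothetical look-alike character pairs, for illustration only.
# A real registry would choose its own table.
LOOKALIKE_PAIRS = {
    frozenset(("i", "1")),
    frozenset(("1", "l")),
    frozenset(("i", "\u03af")),  # Greek small letter iota with tonos
}


def chars_neighbor(a, b):
    """Two characters are neighbors if equal or listed as look-alikes."""
    return a == b or frozenset((a, b)) in LOOKALIKE_PAIRS


def neighboring(x, y):
    """Symmetric by construction, but not necessarily transitive."""
    return len(x) == len(y) and all(chars_neighbor(a, b) for a, b in zip(x, y))


def neighborhood(center, labels):
    """All labels (other than the center) that neighbor the center."""
    return {lab for lab in labels if lab != center and neighboring(center, lab)}


def property0_holds(zone, bundles):
    """Property 0: every label in the zone belongs to exactly one bundle."""
    return all(sum(label in b for b in bundles) == 1 for label in zone)


def property1_holds(bundles):
    """Property 1: neighboring labels never belong to distinct bundles."""
    for b1, b2 in combinations(bundles, 2):
        if any(neighboring(x, y) for x in b1 for y in b2):
            return False
    return True
```

With this table, "ibm" neighbors "1bm" and "1bm" neighbors "lbm", but "ibm" does not neighbor "lbm". Putting all three in one bundle satisfies property 1; splitting "ibm" away from "1bm" violates it.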



I understand -- but I cannot see how "confusability" avoidance can be applied across the entire Unicode database.

It appears to me (perhaps I'm wrong) that this group is trying to predict and solve all possible problems that may arise from IDN registrations because of look-alike possibilities within the Unicode database.

I don't know the actual number of additional characters added thus far, but the upper limit is 65,535. So, as I see it, you will have some 65,000 different possibilities of character confusion at the single-character domain level (i.e., a.com). Now, move to two characters (aa.com) and the figure becomes much larger -- something on the order of 65,000 x 65,000.
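
To put numbers on that, here is a small Python sketch. It takes the 65,535 figure from this message as given (it is an assumption of the example, not a statement about the actual size of Unicode), and the function names are made up for illustration.

```python
# Alphabet size as used in this message; taken as given for the arithmetic.
ALPHABET = 65_535


def label_count(length, alphabet=ALPHABET):
    """Number of distinct labels of exactly `length` characters."""
    return alphabet ** length


def unordered_pairs(n):
    """Number of unordered pairs that could be compared among n items."""
    return n * (n - 1) // 2


print(label_count(1))                   # 65535
print(label_count(2))                   # 4294836225
print(unordered_pairs(label_count(1)))  # 2147385345
```

The count of labels grows as a power of the label length, and the count of label pairs that might need a ruling grows faster still.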

Now, what's the upper limit on the number of characters allowed in a domain name, and what's its factorial? Do you honestly believe that you can solve this confusability problem for all possible combinations -- even if your interpretation is the correct one for each situation? Be reasonable, you're approaching a number that rivals the US national debt. Plus, no offense, you're making decisions about glyphs in languages that are not your own.

I think this group has made some significant progress in that some characters have already been mapped to others -- such as all occurrences of glyphs looking like "A" being mapped to "a", and so on. But you have done that primarily because you are familiar with the Latin character set and its use.

Now, mapping everything that looks similar onto one character may do more harm than good in ways not presently apparent to you. Plus, considering the sheer number of combinations and the thoughtful consideration required for each one -- I don't think this group has enough time or resources to accomplish the task.

It might be best, for all concerned, to let the market and courts work it out.