tedd <tedd@xxxxxxxxxxxx> wrote:
> > For the moment I'll call the relation "confusability". Given any
> two labels (in no particular order), they are either confusable or
> not, and it is possible to compute that boolean value.
From an earlier post, someone talked about IBM.com vs 1BM.com -- which
should have been ibm.com vs 1bm.com, but nonetheless this type of
similar-looking-glyph use can be confusing. It can be even more
confusing if one uses a Greek small letter iota with tonos (U+03AF) to
produce an ibm.com. Is this the type of confusion you are talking about?
Could be. A registry would define its confusability relation as it
sees fit. It doesn't want to define confusability so narrowly that
not enough things are considered confusable, because then it would be
swamped by disputes about name ownership. But it doesn't want to define
confusability so broadly that it drastically curtails the number of
registrations (and hence revenue).
Maybe "confusable" is not the best term. Maybe "neighboring" would be
better. It's got some of the right intuition: If you are my neighbor,
then I am your neighbor (symmetry), but my neighbor's neighbor is
not necessarily my neighbor (intransitivity). You can speak of the
neighborhood centered around a particular label. Neighborhoods centered
around different labels can partially overlap. A bundle would be either
a set of labels that are all neighbors of each other, or a subset of the
neighborhood centered around the bundle's primary label, depending on
which version of property 2 we use. Property 1 says that neighboring
labels in a zone must not belong to distinct bundles.
I just noticed that I forgot to state an assumption, which we can call
property 0: Every label in a zone belongs to exactly one bundle.
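
To make the neighbor intuition concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the
look-alike pairs, the function names, and the toy rule that two labels
are neighbors when they match position by position up to a look-alike
pair. A real registry would define its own relation. The two bundle
checks correspond to the two versions of property 2 as described above.

```python
from itertools import combinations

# Hypothetical look-alike character pairs; a registry would choose its own.
LOOKALIKE_PAIRS = {frozenset("i1"), frozenset("1l")}

def neighbors(a, b):
    """Toy symmetric relation: same length, and every position either
    matches exactly or is one of the look-alike pairs."""
    if len(a) != len(b):
        return False
    return all(x == y or frozenset((x, y)) in LOOKALIKE_PAIRS
               for x, y in zip(a, b))

def satisfies_property_0(zone, bundles):
    """Every label in the zone belongs to exactly one bundle."""
    return all(sum(label in b for b in bundles) == 1 for label in zone)

def satisfies_property_1(zone, bundles):
    """Neighboring labels in the zone never sit in distinct bundles.
    Assumes property 0 already holds, so each label has one bundle."""
    bundle_of = {label: i for i, b in enumerate(bundles) for label in b}
    return all(bundle_of[a] == bundle_of[b]
               for a, b in combinations(zone, 2)
               if neighbors(a, b))

def is_clique_bundle(bundle):
    """One version of property 2: all labels are neighbors of each other."""
    return all(neighbors(a, b) for a, b in combinations(bundle, 2))

def is_centered_bundle(bundle, primary):
    """Other version: every label is a neighbor of the primary label."""
    return all(neighbors(primary, label) for label in bundle)
```

With these toy pairs, "ibm" and "1bm" are neighbors, "1bm" and "lbm" are
neighbors, but "ibm" and "lbm" are not, which is the intransitivity
described above. The bundle {"ibm", "1bm", "lbm"} passes the centered
check with "1bm" as primary but fails the all-neighbors check.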
I understand -- but I cannot see how "confusability" avoidance
can be applied across the entire Unicode database.
It appears to me (perhaps I'm wrong) that this group is trying to
predict and solve all possible problems that may arise from IDN
registrations because of look-alike possibilities within the Unicode
I don't know the actual number of additional characters added thus
far, but the upward limit is 65,535. So, as I see it, you will have
some 65,000 different possibilities of character confusion at a
single character domain level (i.e., a.com). Now, move to two
characters (aa.com) and the figure becomes much larger -- something
on the order of 65,000 x 65,000.
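
As a rough sketch of that arithmetic: the number of distinct labels of
length n drawn from k characters is k to the power n. The helper name
below is made up for illustration, and 65,535 is simply the figure used
above, not a statement about the actual size of Unicode. Note that this
counts possible labels, not confusable pairs.

```python
def label_count(alphabet_size, length):
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length

# Using the 65,535 figure from this post as the alphabet size.
for n in (1, 2, 3):
    print(n, label_count(65535, n))
# The two-character case is 65,535 * 65,535 = 4,294,836,225.
```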
Now, what's the upper limit to the number of characters allowed in a
domain name and what's its factorial? Do you honestly believe that
you can solve this confusability problem for all possible
combinations -- even if your interpretation is the correct one for
each situation? Be reasonable, you're approaching a number that
rivals the US national debt. Plus, no offense, you're making
decisions about glyphs in other languages that are not your own.
I think this group has made some significant progress in that some
characters have been already mapped to others -- such as all
occurrences of glyphs looking like "A" have been mapped to "a"
and so on. But you have done that primarily because you are familiar
with the Latin character set and its use.
Now, to map all occurrences of everything that looks similar to one
character may do more harm than good in ways not apparent to you
presently. Plus, considering the sheer number of combinations and
the thoughtful consideration required for each one -- I don't think this
group has enough time or resources to accomplish the task.
It might be best, for all concerned, to let the market and courts work it out.