Re: confusability

> I don't know the actual number of additional characters added thus
> far, but the upward limit is 65,535. So, as I see it, you will have

A small correction: there are currently over 95,000 characters in Unicode
3.2; in Unicode 4.0 (very soon to be released) there will be an additional
thousand-odd characters. In addition, there are 131,068 possible private use
characters, and there are 871,758 reserved positions still available for
future characters.
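
For reference, the arithmetic behind the size of the codespace can be sketched as below. This is only an illustration, assuming the standard layout of 17 planes of 65,536 code points each; the variable names are made up for the example, and reading the 131,068 figure as "planes 15 and 16 minus their noncharacters" is my assumption about where that number comes from.

```python
# Illustrative sketch only; names are hypothetical, layout assumed to be
# the standard 17-plane Unicode codespace (U+0000..U+10FFFF).
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points in each plane

# Total code points, far beyond the 65,535 suggested in the quoted message.
codespace = PLANES * CODE_POINTS_PER_PLANE

# Assumption: the private-use figure counts planes 15 and 16 only, each
# minus its last two code points, which are noncharacters.
supplementary_private_use = 2 * (CODE_POINTS_PER_PLANE - 2)

print(f"{codespace:,}")                  # 1,114,112
print(f"{supplementary_private_use:,}")  # 131,068
```

So the ceiling is a little over 1.1 million code points rather than 65,535.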

IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "tedd" <tedd@xxxxxxxxxxxx>
To: "IDN registration policy list" <idn-reg-policy@xxxxxxx>
Sent: Saturday, April 05, 2003 07:13
Subject: Re: confusability

> >tedd <tedd@xxxxxxxxxxxx> wrote:
> >
> >>  > For the moment I'll call the relation "confusability".  Given any
> >>  > two labels (in no particular order), they are either confusable or
> >>  > not, and it is possible to compute that boolean value.
> >>
> >>  From an earlier post, someone talked about IBM.com vs 1BM.com -- which
> >>  should have been ibm.com vs 1bm.com, but nonetheless this type of
> >>  similar-looking-glyph use can be confusing.  It can be even more
> >>  confusing if one uses a Greek small letter iota with tonos (U+03AF) to
> >>  produce a look-alike of ibm.com.  Is this the type of confusion you are
> >>  talking about?
> >
> >Could be.  A registry would define its confusability relation as it
> >sees fit.  It doesn't want to define confusability so narrowly that
> >not enough things are considered confusable, because then it would be
> >swamped by disputes about name ownership.  But it doesn't want to define
> >confusability so broadly that it drastically curtails the number of
> >registrations (and hence revenue).
> >
> >Maybe "confusable" is not the best term.  Maybe "neighboring" would be
> >better.  It's got some of the right intuition:  If you are my neighbor,
> >then I am your neighbor (symmetry), but my neighbor's neighbor is
> >not necessarily my neighbor (intransitivity).  You can speak of the
> >neighborhood centered around a particular label.  Neighborhoods centered
> >around different labels can partially overlap.  A bundle would be either
> >a set of labels that are all neighbors of each other, or a subset of the
> >neighborhood centered around the bundle's primary label, depending on
> >which version of property 2 we use.  Property 1 says that neighboring
> >labels in a zone must not belong to distinct bundles.
> >
> >I just noticed that I forgot to state an assumption, which we can call
> >property 0:  Every label in a zone belongs to exactly one bundle.
> >
> >AMC
> AMC:
> I understand -- but I cannot see how the "confusability" avoidance
> issue can be applied to the entire Unicode database.
> It appears to me (perhaps I'm wrong) that this group is trying to
> predict and solve all possible problems that may arise from IDN
> registrations because of look-alike possibilities within the Unicode
> database.
> I don't know the actual number of additional characters added thus
> far, but the upward limit is 65,535. So, as I see it, you will have
> some 65,000 different possibilities of character confusion at a
> single character domain level (i.e., a.com). Now, move to two
> characters (aa.com) and the figure becomes much larger -- something
> on the order of 65,000 x 65,000.
> Now, what's the upper limit to the number of characters allowed in a
> domain name and what's its factorial?  Do you honestly believe that
> you can solve this confusability problem for all possible
> combinations -- even if your interpretation is the correct one for
> each situation? Be reasonable, you're approaching a number that
> rivals the US national debt. Plus, no offense, you're making
> decisions about glyphs in other languages that are not your own.
> I think this group has made some significant progress in that some
> characters have already been mapped to others -- such as all
> occurrences of glyphs looking like "A", which have been mapped to "a",
> and so on. But you have done that primarily because you are familiar
> with the Latin character set and its use.
> Now, to map all occurrences of everything that looks similar to one
> character may do more harm than good in ways not apparent to you
> presently. Plus, considering the sheer number of combinations and
> thoughtful considerations required for each one -- I don't think this
> group has enough time or resources to accomplish the task.
> It might be best, for all concerned, to let the market and courts work it
> out.
> tedd
> --
> http://sperling.com/