[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Character Variant Deployment at VeriSign

James Seng <jseng@xxxxxxxxxxxx> wrote:

> > When you say "variants", do you mean only the names that the
> > registrant has control over, or also the names that the registry
> > blocks?
> I meant both.  When the variants is reserved for the registrant, the
> registrant will be the only one able to "register" or "deregister" it.
> In the JET document, we call this "activation" and "deactivation".

But is it possible for a label to be blocked by multiple bundles, so
that no one can register/activate it?  This is important if the variant
relation is not transitive, as some people (like Roozbeh) have argued
that it needs to be.

Suppose that labels X and Y are confusingly similar, and therefore
cannot belong to different bundles.  Suppose Y and Z are also
confusingly similar, and cannot belong to different bundles.  But X
and Z are distinguishable, and can belong to different bundles.  (The
example that Roozbeh gave is X = U+0649, Z = U+064A, Y = U+06CC.)

There are several ways these labels could be organized into bundles:

(0) X, Y, Z all available
(1) X in bundle1, Y blocked (but available to bundle1), Z available
(2) Z in bundle2, Y blocked (but available to bundle2), X available
(3) Y & X in bundle1, Z blocked (but available to bundle1)
(4) Y & Z in bundle2, X blocked (but available to bundle2)
(5) X & Y in bundle1, Z blocked (completely)
(6) Z & Y in bundle2, X blocked (completely)
(7) X in bundle1, Z in bundle2, Y blocked (completely)
(8) Y & X & Z all in the same bundle

Consider the following progression:

Initially we are in state 0.
Someone registers X; now we are in state 1.
Someone else registers Z; now we are in state 7.
The first person unregisters X; now we are in state 2.

Notice that Y is blocked in states 1, 7, and 2, but we can't ever say
that Y belongs to any particular bundle.  If we were to imagine that
the registration of X causes Y to become an inactive member of bundle1,
then we would have to imagine that the deletion of bundle1 leaves Y
available.  But it doesn't, because by the time bundle1 is deleted, Y
is already being blocked by bundle2.  In state 7, Y is blocked by both
bundles, so that neither of them can have it.

If the variant relation is able to be non-transitive, the zone needs to
be able to block labels that do not belong to any particular bundle.
(Or equivalently, you could imagine that inactive labels can belong to
multiple bundles, and only labels belonging to a single bundle can be
activated.  But registrants might find that terminology misleading.
"It's in my bundle, but I can't use it?")

The registry might also have other reasons to block labels that do not
belong to any bundle.  In this example, the registry might prohibit
state 8, because it considers that bundle to be too big, containing
multiple names (X,Z) that ought to be registered separately.  That means
states 3 and 4 are nonsense, and states 5 and 6 are possible.  In state
5 for example, Z is blocked by only one bundle, but it is not part of
that bundle, and cannot be added to it.  Z is blocked because it's too
similar to one member of the bundle (Y), but it's forbidden from being
added to the bundle because it's too far from another member of the
bundle (X).

> > You don't necessarily have to store the blocked names.  Checking
> > whether a new name is blocked might be equivalent to a pattern-match
> > against the set of already-registered names.
> Sure you can :-) But is it feasible if you have, say 1M names in your
> zone file? I think the checking would be pretty intensive.

You can either store all the blocked names in a giant simple data
structure on which you only need to perform exact matches, or you can
store only the names that actually appear in the zone file in a much
smaller but more clever data structure on which you need to perform
pattern matches.  It's not obvious which is more feasible, but my hunch
is that the latter approach will be more scalable.

Martin Duerst already pointed out that even a very little bit of
cleverness (keeping the list sorted) can yield big improvements in the
speed of pattern matches.  I suspect that using a trie will speed it up
even more, and there are probably even more sophisticated indexing data
structures that will make it even faster.

(One idea I just had is to use two tries, one containing all the labels
in the zone, the other containing the string-reversals of all the labels
in the zone.  When you want to match a pattern against the zone, start
two matches in parallel, one on the normal trie, and one using the
reversed pattern on the reversed trie.  Both will eventually return
the same answer, but either one might run much longer than the other,
depending on the zone data.  As soon as either one completes, you can
abort the other.)