[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: model with overlapping variants



Martin Duerst <duerst@xxxxxx> wrote:

> The main thing I'm worried about is what happens if X or Y (or both)
> are strings rather than characters.  There may be some weird cases
> where the 'longest prefix' match somehow destroys the symmetry.

Could be.  If so, we could force the symmetry by defining the relation
this way:  X and Y are neighbors iff table(X) matches Y or table(Y)
matches X, where table() and "matches" are yet-to-be-defined operations
that might be asymmetric.

> > For the moment I'll call the relation "confusability".
>
> What about "similarity"?

I think I like "neighbors" better.  "Similarity" sounds like a
real-valued function (how similar are X and Y?), while "neighbors"
sounds more boolean (are X and Y are neighbors?).

> > 2a looks difficult to enforce, and might lead to bundles that are
> > too large, so let's put that one aside for now.
>
> If we assume that registrants try to grab as much as possible, then
> the bundles could become very large.  In fact, if everybody would try
> to grab as many entries as possible into a bundle, we would end up
> with an equivalence relation.

If the registrants performed their grabbing sequentially (each bundle is
maximally expanded before the next bundle is created), then we'd end up
with an equivalence relation.  If the bundles expand concurrently, then
the equivalence classes would not form, because labels destined to be in
the same class could start off in separate bundles, and bundles never
merge.

> Using ~ to express confusability, 2c gives us situations where we can
> have X~Y and Y~Z, which we could call a confusability chain of length
> 3.  The question is whether there are longer confusability chains in
> practice.

Character-chains of length 2 can give rise to string-chains of greater
length.  For example, suppose we have characters related as follows:

    A~B B~C
    X~Y Y~Z

Now suppose a zone contains AX, BX, CY, CZ.  The shortest path (using
labels in the zone) from AX to CZ goes through all four labels.  There
is a shorter path involving the label BY, but BY might not exist in the
zone.

> One way to visualize this is that there would be a one layer thick
> 'neutral zone' between any two bundles that might be transitively
> confusable.

Right, any labels that neighbor multiple bundles are unavailable to
everyone.  (A label neighbors a bundle if it neighbors any label in the
bundle.)

We need to remember that we're speaking in the context of a model where
bundles contain only labels that are in the zone, not any "blocked"
labels.  Maybe I shouldn't have used the term "bundle" for this purpose,
since Paul introduced it to include blocked labels.

> > When a registrant asks to remove a label from their existing bundle,
> > the request is denied if it is the only(2b)/primary(2c) label in the
> > bundle.
>
> The 'only(2b)' case seems irrelevant because of what you say in the
> next point.
>
> > A request by a registrant to remove one of their entire bundles is
> > never denied.

It is relevant because it means that bundles cannot be empty.  A
registrant might imagine that they could remove the last label from
a bundle, and still have an empty bundle until it expires normally,
and before then they could add a label back into the bundle.  But I
want to assert that this is not allowed, because it could be used to
disguise the registration of a completely new & different name as a mere
expansion of an existing bundle (the fees for those two operations are
likely to be different).

> I think it's easy to make matching a regular expression against an
> ordered zone file much more efficient than matching that regular
> expression against each of the entries in the zone in turn.

Thanks for the illustration.  I didn't realize that so much could be
gained with such a simple data structure (a sorted list).  I think
there are still pathological cases where the time spent backtracking is
exponential, but maybe a more sophisticated data structure could help
with that (if needed).

AMC