[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: model with overlapping variants



Mark Davis <mark.davis@xxxxxxxxx> wrote:

> 1. Any equivalence relation on strings corresponds to a partition
> of all strings.  By far the most efficient matching of strings is
> where one member of each partition is chosen as the 'skeleton' or
> representative for that partition.  All you need is an efficient
> mapping function toSkeleton() that takes each member of each partition
> onto its skeleton.  The toSkeleton() function can be any idempotent
> string function.

That was my original proposal, but it has been suggested that
equivalence relations are not general enough to express the sort of
policies that registries want to express, and that an intransitive
relation is needed.

> 2. One can generalize the process in #1, if there are multiple
> equivalence relations and some kind of registration process, as long
> as the registrations are serial.  For example, one could support both
> German and French equivalence relations. It's a bit more work:
>
> You have to keep a set of skeletons for all registrations, for each
> equivalence relation, e.g.
> FrenchSkeletonSet
> GermanSkeletonSet
> SwedishSkeletonSet
> ...
> 
> What you do is when each name N is registered, you have to say which
> equivalence relation is to be used with it (e.g. German).  To make
> sure that N doesn't collide with a pre-existing registered name, look
> N up against each of the above sets using *that* set's isSkeleton
> function.  If it ever collides, don't register it.  If it doesn't
> collide in any of the sets, add its skeleton to the appropriate
> Skeletons set (e.g. GermanSkeletonSet).
>
> To lookup an incoming string, probe each of the skeleton sets
> according to the set's isSkeleton function until you find a match.
> Since these are equivalence relations, the order of lookup in the sets
> doesn't matter.

I'm not convinced, but maybe I'm not understanding.  Suppose toSkeleton1
creates this view of the namespace:

    +-------+-------+
    |       |       |
    |       |       |
    |       |       |
    |       |       |
    |       |       |
    |       |       |
    |       |       |
    +-------+-------+

That is, the entire namespace consists of two equivalence classes.  Now
suppose toSkeleton2 creates this view of the namespace:

    +---------------+
    |               |
    |               |
    |               |
    +---------------+
    |               |
    |               |
    |               |
    +---------------+

Now suppose the following two names are registered:

    +---------------+
    |               |
    |           1   |
    |               |
    |               |
    |               |
    |   2           |
    |               |
    +---------------+

The digits denote which version of toSkeleton the name was registered
under.  These two names do not collide with each other, no matter which
toSkeleton you ask.

Now let's try to do a lookup on this name:

    +---------------+
    |               |
    |               |
    |               |
    |               |
    |               |
    |           *   |
    |               |
    +---------------+

toSkeleton1 will say it matches name 1, and toSkeleton2 will say it
matches name 2.

AMC