
Re: Comparison of hoffman-idn-reg and jseng-idn-admin



Stephane,

This is one of the reasons why I don't think a generic algorithm with a
single table format can be designed to handle *all* languages[1].

I believe in a more stripped-down generic framework for handling IDNs and
their variants (e.g. registration, deletion, transfer, etc). But the exact
algorithm to generate the variants should be defined per language.

[1] language as defined by RFC 3066.
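
To make the split concrete, here is a minimal sketch in Python, under
stated assumptions. The names (GENERATORS, variant_generator, Registry)
and the tiny French table are hypothetical and appear in neither draft.
The generic part only knows how to register and delete a bundle; the
variant algorithm is looked up by language tag.

```python
from itertools import product
from typing import Callable, Dict, Set

# Hypothetical plug-in table: one variant generator per RFC 3066 tag.
GENERATORS: Dict[str, Callable[[str], Set[str]]] = {}

def variant_generator(lang: str):
    """Decorator that registers a per-language variant algorithm."""
    def decorator(fn):
        GENERATORS[lang] = fn
        return fn
    return decorator

# Toy equivalence classes, for illustration only (not a real French table).
FR_CLASSES = ["eéèêë", "aàâ"]
FR_TABLE = {ch: cls for cls in FR_CLASSES for ch in cls}

@variant_generator("fr")
def french_variants(label: str) -> Set[str]:
    # Each character expands to its whole class; others stay as they are.
    choices = [FR_TABLE.get(ch, ch) for ch in label]
    return {"".join(p) for p in product(*choices)}

class Registry:
    """Generic framework: knows nothing about any particular language."""

    def __init__(self) -> None:
        self.owner_of: Dict[str, str] = {}       # label -> owner
        self.bundle_of: Dict[str, Set[str]] = {}  # primary label -> bundle

    def register(self, label: str, lang: str, owner: str) -> int:
        bundle = GENERATORS[lang](label) | {label}
        if any(v in self.owner_of for v in bundle):
            raise ValueError("label or one of its variants is already taken")
        for v in bundle:
            self.owner_of[v] = owner
        self.bundle_of[label] = bundle
        return len(bundle)

    def delete(self, label: str) -> None:
        # Deleting the primary label releases the whole bundle.
        for v in self.bundle_of.pop(label):
            del self.owner_of[v]
```

Transfer would follow the same pattern as delete: it acts on the whole
bundle and never needs to know how the bundle was generated.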

-James Seng

----- Original Message -----
From: "Stephane Bortzmeyer" <bortzmeyer@xxxxxx>
To: "Martin Duerst" <duerst@xxxxxx>
Cc: "IDN registration policy list" <idn-reg-policy@xxxxxxx>
Sent: Thursday, April 03, 2003 4:20 PM
Subject: Re: Comparison of hoffman-idn-reg and jseng-idn-admin


>
> On Wed, Apr 02, 2003 at 03:07:03PM -0500,
>  Martin Duerst <duerst@xxxxxx> wrote
>  a message of 28 lines which said:
>
> > The danger of bundles being too big can easily happen for European
> > languages, with a bundle that defines that all accented versions of
> > a character are treated as the same as the base character.
>
> Yes, see my previous message, in the thread "New Internet Draft on
> registering IDNs". A typical example is the label
> "3suisses-assurances" (which actually exists in '.fr'), which has a
> bundle of 306,250 labels with a table that uses (almost) all the
> Latin-1 characters.
>
> Not all Latin-1 characters are used in French, so we could downsize the
> table and therefore the bundles. But, on the other hand, for a
> registry like '.eu', we will need an even larger table since Europe
> requires more than just Latin-1.
>
> > In that case, Paul's approach (also described by Adam) of using
> > equivalence classes won't scale.
>
> It doesn't scale if you want to actually generate the bundles and
> publish them in a static zone file. I tried this for the '.fr' zone,
> which is quite small (150,000 domains), and the resulting zone file was
> larger than '.com' even before the domains starting with the letter A
> were fully processed. But you have other approaches:
>
> * a dynamic DNS server like PowerDNS <http://www.powerdns.com/> with a
> back-end that will match a label to its bundle at query-time,
>
> * Option 2 or 3 of Paul's draft, which do not require actually
> storing the complete bundle.
>
> > What may work is that an accented character blocks the base
> > character, but not characters with a different accent.
>
> Interesting. We could also draw inspiration from most Web search
> engines. They work that way: If there is no composed character in the
> query, they search "accent-insensitive". If there is at least one,
> they switch to "accent-sensitive".
>
>
>
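
A rough Python sketch of two points from the quoted message, under
stated assumptions: the size of a bundle is the product of the sizes of
the equivalence classes of its characters, and the search-engine rule
can be applied at query time without storing any bundle. The function
names and the toy table are hypothetical, and the accent test simply
strips combining marks after NFD decomposition, which is cruder than a
real registry table.

```python
import unicodedata
from math import prod
from typing import Dict, List, Set

def fold(label: str) -> str:
    """Strip accents: decompose (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def bundle_size(label: str, classes: Dict[str, str]) -> int:
    """Number of labels in the bundle, computed without generating them.

    `classes` maps a base character to the string of all characters
    treated as equivalent to it (the base character included).
    """
    return prod(len(classes.get(ch, ch)) for ch in fold(label))

def lookup(query: str, registered: Set[str]) -> List[str]:
    """Search-engine rule: accent-insensitive unless the query has accents."""
    query = unicodedata.normalize("NFC", query)
    if fold(query) != query:
        # At least one accented character: match exactly.
        return [query] if query in registered else []
    # No accent in the query: match any registered label that folds to it.
    return sorted(name for name in registered if fold(name) == query)
```

For example, with the toy table {"e": "eéèêë"}, bundle_size("été", ...)
is 5 * 1 * 5, which shows how quickly the product grows with the length
of the label and the size of the classes.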