
Re: New Internet Draft on registering IDNs



On Tue, Mar 25, 2003 at 09:31:09AM -0800,
 Paul Hoffman / IMC <phoffman@xxxxxxx> wrote 
 a message of 16 lines which said:

> Greetings. I have just submitted a new Internet Draft that gives 
> suggestions on how to register IDNs. 

There is a big problem with the bundle approach described in this
draft, a problem I discovered while trying to implement it. I designed
a table, as specified in the draft, for the French language (for
multi-lingual countries or for multinational domains like '.eu', the
problem will be worse). My table may be too large, since it allows all
Latin-1 (ISO-8859-1) characters, even those not used in French, but it
is still too small for the entire European Union.

With this table, I see a dramatic combinatorial explosion. Many domains
generate a bundle of several thousand names, sometimes more. If I
implement Option 1 of the draft ("Allocate all labels to the same
registrant, making the zone information identical to that of the input
label."), my zone file will explode :-)

The problem, as I see it, is that the draft uses variant tables which
work on a per-character basis, without regard to whether the resulting
combinations mean anything. Most of the words in a bundle have no
meaning in French and there is no real reason to keep them.
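
To give an idea of the size of the explosion, here is a minimal sketch
in Python. It assumes that the input label is plain ASCII, that every
character is substituted independently, and that the bundle includes
the input label itself; the draft's exact rules may differ. The name
bundle_size and the dictionary are mine, for illustration only. The
counts come from the table at the end of this message.

```python
from math import prod

# Number of variants listed per base character in the table below
# (a: 6, e: 4, i: 4, o: 6, u: 4). Every other character has none.
VARIANT_COUNTS = {"a": 6, "e": 4, "i": 4, "o": 6, "u": 4}


def bundle_size(label: str) -> int:
    """Count the labels obtained by substituting each character
    independently (the character itself plus each of its variants)."""
    return prod(1 + VARIANT_COUNTS.get(ch, 0) for ch in label)


for name in ("afnic", "education", "communication"):
    print(name, bundle_size(name))
# afnic 35
# education 6125
# communication 42875
```

The size is a product over the characters of the label, so every
additional vowel multiplies the bundle by 5 or 7.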

I understand that there is no easy option (using a dictionary will not
work since many domain names are not in any dictionary).

Has anyone tried a bundle approach on their own zone?
 
Here is the table, for those interested. It is simply
"accent-insensitive". I regard any composed character as a variant of
the plain character.

# Variant table for the French language
# See Internet-Draft draft-hoffman-idn-reg-00
#
# Designed at AFNIC
# Stephane Bortzmeyer <bortzmeyer@xxxxxx>
# $Id$

# a-z
# a
U+0061|U+00E0:U+00E1:U+00E2:U+00E3:U+00E4:U+00E5
U+0062
U+0063
U+0064
# e
U+0065|U+00E8:U+00E9:U+00EA:U+00EB
U+0066
U+0067
U+0068
# i
U+0069|U+00EC:U+00ED:U+00EE:U+00EF
U+006A
U+006B
U+006C
U+006D
U+006E
# o
U+006F|U+00F2:U+00F3:U+00F4:U+00F5:U+00F6:U+00F8
U+0070
U+0071
U+0072
U+0073
U+0074
# u
U+0075|U+00F9:U+00FA:U+00FB:U+00FC
U+0076
U+0077
U+0078
U+0079
U+007A
# 0-9
U+0030
U+0031
U+0032
U+0033
U+0034
U+0035
U+0036
U+0037
U+0038
U+0039
# - (hyphen)
U+002D
# Ligature oe
U+0153|U+006FU+0065
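
For those who want to experiment, here is a rough sketch in Python that
reads a table in the format above and enumerates the bundle of a label.
It is only my reading of the format ("base|variant:variant:...", with
"#" for comments, and several U+XXXX written back to back meaning a
sequence); the function names parse_table and bundle are hypothetical
and are not taken from the draft. The sample holds only a few lines of
the table.

```python
import re
from itertools import product

SAMPLE = """\
# a
U+0061|U+00E0:U+00E1:U+00E2:U+00E3:U+00E4:U+00E5
U+0062
# e
U+0065|U+00E8:U+00E9:U+00EA:U+00EB
# Ligature oe
U+0153|U+006FU+0065
"""


def decode(field):
    """Turn 'U+006FU+0065' into the string 'oe'."""
    return "".join(chr(int(h, 16))
                   for h in re.findall(r"U\+([0-9A-Fa-f]{4,6})", field))


def parse_table(text):
    """Map each base character to the list of its variant strings."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        base, _, variants = line.partition("|")
        table[decode(base)] = [decode(v) for v in variants.split(":") if v]
    return table


def bundle(label, table):
    """Enumerate every label obtained by per-character substitution.
    Characters missing from the table are kept unchanged."""
    choices = [[ch] + table.get(ch, []) for ch in label]
    return ["".join(p) for p in product(*choices)]


table = parse_table(SAMPLE)
names = bundle("bebe", table)
print(len(names))  # 25
```

Note that this enumerates the whole bundle in memory, which is exactly
what becomes impractical for long labels.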