
Re: My prod at IDN requirements



At 11:04 04.01.00 +0800, James Seng wrote:

I think we need to properly define 3 cases:

I18N of Domain Names as represented on the client.
I18N of Domain Names as represented in DNS packet.
I18N of Domain Names as represented as DNS record/zones.

They may be the same, or they may not be. We do not know.

I think "I18N of DNs as represented in DNS packets" is part of the implementation, not the requirements - if there are requirements that constrain the solution, we need to list them before we decide on the solution.


Doubly so for the zonefile formats.

comments on some comments:


>    i18c in a name field of a Response or in content of an RR must be
> uniquely identifiable as such YES/NO

This is sort of related to the matching problem, but I think yes.

This is where we decide between solutions that say "ship binary gunk and let the recipient decide whether it's charset<x>, IPaddress, key or Something Else" and "ship special label that means this is i18c".
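To make the "binary gunk" option concrete: if nothing in the label says which charset it is in, the recipient has to guess, and the same octets can decode to different names. A minimal Python illustration; the byte string and the two charsets are arbitrary examples, not taken from any proposal.

```python
# The same two octets, read under two different charset assumptions.
# Both the octets and the charsets are illustrative choices only.
raw_label = b"\xc3\xa5"

as_utf8 = raw_label.decode("utf-8")      # one character: U+00E5 (a with ring above)
as_latin1 = raw_label.decode("latin-1")  # two characters: U+00C3, U+00A5

print(as_utf8, len(as_utf8))
print(as_latin1, len(as_latin1))
```

Both decodes succeed without error, which is exactly why an untagged label cannot be interpreted reliably.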


> it must be possible to DNSSEC sign i18c records DNS server to client YES/NO

Yes. We should not change the existing DNS system.

If we sign them, it means no conversions in intermediate resolvers: any conversion would change the signed data and invalidate the signature.
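A small sketch of why signing rules out conversion along the way. It uses an HMAC as a stand-in for a signature; real DNSSEC uses public-key signatures over canonical RR data, and the key and label here are hypothetical. The only point shown is that a signature computed over one octet sequence does not verify over a re-encoded one.

```python
import hashlib
import hmac

# Stand-in for a DNSSEC signature: an HMAC over the octets of a name.
# The key is hypothetical; DNSSEC itself uses public-key signatures.
KEY = b"hypothetical-zone-key"

def sign(octets: bytes) -> bytes:
    return hmac.new(KEY, octets, hashlib.sha256).digest()

def verify(octets: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(octets), sig)

name = "bl\u00e5b\u00e6r"  # hypothetical i18c label
signed_octets = name.encode("utf-8")
sig = sign(signed_octets)

# An intermediate resolver that re-encodes the same name into another form:
converted_octets = name.encode("utf-16-be")

print(verify(signed_octets, sig))     # True
print(verify(converted_octets, sig))  # False
```

The name is "the same" to a human in both cases, but the signature only covers the exact octets that were signed.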



> More in the solution space:
>
>    iso 10646 characters will be enough forever for DNS purposes YES/NO

UCS-4 should cover all languages, including all variations in time to come.
However, it also has a lot of problems, including the fact that it changes
from time to time :P

UCS-4 is one representation; UTF-8 and UTF-16 are representations of the same character set. They have promised (ugh) that it is now only growing, not changing.
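To illustrate "one character set, several representations": a single ISO 10646 character encoded three ways in Python. Python has no codec named "UCS-4", so UTF-32 (the 4-octet form, limited to U+10FFFF) stands in for it here; big-endian variants are used so no byte order mark is added.

```python
# One ISO 10646 / Unicode character, three encoding forms of it.
# "utf-32-be" stands in for UCS-4, since Python has no codec by that name.
ch = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE

ucs4 = ch.encode("utf-32-be")   # 00 00 00 e5
utf16 = ch.encode("utf-16-be")  # 00 e5
utf8 = ch.encode("utf-8")       # c3 a5

for form, octets in (("UCS-4", ucs4), ("UTF-16", utf16), ("UTF-8", utf8)):
    print(form, octets.hex(" "))

# All three decode back to the same code point.
assert ucs4.decode("utf-32-be") == utf16.decode("utf-16-be") == utf8.decode("utf-8") == ch
```

The octets differ in length and content, but the abstract character is the same, which is the distinction between the character set and its representations.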



> a single representation for i18c must be chosen YES/NO

Maybe? I think different proposals will have different answers to this. I think we
should leave it open, and not limit ourselves to ISO 10646 or any other single encoding.


> For matching records, Choose One:
>
>    it matters whether matching is consistent across all servers
>    it doesn't matter whether matching is consistent across all servers

I think obviously we need to make sure matching is consistent across all
servers.

> i18c Cyrillic A must compare equal to Latin A
> i18c Cyrillic A must compare not equal to Latin A
> i18c A with Ring Above must compare equal to a with ring above
> i18c A with Ring Above must compare not equal to a with ring above
> i18c ASCII A must compare equal to a
> i18c ASCII A must compare not equal to a
> i18c A + COMBINING RING ABOVE must compare equal to A with Ring Above
> i18c A + COMBINING RING ABOVE must not compare equal to A with Ring Above
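Several of the pairs above can be checked directly against Unicode data. The sketch below assumes, purely for illustration, that "compare equal" means equal after NFC normalization and, where case is involved, after Python's `str.casefold()`; the thread does not propose either rule.

```python
import unicodedata

# Characters from the comparison list above.
CYRILLIC_A = "\u0410"    # CYRILLIC CAPITAL LETTER A
LATIN_A = "A"            # LATIN CAPITAL LETTER A
A_RING = "\u00c5"        # LATIN CAPITAL LETTER A WITH RING ABOVE
a_ring = "\u00e5"        # LATIN SMALL LETTER A WITH RING ABOVE
A_PLUS_RING = "A\u030a"  # A + COMBINING RING ABOVE

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

print(CYRILLIC_A == LATIN_A)                   # False: different code points
print(nfc(CYRILLIC_A) == nfc(LATIN_A))         # False: normalization does not unify scripts
print(A_PLUS_RING == A_RING)                   # False: raw code point sequences differ
print(nfc(A_PLUS_RING) == A_RING)              # True: canonically equivalent
print(A_RING.casefold() == a_ring.casefold())  # True, but only once case is folded
print(LATIN_A == "a", LATIN_A.casefold() == "a")  # False True
```

Each "must compare equal / must not compare equal" choice corresponds to picking (or not picking) one of these transformations, and every server would have to pick the same one for matching to be consistent.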


Case-folding is not a simple problem, even for European languages, as it may
vary with context. http://www.unicode.org/unicode/reports/tr21/ is a good
report on the case mapping problem, at least for European languages.
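Two well-known European examples of case mapping that is not a simple per-character table, shown with Python 3's `str.upper()` and `str.lower()`, which apply the Unicode full case mappings including the final-sigma context rule. The words are illustrative choices, not from the thread.

```python
# German sharp s: uppercasing expands one character into two,
# so the round trip does not return the original string.
word = "stra\u00dfe"          # "strasse" spelled with sharp s
upper = word.upper()
print(upper)                  # STRASSE
print(upper.lower() == word)  # False

# Greek capital sigma lowercases to a different letter depending on context:
# final sigma (U+03C2) at the end of a word, sigma (U+03C3) otherwise.
print("\u03a3".lower())                    # isolated capital sigma -> U+03C3
print("\u039f\u0394\u039f\u03a3".lower())  # word-final sigma -> ends in U+03C2
```

Any DNS matching rule based on case-folding would have to decide which of these behaviours it wants, and the answer depends on more than the single character being compared.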

> Others are MUCH better than me in compiling example cases and requirements
> for Korean, Japanese, Thai, Arabic, Hebrew.....

In addition, there are also languages which have other folding problems.
Chinese, for example, has simplified and traditional glyphs which mean the same
thing and are used in the same way, but are assigned different code points.

Japanese kanji also have traditional and simplified forms, but they are usually
treated differently, or at least that is what I have been told.

This will be a problem if ISO 10646 is used. Because of the CJK unification
(arggh, who is the idiot?), Japanese and Chinese fall under the same U+4E00 code space. If one folds and the other does not, I think it is fairly obvious how messy it is going to be.
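One concrete pair, in the spirit of naming a glyph per problem: the simplified and traditional forms of "country" (an example chosen for illustration). The sketch shows that they are distinct code points in the unified CJK block, that the code point carries no language information, and that Unicode normalization does not fold one into the other.

```python
import unicodedata

# Simplified and traditional forms of "country"; an illustrative pair.
simplified = "\u56fd"
traditional = "\u570b"

for ch in (simplified, traditional):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+56FD CJK UNIFIED IDEOGRAPH-56FD
# U+570B CJK UNIFIED IDEOGRAPH-570B

# Both fall in the U+4E00..U+9FFF block; nothing in the code point says
# whether the surrounding text is Chinese or Japanese.
print(all(0x4E00 <= ord(ch) <= 0x9FFF for ch in (simplified, traditional)))  # True

# Normalization does not fold traditional to simplified (or back), so such
# folding would need a separate, language-dependent table.
print(unicodedata.normalize("NFKC", traditional) == simplified)  # False
```

Since the same code points serve both languages, a folding table that suits Chinese usage would also be applied to Japanese names, which is the mess described above.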

Is this a fact or a "maybe a problem"?
I think we need to be as specific as possible here: for each folding problem, name a glyph that has the problem, if possible.


Korean Hangul, if I am not wrong, does not suffer from this problem :-) It is a
very clean and well-designed writing system.

Perhaps we should all write Korean, then :-)


Harald

--
Harald Tveit Alvestrand, EDB Maxware, Norway
Harald.Alvestrand@xxxxxxxxxxxxxx