
Re: My prod at IDN requirements



--On 2000-01-04 11.04 +0800, James Seng <jseng@xxxxxxxxxxxx> wrote:

this will be a problem if ISO10646 is used. because of the CJK unification
(arggh who is the idiot?)

The BIG question, from my point of view, is whether CJK unification should be used here or not (because it is bad). If it is not, can this group create something better? See, though, columns 1 and 14 of ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt and ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt before answering... :-)


Traditionally the IETF is better at (a) grandfathering something created elsewhere and (b) forcing that "other" body to do a good job than at doing something from scratch.

I am not saying that you should adopt everything already done, but think twice about whether it is not (still) better to use something that is already done...

The same goes for equality and casing. Being a Western European person, having only worked with some Inuit charsets needed in Canada, and having helped some people in the Far Eastern library in Stockholm with Chinese character sets on the Mac, I of course do not know enough about these things -- BUT, I see that the Unicode Consortium has defined decomposition rules which make it possible to write code that compares characters for equality (not sorting). (See columns 1 and 6 of ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt)

Are those rules too bad to use for equality definitions? Can we do anything better?
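
To make the column references concrete, here is a minimal sketch of pulling the code value (column 1), the decomposition (column 6) and the lowercase mapping (column 14) out of one UnicodeData.txt record. The sample record for U+00C5 is written out as I believe it appears in the file, and the function name parse_record is purely illustrative, not part of any existing tool.

```python
# Sample record in the semicolon-separated layout of UnicodeData.txt
# (assumed to match the published entry for U+00C5).
SAMPLE = (
    "00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;"
    "LATIN CAPITAL LETTER A RING;;;00E5;"
)


def parse_record(line):
    """Return (code point, canonical decomposition, lowercase mapping)."""
    fields = line.rstrip("\n").split(";")
    code = int(fields[0], 16)  # column 1: code value

    # Column 6: decomposition. Entries starting with "<tag>" are
    # compatibility decompositions and are skipped in this sketch.
    decomp = fields[5]
    if decomp and not decomp.startswith("<"):
        decomposition = [int(cp, 16) for cp in decomp.split()]
    else:
        decomposition = []

    # Column 14: lowercase mapping (empty when there is none).
    lower = int(fields[13], 16) if fields[13] else None
    return code, decomposition, lower


code, decomposition, lower = parse_record(SAMPLE)
print(hex(code), [hex(cp) for cp in decomposition], hex(lower))
```

For this record the decomposition is U+0041 followed by U+030A, and the lowercase mapping is U+00E5, which is the data the answers in the table below rely on.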

If those rules are used, I read the table from Harald the following way:

For matching records, choose one:

   it matters whether matching is consistent across all servers

Yes

   it doesn't matter whether matching is consistent across all servers

No

   i18c Cyrillic A must compare equal to Latin A
   i18c Cyrillic A must compare not equal to Latin A

U+0410 eq U+0041?  No

   i18c A with Ring Above must compare equal to a with ring above
   i18c A with Ring Above must compare not equal to a with ring above

U+00C5 eq U+00E5?  Yes

   i18c ASCII A must compare equal to a
   i18c ASCII A must compare not equal to a

0x41 eq U+0041?  Yes

   i18c A + COMBINING RING ABOVE must compare equal to A with Ring Above
   i18c A + COMBINING RING ABOVE must not compare equal to A with Ring
   Above

(U+0041 + U+030A) eq U+00C5?  Yes
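
As a rough illustration of how such answers could be checked mechanically, the sketch below compares strings after canonical composition and case folding, using Python's standard unicodedata module. This is only one possible reading of "equality": the helper names idn_key and idn_equal are made up for this example, and the module ships a much newer version of the Unicode data than the files referenced above, so results for other characters may differ.

```python
import unicodedata


def idn_key(label):
    """Build a comparison key: compose, fold case, then compose again.

    Assumption: NFC plus str.casefold() stands in for the decomposition
    and casing rules discussed here; it is not a specified algorithm.
    """
    composed = unicodedata.normalize("NFC", label)
    return unicodedata.normalize("NFC", composed.casefold())


def idn_equal(a, b):
    """Two labels are 'equal' when their comparison keys are identical."""
    return idn_key(a) == idn_key(b)


# Cyrillic A vs. Latin A: distinct characters, no mapping between them.
print(idn_equal("\u0410", "\u0041"))
# A with ring above vs. a with ring above: differ only in case.
print(idn_equal("\u00C5", "\u00E5"))
# A + COMBINING RING ABOVE vs. precomposed A with ring above.
print(idn_equal("\u0041\u030A", "\u00C5"))
```

Because the key is computed purely from the label, every server applying the same function to the same data gets the same answer, which is the property the consistency question asks for.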


I know we were supposed to come up with requirements, but I "just" wanted to show what it is possible to define given _one_ specific set of rules created somewhere else. Whether those rules are good or bad is a different question. I personally think the questions from Harald were extremely good, and once we have answers to them, we can see whether we need to create our own rules or can use something created elsewhere.


Regarding what Martin wrote, the question of whether the server or the user should define equality, I must say that the definition must be the same for both. A user initiates a query to a DNS server, and the server returns records according to the equality definitions in the server. Unlike a more "normal" white-pages query in a database, the user does not do any post-processing. I.e. getting false positives back from a server is not an option in DNS -- but one can use that method extremely effectively when doing white-pages services, where it is better to return some extra records than to risk missing one. In the case of DNS, we need to define equality in such a way that we get back the same result all the time. From all servers.

paf