From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Tue Apr 16 2002 - 07:01:47 CDT
In <3CBAEFB2.7178E19B@ehsco.com> "Eric A. Hall" <ehall@ehsco.com> writes:
>Charles Lindsey wrote:
>>
>> In <3CB708F2.533B199@ehsco.com> "Eric A. Hall" <ehall@ehsco.com> writes:
>> >Punycode encoding will probably be used if the email people decide
>> >that they want to add support for i18n mailbox names.
>>
>> Can you point me at that "Punycode encoding".
>http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-01.txt
Wow! What a complicated bag of tricks!
But its biggest problem seems to be the use of '-' as a delimiter, which
is bad news for us since we are interested in newsgroup names that have
been converted by changing each '.' into a '-'. I presume they chose '-'
because it is the only punctuation character allowed in a domain-name label.
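A minimal Python sketch of how a decoder splits its input, assuming the draft behaves like the later RFC 3492, where only the last '-' counts as the delimiter: everything before it is copied through literally and everything after it is the encoded insertion data. The helper name split_punycode is hypothetical.

```python
# Sketch only: assumes the RFC 3492 rule that the LAST '-' is the delimiter.
# Code points before it are "basic" (copied as-is); those after it encode
# the non-ASCII characters and where to insert them.

def split_punycode(label: str) -> tuple[str, str]:
    """Hypothetical helper: return (basic, extended) as a decoder sees them."""
    pos = label.rfind("-")
    if pos == -1:
        # No delimiter at all: the whole label is encoded data.
        return "", label
    return label[:pos], label[pos + 1:]


for name in ["uk-net-news-announce", "dk-test-utf8--1fbo20a"]:
    basic, extended = split_punycode(name)
    print(f"{name!r}: basic={basic!r}, extended={extended!r}")
```

So the converted newsgroup name splits into the literal 'uk-net-news' plus an "encoded" part 'announce', which is exactly the ambiguity described here.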
So if punycode were to be used for the local-part of an addr-spec we might
see
uk-net-news-announce@moderators.isc.org
Now the mailer has to decide whether 'uk-net-news-announce' stands for
itself, or is a punycoding of some dreadful Unicode string. Because of
those '-'s, it actually is the latter, so the mailer will pass it through
a punydecoder to give
   u+0080 u+0083 u+0082 u+0085 u+0081 u+0075 u+0085 u+006B u+0084 u+002D
   u+006E u+0085 u+0065 u+0074 u+002D u+006E u+0065 u+0077 u+0073
which I cannot even print out for you, because it contains too many
control characters :-( .
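The decoding above can be reproduced with Python's standard "punycode" codec, which implements RFC 3492. Treating that codec as equivalent to the draft's algorithm is an assumption, though it reproduces the code points listed above. A minimal sketch:

```python
import unicodedata

# Decode the local-part as if it were Punycode, using Python's
# built-in "punycode" codec (an RFC 3492 implementation).
local_part = "uk-net-news-announce"
decoded = local_part.encode("ascii").decode("punycode")

# Print each code point in the u+XXXX notation used above, flagging
# the ones Unicode classifies as control characters (category Cc).
for ch in decoded:
    tag = "  (control)" if unicodedata.category(ch) == "Cc" else ""
    print(f"u+{ord(ch):04X}{tag}")
```

The eight inserted code points (u+0080 to u+0085) all lie in the C1 control range, which is why the result cannot be printed sensibly.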
OTOH, if you give it 'dk-test-utf8--1fbo20a', it gives
   u+0064 u+006B u+002D u+0074 u+0065 u+0073 u+0074 u+002D u+0075 u+0074
   u+0066 u+0038 u+002D u+00E6 u+00F8 u+00E5
which is indeed the correct Unicode for 'dk-test-utf8-æøå'.
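The same codec can check the Danish example in both directions. Again this is a sketch that assumes Python's RFC 3492 codec matches the draft's algorithm:

```python
# Round-trip the Danish test name through Python's built-in
# "punycode" codec (an RFC 3492 implementation).
encoded = b"dk-test-utf8--1fbo20a"
decoded = encoded.decode("punycode")

# Show the code points in the u+XXXX notation used above, then the text.
print(" ".join(f"u+{ord(ch):04X}" for ch in decoded))
print(decoded)

# Encoding the Unicode form again reproduces the same ASCII label.
assert decoded.encode("punycode") == encoded
```

Here the basic part 'dk-test-utf8-' survives unchanged, and only the three Danish letters come from the encoded tail '1fbo20a'.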
-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131    Fax: +44 161 436 6133    Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5