From owner-ietf-imaa Sun Feb 9 11:17:18 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h19JHIq24693 for ietf-imaa-bks; Sun, 9 Feb 2003 11:17:18 -0800 (PST) Received: from [63.202.92.156] (adsl-63-202-92-156.dsl.snfc21.pacbell.net [63.202.92.156]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h19JHGd24687 for ; Sun, 9 Feb 2003 11:17:16 -0800 (PST) Mime-Version: 1.0 X-Sender: phoffman@mail.imc.org Message-Id: X-Habeas-SWE-1: winter into spring X-Habeas-SWE-2: brightly anticipated X-Habeas-SWE-3: like Habeas SWE (tm) X-Habeas-SWE-4: Copyright 2002 Habeas (tm) X-Habeas-SWE-5: Sender Warranted Email (SWE) (tm). The sender of this X-Habeas-SWE-6: email in exchange for a license for this Habeas X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this X-Habeas-SWE-9: mark in spam to . Date: Sun, 9 Feb 2003 11:10:29 -0800 To: ietf-imaa@imc.org From: Paul Hoffman / IMC Subject: Dealing with open issues in IMAA Content-Type: text/plain; charset="iso-8859-1" ; format="flowed" Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Greetings. The list is now open. Adam Costello and I came up with draft-hoffman-imaa (with some early help from Patrik Fältström), and we think it is a good way forward for internationalizing Internet email addresses. Those of you who have read the draft know that there are a bunch of open issues. In fact, there are a few where Adam and I strongly disagree with each other. For the first round of discussion, people who have strong opinions on any of the open issues should speak up, starting a new thread with an appropriate subject line (if one isn't happening already). After we get a sense of what people think, we'll revise the document, possibly closing off some issues. 
In the latter case, we'll start an appendix of "design choices" so that people can see how we got to where we end up. --Paul Hoffman, Director --Internet Mail Consortium From owner-ietf-imaa Sun Feb 9 11:17:19 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h19JHJH24696 for ietf-imaa-bks; Sun, 9 Feb 2003 11:17:19 -0800 (PST) Received: from [63.202.92.156] (adsl-63-202-92-156.dsl.snfc21.pacbell.net [63.202.92.156]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h19JHHd24691 for ; Sun, 9 Feb 2003 11:17:17 -0800 (PST) Mime-Version: 1.0 X-Sender: phoffman@mail.imc.org Message-Id: X-Habeas-SWE-1: winter into spring X-Habeas-SWE-2: brightly anticipated X-Habeas-SWE-3: like Habeas SWE (tm) X-Habeas-SWE-4: Copyright 2002 Habeas (tm) X-Habeas-SWE-5: Sender Warranted Email (SWE) (tm). The sender of this X-Habeas-SWE-6: email in exchange for a license for this Habeas X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this X-Habeas-SWE-9: mark in spam to . Date: Sun, 9 Feb 2003 11:16:50 -0800 To: ietf-imaa@imc.org From: Paul Hoffman / IMC Subject: Case sensitivity on the LHS Content-Type: text/plain; charset="us-ascii" ; format="flowed" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: This one should be lively. RFC 2821 and RFC 2822 make it clear that the left-hand side (LHS) of email addresses are opaque, which in turn means they are case-sensitive. The -00 draft of IMAA preserves this. Most email users don't know that the LHS is case-sensitive, and probably guess that it isn't because the RHS (the domain name) is not. Further, there are some mail systems and gateways that do case conversion on the LHS. If we simplify IMAA to make the LHS case-insensitive, it will probably match the expectations of users better. It would also mean that we could reuse Nameprep instead of using our own Stringprep profile. 
There are other reasons why this might be good listed in the IMAA document. But to do so would go against the spirit of the standards on which IMAA rests, namely 2821 and 2822 (and 821 and 822 before them). Purity or modernity? --Paul Hoffman, Director --Internet Mail Consortium From owner-ietf-imaa Sun Feb 9 11:28:25 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h19JSPO24870 for ietf-imaa-bks; Sun, 9 Feb 2003 11:28:25 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h19JSPd24866 for ; Sun, 9 Feb 2003 11:28:25 -0800 (PST) Received: (qmail 80940 invoked by uid 1016); 9 Feb 2003 19:28:51 -0000 Date: 9 Feb 2003 19:28:51 -0000 Message-ID: <20030209192851.80939.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: ietf-imaa@imc.org Subject: Background reading for non-ASCII mailbox names Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: My web page http://cr.yp.to/djbdns/idn.html discusses six problems caused by careless internationalization proposals: * interoperability failures; * inconsistent displays; * unnecessary implementation and deployment costs; * multiple semantically similar names; * identical displays of different names; and * typing failures. The discussion focuses on domain names for concreteness, but the same principles apply to mailbox names, login names, etc. ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago P.S. The mailing-list software silently discarded this message the first time I sent it. 
From owner-ietf-imaa Sun Feb 9 14:47:03 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h19Ml3Y29944 for ietf-imaa-bks; Sun, 9 Feb 2003 14:47:03 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h19Mkwd29938; Sun, 9 Feb 2003 14:47:01 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id RAA16869; Sun, 9 Feb 2003 17:47:00 -0500 Message-Id: <4.2.0.58.J.20030209173037.05a45ca0@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Sun, 09 Feb 2003 17:43:27 -0500 To: Paul Hoffman / IMC , ietf-imaa@imc.org From: Martin Duerst Subject: Re: Case sensitivity on the LHS In-Reply-To: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 11:16 03/02/09 -0800, Paul Hoffman / IMC wrote: >If we simplify IMAA to make the LHS case-insensitive, it will probably >match the expectations of users better. It would also mean that we could >reuse Nameprep instead of using our own Stringprep profile. There are >other reasons why this might be good listed in the IMAA document. But to >do so would go against the spirit of the standards on which IMAA rests, >namely 2821 and 2822 (and 821 and 822 before them). Would it just go against the spirit, or also create other problems? The following problems come to my mind: [mostly just thinking aloud] - Some systems currently treat ASCII as case-sensitive (don't have any idea how many). But these names would not be encoded, so the behavior would stay the same. (except if nameprep is applied before the check for ascii-only is done, which may well be the case). - What is the current user expectation? My guess is that case-insensitive is more widespread. 
In any case, one or the other expectation will be disappointed (if they ever happen to notice). Do we have any idea which systems are more numerous (the only sample I have at the moment is my own email address, which is case-insensitive). Overall, I think that the whole nameprep/stringprep stuff is already complicated enough, and so if there are not major problems, going with case-insensitive looks much better to me. Regards, Martin. From owner-ietf-imaa Sun Feb 9 21:57:06 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1A5v6f08711 for ietf-imaa-bks; Sun, 9 Feb 2003 21:57:06 -0800 (PST) Received: from mercury.ccil.org (mail@[192.190.237.100]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1A5v5d08707 for ; Sun, 9 Feb 2003 21:57:05 -0800 (PST) Received: from cowan by mercury.ccil.org with local (Exim 3.35 #1 (Debian)) id 18i6wC-0007YE-00 for ; Mon, 10 Feb 2003 00:57:04 -0500 Subject: John Cowan on IMAA draft To: ietf-imaa@imc.org Date: Mon, 10 Feb 2003 00:57:04 -0500 (EST) X-Mailer: ELM [version 2.4ME+ PL66 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-Id: From: John Cowan Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Since local names are under the sole control of a single domain (even if people nominate their own local names, it's the domain mail administrator who decides whether they work), I think that having an ACE prefix is not necessary. There is no requirement that every possible name be available. I think the de facto situation that local names are case-insensitive should be accepted. Doing local names by parts (delimited by non-alphanumeric ASCII characters) is a good idea. However, I'm not wedded to it. We should go for 63-character limitation. 
Recognizing fullwidth @ is important, because it's context dependent whether people are using halfwidth or fullwidth characters, and they may not even be conscious of it in double-width environments. -- John Cowan http://www.ccil.org/~cowan cowan@ccil.org To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_ From owner-ietf-imaa Mon Feb 10 05:04:39 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AD4dr22966 for ietf-imaa-bks; Mon, 10 Feb 2003 05:04:39 -0800 (PST) Received: from crow.verisign.com (crow.verisign.com [216.168.237.103]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1AD4bd22957; Mon, 10 Feb 2003 05:04:37 -0800 (PST) Received: from vsvapostalgw3.prod.netsol.com (vsvapostalgw3.prod.netsol.com [10.170.12.61]) by crow.verisign.com (nsi_0.1/8.9.1) with ESMTP id IAA04734; Mon, 10 Feb 2003 08:04:32 -0500 (EST) Received: by vsvapostalgw3.prod.netsol.com with Internet Mail Service (5.5.2653.19) id <1SMTVFFZ>; Mon, 10 Feb 2003 08:02:33 -0500 Message-ID: <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> From: "Hollenbeck, Scott" To: "'Paul Hoffman / IMC'" , ietf-imaa@imc.org Subject: RE: Case sensitivity on the LHS Date: Mon, 10 Feb 2003 08:00:34 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: > If we simplify IMAA to 
make the LHS case-insensitive, it will > probably match the expectations of users better. It would also mean > that we could reuse Nameprep instead of using our own Stringprep > profile. There are other reasons why this might be good listed in the > IMAA document. But to do so would go against the spirit of the > standards on which IMAA rests, namely 2821 and 2822 (and 821 and 822 > before them). > > Purity or modernity? Modernity. I agree that case insensitivity will probably match the expectations of users better. Is this document intended to be a formal update to 2821 and 2822? Both (2821 section 4.1.2 and 2822 section 3.4.1) contain formal definitions of the local part of an email address. -Scott- From owner-ietf-imaa Mon Feb 10 06:08:16 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AE8Gv26517 for ietf-imaa-bks; Mon, 10 Feb 2003 06:08:16 -0800 (PST) Received: from server1.matic.com (server.iicinternet.com [66.159.16.71] (may be forged)) by above.proper.com (8.11.6/8.11.3) with SMTP id h1AE8Ed26510 for ; Mon, 10 Feb 2003 06:08:14 -0800 (PST) Received: (qmail 12207 invoked from network); 10 Feb 2003 14:07:57 -0000 Received: from adsl-65-42-242-53.dsl.lgnnmi.ameritech.net (HELO ?192.168.0.100?) (65.42.242.53) by server.iicinternet.com with SMTP; 10 Feb 2003 14:07:57 -0000 Mime-Version: 1.0 X-Sender: tedd@sperling.com (Unverified) Message-Id: Date: Mon, 10 Feb 2003 09:07:37 -0500 To: ietf-imaa@imc.org From: tedd Subject: Re: Case sensitivity on the LHS Content-Type: text/plain; charset="us-ascii" ; format="flowed" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Paul: >This one should be lively. Why not? Everything else is. >Most email users don't know that the LHS is case-sensitive, and >probably guess that it isn't because the RHS (the domain name) is >not. Further, there are some mail systems and gateways that do case >conversion on the LHS. 
> >If we simplify IMAA to make the LHS case-insensitive, it will >probably match the expectations of users better. Absolutely. >It would also mean that we could reuse Nameprep instead of using our >own Stringprep profile. There are other reasons why this might be >good listed in the IMAA document. But to do so would go against the >spirit of the standards on which IMAA rests, namely 2821 and 2822 >(and 821 and 822 before them). > >Purity or modernity? > >--Paul Hoffman, Director >--Internet Mail Consortium I am open to arguments otherwise, but at present, my vote would be to make the LHS case-insensitive. In fact, I don't understand the reasoning behind considering case-sensitive in the first place. Would someone be so kind as to point out the benefit(s) of having an email address of Tedd@sperling.com being different than tedd@sperling.com? To me, it just doesn't make any sense -- or do I not understand the problem. Thank you. tedd -- http://sperling.com/ From owner-ietf-imaa Mon Feb 10 06:21:07 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AEL7r28484 for ietf-imaa-bks; Mon, 10 Feb 2003 06:21:07 -0800 (PST) Received: from fluff.x42.com (xp8rji20lb1dl3ntueen@fluff.x42.com [213.187.218.11]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1AEL5d28477 for ; Mon, 10 Feb 2003 06:21:05 -0800 (PST) Received: (qmail 28505 invoked by uid 569); 10 Feb 2003 14:21:05 -0000 Date: Mon, 10 Feb 2003 15:21:05 +0100 From: Magnus Bodin To: tedd Cc: ietf-imaa@imc.org Subject: Re: Case sensitivity on the LHS Message-ID: <20030210142105.GG12186@bodin.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.4i X-Face: "/J{klxZ0}6u#[u&\4L/KMmGO}7(W|&yk4c(NYO^IPyMT<3DMOn7\?+Bw?33T ,}nX(Pj6}j;X1LPn$%d<;in~z50w#P>3u6)|bgwm~ZB@Hl?Y|BTa*/vH!~}Iln6F>>3: s/'5[>fW7gYB$B.m=85bu$GTPN#NG##a_^mc9uBp9.gvh*i>fHyB: Reply-By: Thu Feb 13 15:18:45 2003 
Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Le Mon, Feb 10, 2003 at 09:07:37AM -0500, tedd écrivait: > > Would someone be so kind as to point out the benefit(s) of having > an email address of Tedd@sperling.com being different than > tedd@sperling.com? To me, it just doesn't make any sense -- or do I > not understand the problem. It might not make any sense in English with ASCII [A-Za-z]. In a different language with a different pair of upper/lower-case characters, it might be a bigger difference between a upper/lower/mixed-case word. /magnus -- http://x42.com From owner-ietf-imaa Mon Feb 10 06:33:58 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AEXwU29596 for ietf-imaa-bks; Mon, 10 Feb 2003 06:33:58 -0800 (PST) Received: from mercury.ccil.org (mail@[192.190.237.100]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1AEXvd29591 for ; Mon, 10 Feb 2003 06:33:57 -0800 (PST) Received: from cowan by mercury.ccil.org with local (Exim 3.35 #1 (Debian)) id 18iF0P-000303-00 for ; Mon, 10 Feb 2003 09:33:57 -0500 Subject: Re: Case sensitivity on the LHS To: ietf-imaa@imc.org Date: Mon, 10 Feb 2003 09:33:57 -0500 (EST) X-Mailer: ELM [version 2.4ME+ PL66 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-Id: From: John Cowan Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: tedd scripsit: > Would > someone be so kind as to point out the benefit(s) of having an email > address of Tedd@sperling.com being different than tedd@sperling.com? The classic minimal pair is Tedd@example.com vs. TedD@example.com. But I agree that it's a silly distinction. Mail admins don't *have* to let people choose the absolutely-precisely-preferred forms of their name as local-parts. -- LEAR: Dost thou call me fool, boy? 
John Cowan FOOL: All thy other titles http://www.ccil.org/~cowan thou hast given away: jcowan@reutershealth.com That thou wast born with. http://www.reutershealth.com From owner-ietf-imaa Mon Feb 10 06:36:50 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AEaof00539 for ietf-imaa-bks; Mon, 10 Feb 2003 06:36:50 -0800 (PST) Received: from relay-3m.club-internet.fr (relay-3m.club-internet.fr [194.158.104.42]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1AEamd00527; Mon, 10 Feb 2003 06:36:48 -0800 (PST) Received: from mine.club-internet.fr (f11v-10-143.d1.club-internet.fr [213.44.169.143]) by relay-3m.club-internet.fr (Postfix) with ESMTP id 7E9A2E33C; Mon, 10 Feb 2003 15:37:32 +0100 (CET) Message-Id: <5.2.0.9.0.20030210144156.023fbec0@mail.club-internet.fr> X-Sender: jefsey@mail.club-internet.fr X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Mon, 10 Feb 2003 14:45:44 +0100 To: Martin Duerst , Paul Hoffman / IMC , ietf-imaa@imc.org From: "J-F C. (Jefsey) Morfin" Subject: Re: Case sensitivity on the LHS In-Reply-To: <4.2.0.58.J.20030209173037.05a45ca0@localhost> References: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 23:43 09/02/03, Martin Duerst wrote: >Overall, I think that the whole nameprep/stringprep stuff is already >complicated enough, and so if there are not major problems, going with >case-insensitive looks much better to me. Agreed. We also have to consider all the possible devices (existing or to come) having to send mails with reduced keyboards and to support IDNs with reduced computing resources. 
From owner-ietf-imaa Mon Feb 10 06:59:39 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AExdq01122 for ietf-imaa-bks; Mon, 10 Feb 2003 06:59:39 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1AExad01114; Mon, 10 Feb 2003 06:59:36 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id JAA24238; Mon, 10 Feb 2003 09:59:35 -0500 Message-Id: <4.2.0.58.J.20030210094623.05b46498@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Mon, 10 Feb 2003 09:48:25 -0500 To: "J-F C. (Jefsey) Morfin" , Paul Hoffman / IMC , ietf-imaa@imc.org From: Martin Duerst Subject: Re: Case sensitivity on the LHS In-Reply-To: <5.2.0.9.0.20030210144156.023fbec0@mail.club-internet.fr> References: <4.2.0.58.J.20030209173037.05a45ca0@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 14:45 03/02/10 +0100, J-F C. (Jefsey) Morfin wrote: >we also have to consider all the possible devices (existing or to come) >having to send mails with reduced keyboards and to support IDNs with >reduced computing resources. Yes. Please note that nameprep/stringprep can be reduced (in some cases drastically) if you know that you will only get a subset of characters as input. But even then, being able to use the same nameprep/stringprep for both sides of the '@' is a clear win. Regards, Martin. 
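The point about using one preparation step for both sides of the '@' can be illustrated with a rough sketch. It assumes Python and its standard `stringprep` module (the RFC 3454 tables); the helper names `prep` and `prep_address` are hypothetical, and the sketch deliberately skips the prohibited-character and bidi checks that a real Nameprep implementation performs, so it is not a conforming implementation of either Nameprep or the IMAA draft.

```python
import stringprep
import unicodedata


def prep(text: str) -> str:
    """Nameprep-style mapping only (hypothetical helper).

    Drops characters in table B.1, applies the B.2 case-folding map,
    then normalizes with NFKC. Prohibition and bidi checks are omitted.
    """
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in text
        if not stringprep.in_table_b1(ch)
    )
    return unicodedata.normalize("NFKC", mapped)


def prep_address(address: str) -> str:
    """Apply the same preparation to both sides of the last '@'."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("address has no '@'")
    return prep(local) + "@" + prep(domain)


print(prep_address("Tedd@Sperling.COM"))  # tedd@sperling.com
```

With a case-sensitive local part, `prep_address` would instead need a second, separate mapping for the left-hand side, which is the extra cost being weighed in this thread.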
From owner-ietf-imaa Mon Feb 10 07:03:08 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AF38101235 for ietf-imaa-bks; Mon, 10 Feb 2003 07:03:08 -0800 (PST) Received: from mailgen2.internet.gouv.qc.ca (courrier4.internet.gouv.qc.ca [192.197.162.9] (may be forged)) by above.proper.com (8.11.6/8.11.3) with SMTP id h1AF37d01231 for ; Mon, 10 Feb 2003 07:03:07 -0800 (PST) Received: (qmail 3585 invoked from network); 10 Feb 2003 15:02:54 -0000 Received: from unknown (HELO p295.sct1.gouv.qc.ca) (142.213.85.104) by mailgen2.internet.gouv.qc.ca with SMTP; 10 Feb 2003 15:02:54 -0000 Message-Id: <5.0.2.1.2.20030210095250.00b03c68@entree.sct1.gouv.qc.ca> X-Sender: alabonte@entree.sct1.gouv.qc.ca X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Mon, 10 Feb 2003 10:02:57 -0500 To: Magnus Bodin , tedd From: =?iso-8859-1?Q?Alain_LaBont=E9?= Subject: Re: Case sensitivity on the LHS Cc: ietf-imaa@imc.org In-Reply-To: <20030210142105.GG12186@bodin.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: A 15:21 2003-02-10 +0100, Magnus Bodin a écrit : >Le Mon, Feb 10, 2003 at 09:07:37AM -0500, tedd écrivait: > > > > Would someone be so kind as to point out the benefit(s) of having > > an email address of Tedd@sperling.com being different than > > tedd@sperling.com? To me, it just doesn't make any sense -- or do I > > not understand the problem. > >[Magnus] It might not make any sense in English with ASCII [A-Za-z]. In a >different language with a different pair of upper/lower-case characters, >it might be a bigger difference between a upper/lower/mixed-case word. [Alain] You mean in German for non-proprer names? 
(in French and English there are also cases where sensitivity matters, but it is rather rare, and should not be a rule[*], but I could agree that we have to consider the German issue -- are there other languages like German?). As this is relevant, are there cases with proper names where upper and lower case will change anything? Alain LaBonté (no difference with ALAIN LABONTÉ, except that I would then write LA BONTÉ in two words, although I would not make a special case of this two-word question either; it is a side issue) Québec *: French: "J'aime le Français" means "I love the Frenchman" "J'aime le français" means "I love French" English: "This month is august" does not mean the same as "This month is August" But these are very exceptional in both languages... From owner-ietf-imaa Mon Feb 10 07:20:34 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AFKY301656 for ietf-imaa-bks; Mon, 10 Feb 2003 07:20:34 -0800 (PST) Received: from server1.matic.com (server.iicinternet.com [66.159.16.71] (may be forged)) by above.proper.com (8.11.6/8.11.3) with SMTP id h1AFKVd01651 for ; Mon, 10 Feb 2003 07:20:31 -0800 (PST) Received: (qmail 16440 invoked from network); 10 Feb 2003 15:20:14 -0000 Received: from adsl-65-42-242-53.dsl.lgnnmi.ameritech.net (HELO ?192.168.0.100?) 
(65.42.242.53) by server.iicinternet.com with SMTP; 10 Feb 2003 15:20:14 -0000 Mime-Version: 1.0 X-Sender: tedd@sperling.com (Unverified) Message-Id: In-Reply-To: <20030210142105.GG12186@bodin.org> References: <20030210142105.GG12186@bodin.org> Date: Mon, 10 Feb 2003 10:19:51 -0500 To: ietf-imaa@imc.org From: tedd Subject: Re: Case sensitivity on the LHS Content-Type: text/plain; charset="iso-8859-1" ; format="flowed" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by above.proper.com id h1AFKXd01653 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: >Le Mon, Feb 10, 2003 at 09:07:37AM -0500, tedd écrivait: >> >> Would someone be so kind as to point out the benefit(s) of having >> an email address of Tedd@sperling.com being different than >> tedd@sperling.com? To me, it just doesn't make any sense -- or do I >> not understand the problem. > >It might not make any sense in English with ASCII [A-Za-z]. In a >different language with a different pair of upper/lower-case characters, >it might be a bigger difference between a upper/lower/mixed-case word. > >/magnus /magnus: Thank you -- very interesting. But the upper/lower-case character problem you describe will still exist on the RHS regardless -- and that is not going to change. Thus, as I see it, making the LHS case-sensitive would only compound the problem described. I believe that most users will not understand why the case would be sensitive for one side and not for the other -- most don't realize that now, as Paul pointed out. I think a considerable amount of user frustration, confusion and error would enter into the mix if the rules were different for each side of the "@" -- not to mention the problems that may arise from the implementation of two different sets of rules for servers, mail admins and such. Thus, I believe that whatever method is adopted for character consideration should be consistent throughout the address. 
tedd -- http://sperling.com/ From owner-ietf-imaa Mon Feb 10 07:31:40 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AFVeT01928 for ietf-imaa-bks; Mon, 10 Feb 2003 07:31:40 -0800 (PST) Received: from server1.matic.com (server.iicinternet.com [66.159.16.71] (may be forged)) by above.proper.com (8.11.6/8.11.3) with SMTP id h1AFVbd01922 for ; Mon, 10 Feb 2003 07:31:37 -0800 (PST) Received: (qmail 17131 invoked from network); 10 Feb 2003 15:31:19 -0000 Received: from adsl-65-42-242-53.dsl.lgnnmi.ameritech.net (HELO ?192.168.0.100?) (65.42.242.53) by server.iicinternet.com with SMTP; 10 Feb 2003 15:31:19 -0000 Mime-Version: 1.0 X-Sender: tedd@sperling.com (Unverified) Message-Id: In-Reply-To: References: Date: Mon, 10 Feb 2003 10:30:57 -0500 To: ietf-imaa@imc.org From: tedd Subject: Re: Case sensitivity on the LHS Content-Type: text/plain; charset="us-ascii" ; format="flowed" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: >tedd scripsit: > >> Would >> someone be so kind as to point out the benefit(s) of having an email >> address of Tedd@sperling.com being different than tedd@sperling.com? > >The classic minimal pair is Tedd@example.com vs. TedD@example.com. >But I agree that it's a silly distinction. Mail admins don't *have* >to let people choose the absolutely-precisely-preferred forms of their >name as local-parts. > >-- >LEAR: Dost thou call me fool, boy? John Cowan John: Agreed. And furthermore, I believe that: (lower-case)omega@(lower-case)omega.com is better than allowing -- (upper-case)omega@(lower-case)omega.com -- and trying to explain, implement, and having people understand why the upper-case omega character (code point) is not allowed on both sides of the "@". 
tedd -- http://sperling.com/ From owner-ietf-imaa Mon Feb 10 07:32:58 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AFWw601967 for ietf-imaa-bks; Mon, 10 Feb 2003 07:32:58 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1AFWvd01962 for ; Mon, 10 Feb 2003 07:32:57 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id KAA04483; Mon, 10 Feb 2003 10:32:23 -0500 Message-Id: <4.2.0.58.J.20030210094840.059796b8@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Mon, 10 Feb 2003 10:32:18 -0500 To: Magnus Bodin , tedd From: Martin Duerst Subject: Re: Case sensitivity on the LHS Cc: ietf-imaa@imc.org In-Reply-To: <20030210142105.GG12186@bodin.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 15:21 03/02/10 +0100, Magnus Bodin wrote: >It might not make any sense in English with ASCII [A-Za-z]. In a >different language with a different pair of upper/lower-case characters, >it might be a bigger difference between a upper/lower/mixed-case word. This is a point to consider. However, the differences between upper/lower/mixed-case words usually apply to the actual language (e.g. nouns vs. verbs,...), not to names. This is certainly the case in German. In various European languages, there are individual differences of how to spell names with prefixes (e.g. French 'de' or 'du', Dutch 'van', German 'von', ...). German is not special in this respect. There are not only casing variants, but also whether there is a space or not (e.g. 'du Bois' vs. 'Du Bois' vs. 'duBois' vs. 'DuBois' vs. 'Dubois', not all of them necessarily in use). We kind of know that we cannot deal with the space. 
So half of the distinctions in this area are already lost, and it becomes impossible to completely and faithfully reflect personal spelling differences to the last detail. In that case, it seems better to just go all the way to case-insensitivity. Regards, Martin. From owner-ietf-imaa Mon Feb 10 09:23:07 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AHN7P09555 for ietf-imaa-bks; Mon, 10 Feb 2003 09:23:07 -0800 (PST) Received: from yxa.extundo.com (178.230.13.217.in-addr.dgcsystems.net [217.13.230.178]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1AHN5d09546; Mon, 10 Feb 2003 09:23:05 -0800 (PST) Received: from latte.josefsson.org (yxa.extundo.com [217.13.230.178]) (authenticated bits=0) by yxa.extundo.com (8.12.7/8.12.7) with ESMTP id h1AHN4NG032108 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK); Mon, 10 Feb 2003 18:23:05 +0100 To: Paul Hoffman / IMC Cc: ietf-imaa@imc.org Subject: Re: Case sensitivity on the LHS X-Payment: hashcash 1.1 0:030210:phoffman@imc.org:a1921e04f1a69add X-Hashcash: 0:030210:phoffman@imc.org:a1921e04f1a69add X-Payment: hashcash 1.1 0:030210:ietf-imaa@imc.org:d446b39ab17e9259 X-Hashcash: 0:030210:ietf-imaa@imc.org:d446b39ab17e9259 From: Simon Josefsson Date: Mon, 10 Feb 2003 18:23:03 +0100 In-Reply-To: (Paul Hoffman / IMC's message of "Sun, 9 Feb 2003 11:16:50 -0800") Message-ID: User-Agent: Gnus/5.090015 (Oort Gnus v0.15) Emacs/21.3.50 (i686-pc-linux-gnu) References: X-Face: %bo>yc#X1.-jVa- List-Unsubscribe: List-ID: Paul Hoffman / IMC writes: > This one should be lively. > > RFC 2821 and RFC 2822 make it clear that the left-hand side (LHS) of > email addresses are opaque, which in turn means they are > case-sensitive. The -00 draft of IMAA preserves this. > > Most email users don't know that the LHS is case-sensitive, and > probably guess that it isn't because the RHS (the domain name) is > not. 
Further, there are some mail systems and gateways that do case > conversion on the LHS. > > If we simplify IMAA to make the LHS case-insensitive, it will probably > match the expectations of users better. It would also mean that we > could reuse Nameprep instead of using our own Stringprep > profile. There are other reasons why this might be good listed in the > IMAA document. But to do so would go against the spirit of the > standards on which IMAA rests, namely 2821 and 2822 (and 821 and 822 > before them). > > Purity or modernity? It depends on which definition of "case-insensitive" you use. If you use NFKC you will collapse many distinct names of humans into the same name, which is a failure as far as LHS is concerned. Cf. ß, which maps to ss. LHS are often human names which are free text, and I fear NFKC will damage a significant amount of non-western names. NFKC is appropriate for preparing strings for equality comparisons, but can be too aggressive in other situations. Changing the LHS definition in RFC 282{1,2} should IMHO be done based on technical reasons, and I don't see any technical reason presented above. Arguing that users don't read the technical specification isn't a good motivation for changing the specification; users will never read the technical specification. Applications are responsible for implementing a non-surprising behavior for clients (which I agree treating LHS as case-insensitive is), and with the current specifications they can, e.g. by searching case insensitively. Unless a technical case can be made for changing the specification, let's move on. 
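To make the distinction above concrete, here is a small sketch, assuming Python and its standard `unicodedata` module; the helper names `nfkc` and `nfkc_casefold` are hypothetical. It suggests that NFKC normalization on its own leaves ß and letter case alone, and that the ß-to-ss collapse comes from the case-folding step that Nameprep combines with NFKC. The second helper only approximates that combined mapping with Python's `str.casefold()`; it is not the Stringprep tables themselves.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Unicode compatibility normalization only; no case folding."""
    return unicodedata.normalize("NFKC", text)


def nfkc_casefold(text: str) -> str:
    """NFKC plus Unicode full case folding (an approximation of
    the case-insensitive mapping discussed in the thread)."""
    return unicodedata.normalize("NFKC", nfkc(text).casefold())


# NFKC alone leaves sharp s and letter case untouched.
print(nfkc("Straße"))           # Straße
# Case folding is the step that expands sharp s to "ss".
print(nfkc_casefold("Straße"))  # strasse
# NFKC does fold compatibility forms such as fullwidth letters and '@'.
print(nfkc("\uff34\uff45\uff44\uff44\uff20example.com"))  # Tedd@example.com
```

Under the combined mapping, "Strasse" and "Straße" become the same local part, which is the kind of collapse of distinct human names being objected to.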
From owner-ietf-imaa Mon Feb 10 11:44:17 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AJiHU15917 for ietf-imaa-bks; Mon, 10 Feb 2003 11:44:17 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1AJiEd15909 for ; Mon, 10 Feb 2003 11:44:15 -0800 (PST) Received: (qmail 1494 invoked by uid 66); 10 Feb 2003 19:44:15 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 10 Feb 2003 19:44:15 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-10-2035d); 10 Feb 2003 20:44:09 +0100 Date: 10 Feb 2003 20:43:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8f$3A$+JcDD@3247.org> In-Reply-To: Subject: Re: Case sensitivity on the LHS User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-10-2035d MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: John Cowan wrote: > The classic minimal pair is Tedd@example.com vs. TedD@example.com. > But I agree that it's a silly distinction. Mail admins don't *have* > to let people choose the absolutely-precisely-preferred forms of their > name as local-parts. This example shows, however, that *preserving* case can be important. “Tedd” and “TedD” have different meanings. This, of course, is even more important wrt languages that have a case mapping not identical to that of Unicode. The typical example is “I”↔“i” vs. “İ”↔“i” and “I”↔“ı”. Mapping the address “I.Surname@example.com” to “i.surname@example.com” might be *just* *wrong*. Claus (NB: The important non-ASCII characters above are the Capital Latin Letter I with Dot and the small Latin Letter Dotless i.)
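A small illustration of the point above, not part of the original message: the sketch below contrasts Python 3's built-in, locale-independent case folding with a Turkish-style mapping. The helper turkish_lower() is hypothetical and written only for this example; it is not taken from IMAA, IDNA, or any library.

```python
# Illustration only: locale-independent folding vs. a Turkish-style mapping.
# turkish_lower() is a hypothetical helper, not part of any specification.

def turkish_lower(s: str) -> str:
    """Lowercase using the Turkish convention for dotted and dotless i."""
    # Handle the two Turkish-specific pairs first, then lowercase the rest:
    #   I (U+0049) -> dotless i (U+0131), I with dot above (U+0130) -> i
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

address = "I.Surname@example.com"
local_part = address.split("@", 1)[0]

default_folded = local_part.casefold()      # Unicode default case folding
turkish_folded = turkish_lower(local_part)  # Turkish convention

print(default_folded)                       # i.surname
print(turkish_folded)                       # dotless i, then ".surname"
print(default_folded == turkish_folded)     # False
```

Under the default folding the two spellings compare equal to "i.surname"; under the Turkish convention the capital I belongs with the dotless i instead, which is the mismatch the message describes.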
-- ------------------------ http://www.faerber.muc.de/ ------------------------ OpenPGP: DSS 1024/639680F0 E7A8 AADB 6C8A 2450 67EA AF68 48A5 0E63 6396 80F0 From owner-ietf-imaa Mon Feb 10 11:44:16 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1AJiGC15916 for ietf-imaa-bks; Mon, 10 Feb 2003 11:44:16 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1AJiEd15908 for ; Mon, 10 Feb 2003 11:44:14 -0800 (PST) Received: (qmail 1493 invoked by uid 66); 10 Feb 2003 19:44:15 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 10 Feb 2003 19:44:14 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-10-2035d); 10 Feb 2003 20:44:09 +0100 Date: 10 Feb 2003 20:42:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8f$3A5e3cDD@3247.org> Subject: Compatibility with IDNA User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-10-2035d MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Hello, I'd like to explain some of the design decisions I have made for draft-faerber-i18n-email-netnews-names-00.txt. The basic idea is that IDNA, IMAA and similar internationalised identifiers (e.g. newsgroup names) should be able to use the very same encoding method. This has several advantages: . You can put a local-part, a domain, or a complete email address into the same encoding/decoding function and the results are correct. This makes implementations much easier. (Note: This need not be true for email addresses in non-canonical form, i.e. with anything not allowed in RFC 2821.) .
You can have a domain name embedded in a local-part and it is encoded the same way as a domain on the right hand side if it is delimited by one of the delimiters listed above (useful for the so-called percent hack, MIXER, etc.) . The reverse is also true: You can have an email address converted to a domain name (as seen in SOA DNS records, for example). The design decisions that have to be made to make this work as expected are these: . Do an NFKC normalisation at the very beginning (needed for delimiters, I missed that in my draft). . Don't encode the local-part as a whole, but use as many delimiters as possible to split it into pieces. (NB: We have to be very intelligent wrt quoted-strings here.) . Use the mixed-case annotation. (Yes, it has to be formalised then.) . Use the same ACE prefix as IDNA. It should be noted that this differs in some important aspects from IDNA: . IDNA does the normalisation later. . IDNA only recognises the dot (in four variants, two in NFKC) as a separator. . IDNA maps everything to lower-case. . IDNA uses ``UseSTD13ASCIIRules''. . IDNA has a strict length limit. But these differences don't have an impact on the output for all valid domain names (or, in the case of the mixed-case annotation, produce an equivalent result). Claus From owner-ietf-imaa Mon Feb 10 16:42:05 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1B0g5P24401 for ietf-imaa-bks; Mon, 10 Feb 2003 16:42:05 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1B0g4d24397 for ; Mon, 10 Feb 2003 16:42:04 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18iOUy-0004Nd-00 for ; Mon, 10 Feb 2003 16:42:08 -0800 Date: Mon, 10 Feb 2003 11:44:16 +0000 From: "Adam M.
Costello" To: ietf-imaa@imc.org Subject: Re: John Cowan on IMAA draft Message-ID: <20030210114416.GB9872@nicemice.net> Reply-To: IETF IMAA list References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: John Cowan wrote: > Since local names are under the sole control of a single domain > (even if people nominate their own local names, it's the domain mail > administrator who decides whether they work), I think that having an > ACE prefix is not necessary. An ACE prefix (or suffix, or infix, etc) is necessary so that applications know whether to convert ASCII local-parts to non-ASCII for display. How else will my mail program know how to display the From address of incoming mail? AMC From owner-ietf-imaa Mon Feb 10 16:41:57 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1B0fva24387 for ietf-imaa-bks; Mon, 10 Feb 2003 16:41:57 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1B0fsd24381 for ; Mon, 10 Feb 2003 16:41:56 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18iOUn-0004NR-00 for ; Mon, 10 Feb 2003 16:41:57 -0800 Date: Mon, 10 Feb 2003 11:40:16 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Subject: Re: Case sensitivity on the LHS Message-ID: <20030210114016.GA9872@nicemice.net> Reply-To: IETF IMAA list References: <4.2.0.58.J.20030209173037.05a45ca0@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4.2.0.58.J.20030209173037.05a45ca0@localhost> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Martin Duerst wrote: > - Some systems currently treat ASCII as case-sensitive (don't have > any idea how many). 
But these names would not be encoded, so the > behavior would stay the same. (except if nameprep is applied before > the check for ascii-only is done, which may well be the case). In the imaa-00 draft, Stringprep is applied only to non-ASCII strings, same as in IDNA. ToASCII and ToUnicode are designed that way so that IDNA/IMAA will have no impact on the handling of traditional ASCII labels/local-parts. One gotcha (the only one, I think) is non-ASCII representations of ASCII strings. For example, suppose that foo and FOO are local-parts referring to two distinct mailboxes. Today, you can type foo to send mail to one mailbox, and you can type FOO to send mail to the other mailbox, but if you type FOO using fullwidth characters, the mail will probably go nowhere. But if the sender's user agent is IMAA-aware, it will perform ToASCII on the fullwidth FOO, resulting in either ASCII FOO (if case-folding is not done) or ASCII foo (if case-folding is done). So having case-insensitive internationalized local-parts (which entails case-folding) will cause counter-intuitive results when a user tries to send mail to a case-sensitive ASCII local-part by typing a string containing both uppercase characters and non-ASCII characters. The problem occurs only when all three atypical circumstances coincide (case-sensitive ASCII local parts, users typing uppercase characters in email addresses, and users typing non-ASCII characters to represent an ASCII string), and case-sensitive local-parts are already counter-intuitive anyway, so this pitfall is probably not worth worrying about. > Do we have any idea which systems are more numerous (the only > sample I have at the moment is my own email address, which is > case-insensitive). I have never encountered a case-sensitive local-part. Has anyone here ever encountered a case-sensitive local-part? I don't mean to argue that local-parts are de facto case-insensitive, despite the standards. 
In fact it irks me whenever I see applications convert local-parts to all-caps or all-lowercase in defiance of the standards. The fact is that ASCII local-parts "may be case-sensitive", and so they must be treated as if they are case-sensitive, even if they almost always aren't. But for non-ASCII local-parts, "may be case-sensitive" is not an option, they either must be case-sensitive or must be case-insensitive, because it's the sender that decides, not the recipient. So we're doomed to depart from tradition(*), we just have to pick a direction. AMC (*) Unless we mandate mixed-case annotations, which I don't see happening. From owner-ietf-imaa Mon Feb 10 17:44:10 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1B1iAQ25996 for ietf-imaa-bks; Mon, 10 Feb 2003 17:44:10 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1B1i9d25992 for ; Mon, 10 Feb 2003 17:44:09 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18iPT3-0004ct-00 for ; Mon, 10 Feb 2003 17:44:13 -0800 Date: Tue, 11 Feb 2003 01:44:13 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Subject: Re: Case sensitivity on the LHS Message-ID: <20030211014413.GD16359@nicemice.net> Reply-To: IETF IMAA list References: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: "Hollenbeck, Scott" wrote: > Is this document intended to be a formal update to 2821 and 2822? Is IDNA a formal update to RFCs 1034 and 1035? IMAA would bear a similar relation to RFCs 821, 822, 2821, 2822. 
> Both (2821 section 4.1.2 and 2822 section 3.4.1) contain formal > definitions of the local part of an email address. IMAA does not redefine local part, but instead defines a new term, internationalized local part. Just as IDNA does not redefine domain label, but instead defines a new term, internationalized domain label. tedd wrote: > (lower-case)omega@(lower-case)omega.com > > is better than allowing -- > > (upper-case)omega@(lower-case)omega.com > > -- and trying to explain, implement, and having people understand why > the upper-case omega character (code point) is not allowed on both > sides of the "@". Actually, uppercase OMEGA *is* allowed on both sides of the @, in the sense that a user can type uppercase OMEGA on both sides, and IMAA and IDNA will both accept it. The question is whether mail sent to OMEGA@OMEGA.com will reach the same mailbox as mail sent to omega@omega.com. We know that omega vs. OMEGA makes no difference in the domain part; they are the same domain. The question is whether OMEGA and omega in the local part should refer to the same mailbox, or to two different mailboxes. For ASCII local parts, the answer is unknown, and can vary from one mail server to another. Applications preserve the case, and the destination mail server decides whether to perform a case-sensitive comparison or a case-insensitive comparison against its database of mailbox names. But for non-ASCII local parts, we can't defer the decision to the destination mail server, because the applications need to know whether to perform case-folding or not. So we need to pick one rule, case-sensitive or case-insensitive, for everyone. [Unless we mandate the use of mixed-case annotation. But that would mean mandating a significant amount of additional complexity in all IMAA applications, where the only benefit from the additional complexity is the ability of each mail server to decide whether its local parts are case-sensitive or case-insensitive. 
But in practice, virtually no one is interested in creating case-sensitive local parts. So mandating mixed-case annotations would have a significant cost and no significant practical benefit. The IDNA model appears to be a better tradeoff: mandate that non-ASCII local parts are always case-insensitive, and leave mixed-case annotations as an *optional* technique for preserving case. Applications that care about preserving case can choose to spend the extra implementation effort.] Claus Färber wrote: > This example shows, however, that *preserving* case can be important. > This, of course, is even more important wrt languages that have a > case mapping not identical to that of Unicode. The typical example is > [Turkish i] Is this a "typical" example, or the *only* example? Turkish i is the only locale-dependent aspect of the Unicode case-folding operation, according to the Unicode case-folding table. I suppose it's possible that the table overlooks some other locale-dependent things that ought to be in there, but can anyone name any examples besides Turkish i? Simon Josefsson wrote: > If you use NFKC you will collapse many distinct names of humans into > the same name, which is a failure as far as LHS is concerned. C.f. ß > maps to ss. NFKC does not map German sharp s to ss. Neither does NFC. It is the Unicode case-folding operation that maps German sharp s to ss. It has nothing to do with NFKC. You claim that "many distinct names" of humans will get collapsed by NFKC into the same name. So far you have provided zero examples. Could you supply a few more please? > NFKC is appropriate for preparing strings for equality comparisons, Which is exactly why we use it. IDNA and IMAA are designed to allow the equality comparisons to be performed by legacy software that doesn't know about Stringprep. That means the preparation has to be done by the IDNA/IMAA-aware applications before the strings are inserted into old protocols and transferred to old servers.
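The distinction drawn above between NFKC normalization and case folding can be checked with a short sketch (an illustration, not part of the original message). It assumes Python 3's unicodedata module, which follows a newer Unicode version than the tables Stringprep is based on; the two characters shown are expected to behave the same way in both.

```python
import unicodedata

sharp_s = "\u00df"                     # LATIN SMALL LETTER SHARP S
fullwidth_foo = "\uff26\uff2f\uff2f"   # "FOO" typed with fullwidth letters

# NFKC leaves sharp s alone; only case folding expands it to "ss".
nfkc_sharp_s = unicodedata.normalize("NFKC", sharp_s)
folded_sharp_s = sharp_s.casefold()

# NFKC does map compatibility forms such as fullwidth letters to ASCII,
# but it keeps their case; folding is a separate step.
nfkc_foo = unicodedata.normalize("NFKC", fullwidth_foo)
folded_foo = nfkc_foo.casefold()

print(nfkc_sharp_s == sharp_s)   # True
print(folded_sharp_s)            # ss
print(nfkc_foo)                  # FOO
print(folded_foo)                # foo
```

So whether sharp s collapses into "ss" depends on whether case folding is applied, not on the choice of normalization form.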
> Changing the LHS definition in RFC 282{1,2} should IMHO be done based > on technical reasons, and I don't see any technical reason presented > above. Arguing that users doesn't read the technical specification > isn't a good motivation for changing the specification; users will > never read the technical specification. Applications are responsible > for implementing a non-surprising behavior for clients (which I > agree treating LHS as case-insensitive is), and with the current > specifications they can, e.g. by searching case insensitively. Here is the technical argument: The day that IMAA is adopted, I should be able to create an ACE username on yahoo.com, and people should be able to send mail to that account by typing the corresponding non-ASCII username into their IMAA-aware mail user agent. The mail should reach me even if yahoo.com is completely IMAA-unaware. Now suppose the sender types that non-ASCII username slightly differently from the way I typed it when I created it. Will that mail reach me? If we do case-folding in ToASCII, then yes it will. If we don't do case-folding in ToASCII, then the mail will either bounce, or worse yet, it will go to some other user. While this pitfall (mail going to the wrong user because the sender typed the wrong case) has always been theoretically possible with ASCII local parts, it never happens in practice, because in practice mail servers recognize local parts case-insensitively. But if we omit case-folding from the IMAA ToASCII, then this pitfall will become very real, because the existing mail servers won't know how to do case-insensitive comparisons of ACE local parts. If we include case-folding in IMAA ToASCII, then all non-ASCII local parts are automatically case-insensitive, even on legacy mail servers. 
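To make the argument concrete, here is a rough sketch (not part of the original message, and not the IMAA ToASCII operation itself). It assumes Python 3's built-in "punycode" codec and str.casefold() as stand-ins for the real Stringprep profile, and the prefix "xx--" is a placeholder invented for this example. A legacy server that compares the resulting ASCII strings byte for byte will treat two spellings as the same mailbox only if folding happened before encoding.

```python
import unicodedata

ACE_PREFIX = "xx--"  # placeholder prefix, assumed for illustration only

def to_ace(local_part: str, fold_case: bool) -> str:
    """Very rough stand-in for a ToASCII-like step on a local part."""
    if local_part.isascii():
        # Traditional ASCII local parts are passed through untouched.
        return local_part
    text = local_part.casefold() if fold_case else local_part
    text = unicodedata.normalize("NFKC", text)
    return ACE_PREFIX + text.encode("punycode").decode("ascii")

registered = "\u03c9mega"   # small omega + "mega", as typed at sign-up
typed = "\u03a9MEGA"        # capital omega + "MEGA", as typed by a sender

# With folding, both spellings yield the same ASCII string.
print(to_ace(registered, True) == to_ace(typed, True))    # True
# Without folding, a byte-wise comparison sees two different mailboxes.
print(to_ace(registered, False) == to_ace(typed, False))  # False
```

The ASCII pass-through mirrors the rule that Stringprep is applied only to non-ASCII strings, so existing ASCII local parts are unaffected either way.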
AMC From owner-ietf-imaa Mon Feb 10 18:07:30 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1B27UD26439 for ietf-imaa-bks; Mon, 10 Feb 2003 18:07:30 -0800 (PST) Received: from mercury.ccil.org (mail@[192.190.237.100]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1B27Td26435 for ; Mon, 10 Feb 2003 18:07:29 -0800 (PST) Received: from cowan by mercury.ccil.org with local (Exim 3.35 #1 (Debian)) id 18iPpb-0006XA-00 for ; Mon, 10 Feb 2003 21:07:31 -0500 Subject: Re: John Cowan on IMAA draft In-Reply-To: <20030210114416.GB9872@nicemice.net> from "Adam M. Costello" at "Feb 10, 2003 11:44:16 am" To: IETF IMAA list Date: Mon, 10 Feb 2003 21:07:31 -0500 (EST) X-Mailer: ELM [version 2.4ME+ PL66 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-Id: From: John Cowan Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Adam M. Costello scripsit: > An ACE prefix (or suffix, or infix, etc) is necessary so that > applications know whether to convert ASCII local-parts to non-ASCII for > display. How else will my mail program know how to display the From > address of incoming mail? Hmm. You're right. -- John Cowan http://www.ccil.org/~cowan cowan@ccil.org To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. 
--_The Hobbit_ From owner-ietf-imaa Mon Feb 10 18:18:08 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1B2I8j26859 for ietf-imaa-bks; Mon, 10 Feb 2003 18:18:08 -0800 (PST) Received: from mercury.ccil.org (mail@[192.190.237.100]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1B2I7d26854 for ; Mon, 10 Feb 2003 18:18:07 -0800 (PST) Received: from cowan by mercury.ccil.org with local (Exim 3.35 #1 (Debian)) id 18iPzv-0006nN-00 for ; Mon, 10 Feb 2003 21:18:11 -0500 Subject: Re: Case sensitivity on the LHS In-Reply-To: <20030211014413.GD16359@nicemice.net> from "Adam M. Costello" at "Feb 11, 2003 01:44:13 am" To: IETF IMAA list Date: Mon, 10 Feb 2003 21:18:11 -0500 (EST) X-Mailer: ELM [version 2.4ME+ PL66 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-Id: From: John Cowan Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Adam M. Costello scripsit: > Is this a "typical" example, or the *only* example? Turkish i is the > only locale-dependent aspect of the Unicode case-folding operation, > according to the Unicode case-folding table. I suppose it's possible > that the table overlooks some other locale-dependent things that ought > to be in there, but can anyone name any examples besides Turkish i? Lithuanian "I" with an accent above lowercases to "I" + DOT ABOVE + the main accent, because (unlike all other "i"s with accents) the i keeps its dot. For Unicode case-folding purposes, this discrepancy is ignored. -- John Cowan http://www.ccil.org/~cowan cowan@ccil.org To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. 
--_The Hobbit_ From owner-ietf-imaa Mon Feb 10 18:33:58 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1B2XwF27243 for ietf-imaa-bks; Mon, 10 Feb 2003 18:33:58 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1B2Xwd27239 for ; Mon, 10 Feb 2003 18:33:58 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18iQFF-0004lZ-00 for ; Mon, 10 Feb 2003 18:34:01 -0800 Date: Tue, 11 Feb 2003 02:34:01 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Subject: Re: Compatibility with IDNA Message-ID: <20030211023401.GE16359@nicemice.net> Reply-To: IETF IMAA list References: <8f$3A5e3cDD@3247.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <8f$3A5e3cDD@3247.org> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Claus Färber wrote: > The basic idea is that IDNA, IMAA and similar internationalised > identifiers (e.g. newsgroup names) should be able to use the very same > encoding method. It would certainly be nice, if it's possible, but it's not obvious whether it's possible. > This has several advantages: > > . You can put a local-part, a domain, or a complete email address > into the same encoding/decoding function and the results are > correct. Currently, we don't even have a single encoding/decoding function for domain names. We have functions for domain labels. IDNA does not define functions for whole domain names because the delimiting and quoting conventions vary. For example, DNS master files use dots as delimiters, but DNS protocol messages don't. DNS master files use backslashes to quote dots that are not delimiters, but in DNS protocol messages there is nothing special about dots or backslashes. 
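As an illustration of the master-file versus wire-format difference just described (a simplified sketch, not part of the original message): the function names are invented for this example, and the parser handles only a backslash quoting the next character, ignoring the decimal escapes and length limits of real DNS software.

```python
def split_master_name(name: str) -> list:
    """Split a master-file style name on unquoted dots.

    A backslash quotes the following character, so a quoted dot stays
    inside its label instead of acting as a delimiter.
    """
    labels, current, i = [], [], 0
    while i < len(name):
        ch = name[i]
        if ch == "\\" and i + 1 < len(name):
            current.append(name[i + 1])   # keep the quoted character
            i += 2
        elif ch == ".":
            labels.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if current:
        labels.append("".join(current))
    return labels

def to_wire(labels: list) -> bytes:
    """Encode labels as length-prefixed strings ending with a zero byte."""
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        out.append(len(raw))              # length octet; no delimiter needed
        out += raw
    out.append(0)                         # root label terminates the name
    return bytes(out)

labels = split_master_name(r"foo\.bar.example.org")
print(labels)           # ['foo.bar', 'example', 'org']
print(to_wire(labels))  # b'\x07foo.bar\x07example\x03org\x00'
```

In the wire form the dot inside the first label is an ordinary octet, which is why an encoding function has to work on labels already parsed by the application rather than on the delimited text.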
The message header and SMTP protocols add still more delimiters and quoting mechanisms. Various config file formats have their own ad-hoc quoting mechanisms, different from the ones used in message headers and DNS master files. It seems that the only hope is to require the application to parse the larger items (domain names, mail addresses) into their constituent parts (labels, local parts) using whatever delimiters and quoting mechanisms are appropriate in its context, and then standardize the encoding/decoding of the individual parts. In IDNA, each label is encoded as a unit, never subdivided, even if it contains dots or other ASCII punctuation. It's far too late to consider altering that fundamental architecture of IDNA. Maybe, if we had co-designed internationalized domain names, mail addresses, and newsgroup names all at the same time, we would have done things differently, but IDNA is done and approved and due to be deployed any day now. At this point, the most we could try for is to use the exact same encoding for local-parts (or subparts) as is used for domain labels. > . You can have a domain name embedded in a local-part and it is > encoded the same way as a domain on the right hand side if it is > delimited by one of the delimiters listed above (useful for the > so-called percent hack, MIXER, etc.) We might be able to do that if we subdivide local parts. > . The reverse is also true: You can have an email address converted to > a domain name (as seen in SOA DNS records, for example). We might be able to do that if we *don't* subdivide local parts. If we subdivide local parts, then foo.bar@example.org would be encoded differently depending on whether it was ACE-ified and then domain-ified, or domain-ified and then ACE-ified. > . Use the mixed-case annotation. (Yes, it has to be formalised then.) I support mixed-case annotation as an option, but not as a requirement.
The meager incremental benefit of requiring it versus merely allowing it does not appear to be worth the additional required complexity. > . Use the same ACE prefix as IDNA. We should use the same ACE prefix if and only if the ToASCII and ToUnicode operations are identical. If two different sets of ToASCII/ToUnicode operations were to use the same prefix, that would invite errors where a string gets encoded by one ToASCII and decoded by the wrong ToUnicode, which would probably cause the original string to get converted into a non-equivalent string. AMC From owner-ietf-imaa Mon Feb 10 18:57:11 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1B2vBj27732 for ietf-imaa-bks; Mon, 10 Feb 2003 18:57:11 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1B2vAd27728 for ; Mon, 10 Feb 2003 18:57:10 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18iQbi-0004pZ-00 for ; Mon, 10 Feb 2003 18:57:14 -0800 Date: Tue, 11 Feb 2003 02:57:14 +0000 From: "Adam M. Costello" To: IETF IMAA list Subject: Re: Case sensitivity on the LHS Message-ID: <20030211025714.GF16359@nicemice.net> Reply-To: IETF IMAA list References: <20030211014413.GD16359@nicemice.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: John Cowan wrote: > > Turkish i is the only locale-dependent aspect of the Unicode > > case-folding operation, according to the Unicode case-folding > > table. I suppose it's possible that the table overlooks some other > > locale-dependent things that ought to be in there, but can anyone > > name any examples besides Turkish i? 
> > Lithuanian "I" with an accent above lowercases to "I" + DOT ABOVE + > the main accent, because (unlike all other "i"s with accents) the i > keeps its dot. For Unicode case-folding purposes, this discrepancy is > ignored. So if we consider case-mapping, rather than case-folding, then there are three affected languages rather than just two, but it's still all about the dot on the letter i. It still appears to me that this is an isolated anomaly, not a typical example from a large class of locale-dependent upper/lower case issues. Some people have raised a concern that case-folding causes damage (loss of important information). It does cause very slight damage in rare circumstances, but I think not doing it would cause annoyance (mail bouncing or going to the wrong user) quite often. AMC From owner-ietf-imaa Mon Feb 10 20:04:51 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1B44pE29233 for ietf-imaa-bks; Mon, 10 Feb 2003 20:04:51 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1B44od29229 for ; Mon, 10 Feb 2003 20:04:50 -0800 (PST) Received: (qmail 99896 invoked by uid 1016); 11 Feb 2003 04:05:19 -0000 Date: 11 Feb 2003 04:05:19 -0000 Message-ID: <20030211040519.99895.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: ietf-imaa@imc.org Subject: Sound mapping References: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Adam M. Costello writes: > Now suppose the sender types that non-ASCII username slightly > differently from the way I typed it when I created it. Will that mail > reach me? 
If we do case-folding in ToASCII, then yes it will. False. Case folding fixes only some of the errors caused by human retyping. For example, studies have shown that SOUNDEX folding has a substantially higher correction rate. (Go read the literature.) Now that you're aware of this fact, are you going to demand that all similar-sounding names be mapped together? Why do you support case folding and not SOUNDEX folding? Of course, we're just talking about English. Errors are far more difficult to characterize in a world of many languages. As a practical matter, you're going to have to stop expecting the computer to fix your spelling mistakes. This is why I support ISO 14755. ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-ietf-imaa Mon Feb 10 21:17:26 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1B5HQZ00888 for ietf-imaa-bks; Mon, 10 Feb 2003 21:17:26 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1B5HPd00884 for ; Mon, 10 Feb 2003 21:17:25 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18iSnR-00056t-00 for ; Mon, 10 Feb 2003 21:17:29 -0800 Date: Tue, 11 Feb 2003 05:17:29 +0000 From: "Adam M. 
Costello" To: ietf-imaa@imc.org Subject: Re: Sound mapping Message-ID: <20030211051729.GG16359@nicemice.net> Reply-To: IETF IMAA list References: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <20030211040519.99895.qmail@cr.yp.to> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030211040519.99895.qmail@cr.yp.to> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: I wrote: > Now suppose the sender types that non-ASCII username slightly > differently from the way I typed it when I created it. Will that mail > reach me? If we do case-folding in ToASCII, then yes it will. "D. J. Bernstein" wrote: > False. Case folding fixes only some of the errors caused by human > retyping. For example, studies have shown that SOUNDEX folding has a > substantially higher correction rate. Right. There was a portion of my sentence that was in my head but didn't make it to my fingers. What I meant to write was: Now suppose the sender types that non-ASCII username slightly differently from the way I typed it when I created it, using an uppercase letter where I used a lowercase letter (or vice versa). I noticed the omission when I received the message back from the list server, but I thought the intention was clear enough from context. > Now that you're aware of this fact, are you going to demand that all > similar-sounding names be mapped together? No. > Why do you support case folding and not SOUNDEX folding? Because users are already accustomed to not bothering to remember and type the proper case of letters in mail addresses, because in practice it doesn't matter. By doing case-folding we can avoid surprising them. Users are not accustomed to mail addresses being sound-insensitive, so there's no point in us working harder to meet nonexistent expectations. 
AMC From owner-ietf-imaa Mon Feb 10 22:21:53 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1B6Lrk02368 for ietf-imaa-bks; Mon, 10 Feb 2003 22:21:53 -0800 (PST) Received: from leonis.nus.edu.sg (leonis.nus.edu.sg [137.132.1.18]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1B6Lod02363 for ; Mon, 10 Feb 2003 22:21:51 -0800 (PST) Received: from camus (8-55.priv.nus.edu.sg [172.18.8.55]) by leonis.nus.edu.sg (8.12.1/8.12.1) with SMTP id h1B6NBwE001576; Tue, 11 Feb 2003 14:23:13 +0800 (SGT) Message-ID: <001c01c2d195$b1370320$f57812ac@camus> From: "Maynard Kang" To: "D. J. Bernstein" , References: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <20030211040519.99895.qmail@cr.yp.to> Subject: Re: Sound mapping Date: Tue, 11 Feb 2003 14:20:36 +0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: > Of course, we're just talking about English. Errors are far more > difficult to characterize in a world of many languages. As a practical > matter, you're going to have to stop expecting the computer to fix your > spelling mistakes. This is why I support ISO 14755. > IMHO, I think case folding is perfectly reasonable in a world where only upper-case characters are shown on keyboards (not just US keyboards, but also other keyboards that present Latin characters in addition to local characters) as it is a simplistic algorithm that can be implemented easily. To require compulsory 14755 input for e-mail addresses is plain ridiculous, if you ask me. It's like telling consumers that they need to know how to build a TV before they can watch it. 
I certainly do not want to have to know the internal code points of my e-mail address before I can enter it. regards, maynard From owner-ietf-imaa Tue Feb 11 04:31:10 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BCVAK15393 for ietf-imaa-bks; Tue, 11 Feb 2003 04:31:10 -0800 (PST) Received: from crow.verisign.com (crow.verisign.com [216.168.237.103]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1BCV9d15388 for ; Tue, 11 Feb 2003 04:31:09 -0800 (PST) Received: from vsvapostalgw3.prod.netsol.com (vsvapostalgw3.prod.netsol.com [10.170.12.61]) by crow.verisign.com (nsi_0.1/8.9.1) with ESMTP id HAA11638 for ; Tue, 11 Feb 2003 07:31:03 -0500 (EST) Received: by vsvapostalgw3.prod.netsol.com with Internet Mail Service (5.5.2653.19) id <1V7STYV0>; Tue, 11 Feb 2003 07:29:02 -0500 Message-ID: <3CD14E451751BD42BA48AAA50B07BAD603370662@vsvapostal3.prod.netsol.com> From: "Hollenbeck, Scott" To: "'IETF IMAA list'" Subject: RE: Case sensitivity on the LHS Date: Tue, 11 Feb 2003 07:27:04 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: > "Hollenbeck, Scott" wrote: > > > Is this document intended to be a formal update to 2821 and 2822? > > Is IDNA a formal update to RFCs 1034 and 1035? IMAA would bear a > similar relation to RFCs 821, 822, 2821, 2822. Maybe yours is a rhetorical question since you probably know the answer (I don't because IDNA hasn't yet been published as an RFC), but here's why I asked mine: someone implementing 1034/1035 or 2821/2822 (which, by the way, obsolete 821 and 822) might not necessarily know of the new features defined by IDNA and IMAA since there are no "update" references maintained by the RFC Editor. 
I suspect that it would probably be a good idea to have IDNA update 1034/1035, and likewise it's probably a good idea to have IMAA update 2821/2822 so that implementers see a clear relationship between the specifications. > > Both (2821 section 4.1.2 and 2822 section 3.4.1) contain formal > > definitions of the local part of an email address. > > IMAA does not redefine local part, but instead defines a new term, > internationalized local part. Just as IDNA does not redefine domain > label, but instead defines a new term, internationalized domain label. These new terms won't necessarily be known to implementers of the earlier specifications that IDNA and IMAA build upon. If new features with new processing rules are being defined, why not give RFC readers a clear pointer to the specifications that describe the new features? Asking a different way: would these new features (or something similar) have been included in 1034/1035 or 2821/2822 if internationalization was considered when the earlier specifications were being written? While we'll never know for sure, I'm suggesting that the possibility of a "yes" answer implies that IDNA and IMAA _should_ update the earlier specifications. 
-Scott- From owner-ietf-imaa Tue Feb 11 05:17:26 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BDHQt18383 for ietf-imaa-bks; Tue, 11 Feb 2003 05:17:26 -0800 (PST) Received: from yxa.extundo.com (178.230.13.217.in-addr.dgcsystems.net [217.13.230.178]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1BDHOd18375 for ; Tue, 11 Feb 2003 05:17:24 -0800 (PST) Received: from latte.josefsson.org (yxa.extundo.com [217.13.230.178]) (authenticated bits=0) by yxa.extundo.com (8.12.7/8.12.7) with ESMTP id h1BDHMNG023285 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK) for ; Tue, 11 Feb 2003 14:17:23 +0100 To: IETF IMAA list Subject: Re: Case sensitivity on the LHS X-Payment: hashcash 1.1 0:030211:ietf-imaa@imc.org:5019fd0edc5cc9c4 X-Hashcash: 0:030211:ietf-imaa@imc.org:5019fd0edc5cc9c4 From: Simon Josefsson Date: Tue, 11 Feb 2003 14:17:22 +0100 In-Reply-To: <20030211014413.GD16359@nicemice.net> ("Adam M. Costello"'s message of "Tue, 11 Feb 2003 01:44:13 +0000") Message-ID: User-Agent: Gnus/5.090016 (Oort Gnus v0.16) Emacs/21.2 References: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> X-Face: %bo>yc#X1.-jVa- List-Unsubscribe: List-ID: "Adam M. Costello" writes: > "Hollenbeck, Scott" wrote: > >> Is this document intended to be a formal update to 2821 and 2822? > > Is IDNA a formal update to RFCs 1034 and 1035? IDNA does not modify RFC 1034/1035 behaviour, so no. > IMAA would bear a similar relation to RFCs 821, 822, 2821, 2822. Not if it makes changes to those RFCs, which seems to be (part of) what this discussion is about. >> If you use NFKC you will collapse many distinct names of humans into >> the same name, which is a failure as far as LHS is concerned. C.f. ß >> maps to ss. > > NFKC does not map German sharp s to ss. Neither does NFC. It is the > Unicode case-folding operation that maps German sharp s to ss. 
It has > nothing to do with NFKC. You are right, sorry. Still, it is part of the nameprep steps, so the result is the same. > You claim that "many distinct names" of humans will get collapsed by > NFKC into the same name. So far you have provided zero examples. Could > you supply a few more please? If you want to modify existing standards, I think you need to prove that it doesn't break things, not the other way around. >> Changing the LHS definition in RFC 282{1,2} should IMHO be done based >> on technical reasons, and I don't see any technical reason presented >> above. Arguing that users doesn't read the technical specification >> isn't a good motivation for changing the specification; users will >> never read the technical specification. Applications are responsible >> for implementing a non-surprising behavior for clients (which I >> agree treating LHS as case-insensitive is), and with the current >> specifications they can, e.g. by searching case insensitively. > > Here is the technical argument: The day that IMAA is adopted, I should > be able to create an ACE username on yahoo.com, and people should be > able to send mail to that account by typing the corresponding non-ASCII > username into their IMAA-aware mail user agent. The mail should reach > me even if yahoo.com is completely IMAA-unaware. You don't need to change the definition of LHS for this. > Now suppose the sender types that non-ASCII username slightly > differently from the way I typed it when I created it. Will that mail > reach me? If we do case-folding in ToASCII, then yes it will. If we > don't do case-folding in ToASCII, then the mail will either bounce, or > worse yet, it will go to some other user. The same is true today. > While this pitfall (mail going to the wrong user because the sender > typed the wrong case) has always been theoretically possible with > ASCII local parts, it never happens in practice, because in practice > mail servers recognize local parts case-insensitively. 
But if we > omit case-folding from the IMAA ToASCII, then this pitfall will > become very real, because the existing mail servers won't know how > to do case-insensitive comparisons of ACE local parts. If we > include case-folding in IMAA ToASCII, then all non-ASCII local parts > are automatically case-insensitive, even on legacy mail servers. You are saying that the LHS works case-insensitively today because mail servers already treat it case-insensitively, and yet you think they will not know how to do case-insensitive ACE mappings if IMAA is introduced? If server administrators want case-insensitive behaviour, they can configure their software that way. You claim this works today; I claim it will work tomorrow, with or without IMAA. If server administrators don't want case-insensitive behaviour, which is fine by today's specifications, then I don't see why they should suffer any pain only because other people want a different behaviour from their software but are incapable of configuring it that way. It seems to me that the question of a case-sensitive LHS can continue to be decided by server administrators, not specification writers.
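[Editorial illustration, not part of the original thread.] The disagreement above turns on whether a legacy server that compares ASCII local parts case-insensitively would also match case variants of an ACE-encoded local part. The sketch below is only an approximation under stated assumptions: plain Punycode from the Python standard library stands in for the IMAA ACE (no prefix, no delimiter handling), and NFKC plus str.casefold() stands in for the case-folding step of ToASCII. The helper names to_ace and fold are hypothetical.

```python
import unicodedata


def to_ace(local_part: str) -> str:
    # Hypothetical stand-in for an ACE encoder: bare Punycode, no prefix.
    return local_part.encode("punycode").decode("ascii")


def fold(local_part: str) -> str:
    # Approximation of the folding step: Unicode case folding, then NFKC.
    return unicodedata.normalize("NFKC", local_part.casefold())


created = "b\u00fccher"  # the name as the account owner typed it
typed = "B\u00dcCHER"    # the same name as a sender might type it

ace_created = to_ace(created)
ace_typed = to_ace(typed)

# A legacy server can only compare the ASCII forms, ignoring ASCII case.
legacy_match_without_folding = ace_created.lower() == ace_typed.lower()

# If the sending side folds case before encoding, the ASCII forms agree.
legacy_match_with_folding = (
    to_ace(fold(created)).lower() == to_ace(fold(typed)).lower()
)

print(ace_created, ace_typed)
print(legacy_match_without_folding, legacy_match_with_folding)
```

Under these assumptions the two ACE strings differ in more than ASCII case, so an ASCII-only case-insensitive comparison does not treat them as equal unless folding happened before encoding. Whether servers should be required to behave this way is the policy question the thread is debating; the sketch does not settle it.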
From owner-ietf-imaa Tue Feb 11 05:59:08 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BDx8t20276 for ietf-imaa-bks; Tue, 11 Feb 2003 05:59:08 -0800 (PST) Received: from mailgen2.internet.gouv.qc.ca (courrier4.internet.gouv.qc.ca [192.197.162.9] (may be forged)) by above.proper.com (8.11.6/8.11.3) with SMTP id h1BDx7d20271 for ; Tue, 11 Feb 2003 05:59:07 -0800 (PST) Received: (qmail 8649 invoked from network); 11 Feb 2003 13:58:55 -0000 Received: from unknown (HELO p295.sct1.gouv.qc.ca) (142.213.85.47) by mailgen2.internet.gouv.qc.ca with SMTP; 11 Feb 2003 13:58:55 -0000 Message-Id: <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> X-Sender: alabonte@entree.sct1.gouv.qc.ca X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Tue, 11 Feb 2003 08:58:57 -0500 To: Simon Josefsson , IETF IMAA list From: Alain LaBonté Subject: Re: Case sensitivity on the LHS Cc: alb@iquebec.com In-Reply-To: References: <20030211014413.GD16359@nicemice.net> <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 14:17 2003-02-11 +0100, Simon Josefsson wrote: [non-quoted correspondent] > > You claim that "many distinct names" of humans will get collapsed by > > NFKC into the same name. So far you have provided zero examples. Could > > you supply a few more please? > >[Simon] If you want to modify existing standards, I think you need to prove >that it doesn't break things, not the other way around. [Alain] Could somebody summarize the actual behaviour of NFKC for me? I'm not sure what kind of mapping is done by NFKC... but I suspect the kind of problems that may occur.
For example, if accents are removed from Latin letters once entered (which in most cases would be convenient for those who can't enter accented characters -- because I would like them to reach me if my email address were to be « Alain.LaBonté@iquébec.com »), there might indeed be collapses, and I will give an actual example, using a look-alike family-name cluster from the city of Québec's telephone book: There are actual real-life collisions, for example in these cases: Cote B Côte B Coté B Côté B This was already one of my favorites for dictionary ordering (the 4 family names are 4 distinct French words, meaning respectively "quote", "hill side", "quoted" and "side"). However this case of collapse is imho no different from cases of names collapsing just because they are simply identical. Within the same ISP, one has to find extra ways to distinguish them. Imho an email address "B.Cote@..." should be able to reach "B.Côté@..." Alain LaBonté Québec From owner-ietf-imaa Tue Feb 11 06:27:21 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BERLk24356 for ietf-imaa-bks; Tue, 11 Feb 2003 06:27:21 -0800 (PST) Received: from smtp.denic.de (smtp.denic.de [194.246.96.22]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1BERJd24348 for ; Tue, 11 Feb 2003 06:27:19 -0800 (PST) Received: from notes.denic.de (denics15.denic.de [194.246.96.18]) by smtp.denic.de with esmtp id 18ibNT-0007ib-00; Tue, 11 Feb 2003 15:27:15 +0100 Subject: Re: Re: Case sensitivity on the LHS To: ietf-imaa@imc.org X-Mailer: Lotus Notes Release 5.0.6a January 17, 2001 Message-ID: From: "Marcos Sanz/Denic" Date: Tue, 11 Feb 2003 15:28:19 +0100 X-MIMETrack: Serialize by Router on notes/Denic(Release 5.0.11 |July 24, 2002) at 11.02.2003 15:27:14 MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: On 10.02.2003 15:48 Martin Duerst wrote: > [...]
> But even then, being able to use the same > nameprep/stringprep for both sides of the '@' is a clear win. As far as I remember, Nameprep states to be a profile for preparing domain names and it should not be used for any other purpose. Wouldn't it be a better idea to make anyway a new profile for IMAA? My 2p. Regards, Marcos Sanz From owner-ietf-imaa Tue Feb 11 06:58:33 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BEwXE26704 for ietf-imaa-bks; Tue, 11 Feb 2003 06:58:33 -0800 (PST) Received: from yxa.extundo.com (178.230.13.217.in-addr.dgcsystems.net [217.13.230.178]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1BEwUd26700 for ; Tue, 11 Feb 2003 06:58:31 -0800 (PST) Received: from latte.josefsson.org (yxa.extundo.com [217.13.230.178]) (authenticated bits=0) by yxa.extundo.com (8.12.7/8.12.7) with ESMTP id h1BEwQNG026578 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK); Tue, 11 Feb 2003 15:58:28 +0100 To: Alain =?iso-8859-1?q?LaBont=E9?= Cc: IETF IMAA list , alb@iquebec.com Subject: Re: Case sensitivity on the LHS X-Payment: hashcash 1.1 0:030211:alb@sct1.gouv.qc.ca:078f1ff838583efb X-Hashcash: 0:030211:alb@sct1.gouv.qc.ca:078f1ff838583efb X-Payment: hashcash 1.1 0:030211:ietf-imaa@imc.org:114c7a167863957a X-Hashcash: 0:030211:ietf-imaa@imc.org:114c7a167863957a X-Payment: hashcash 1.1 0:030211:alb@iquebec.com:c64a1ad7a8b64f8f X-Hashcash: 0:030211:alb@iquebec.com:c64a1ad7a8b64f8f From: Simon Josefsson Date: Tue, 11 Feb 2003 15:58:25 +0100 In-Reply-To: <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> (Alain =?iso-8859-1?q?LaBont=E9's?= message of "Tue, 11 Feb 2003 08:58:57 -0500") Message-ID: User-Agent: Gnus/5.090016 (Oort Gnus v0.16) Emacs/21.2 References: <20030211014413.GD16359@nicemice.net> <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> X-Face: 
%bo>yc#X1.-jVa- List-Unsubscribe: List-ID: Alain LaBonté writes: > A 14:17 2003-02-11 +0100, Simon Josefsson a écrit : > > [non-quoted correspondent] >> > You claim that "many distinct names" of humans will get collapsed by >> > NFKC into the same name. So far you have provided zero examples. Could >> > you supply a few more please? >> >>[Simon] If you want to modify existing standards, I think you need to prove >>that it doesn't break things, not the other way around. > > [Alain] Could somebody summarize what is the actual behaviour of NFKC for me? > > I'm not sure what kind of mapping is done by NFKC... but I suspect > the kind of problems that may ocur. > > For example, if accents are removed from Latin letters once > entered (which in most cases would be convenient for those who can't > enter accented characters -- because I would like them to reach me if > my email address were to be « Alain.LaBonté@iquébec.com »), there > might indeed be collapses, and I will give an actual example, using a > look-alike family-name cluster from the city of Québec's telephone > book: > > There are actual real-life collisions for example, in these cases: While being an interesting example, nameprep works fine on these strings. To illustrate, I'm including below (a) initial data as UTF-8 (b) after NFKC (c) after nameprep. 
> Cote B

Initial data (length 6): 43 6f 74 65 20 42
After normalization (length 6): 43 6f 74 65 20 42
After nameprep (length 6): 63 6f 74 65 20 62

> Côte B

Initial data (length 7): 43 c3 b4 74 65 20 42
After normalization (length 7): 43 c3 b4 74 65 20 42
After nameprep (length 7): 63 c3 b4 74 65 20 62

> Coté B

Initial data (length 7): 43 6f 74 c3 a9 20 42
After normalization (length 7): 43 6f 74 c3 a9 20 42
After nameprep (length 7): 63 6f 74 c3 a9 20 62

> Côté B

Initial data (length 8): 43 c3 b4 74 c3 a9 20 42
After normalization (length 8): 43 c3 b4 74 c3 a9 20 42
After nameprep (length 8): 63 c3 b4 74 c3 a9 20 62

> This was already one of my favorites for dictionary ordering (the
> 4 family names, are 4 disting French words, meaning respectively
> "quote", "hill side", "quoted" and "side").
>
> However this case of collapse is imho no different from cases of
> names collapsing just because they are simply identical. One has, with
> the same ISP, to find extra ways to distinguish them. Imho an email
> address "B.Cote@..." should be able to reach "B.Côté@..."

As you can see, a nameprep approach would distinguish between all four strings. I think this is good though, and even fear that it might even be too aggressive.
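[Editorial illustration, not part of the original thread.] The byte dumps in the message above can be approximated with the Python standard library. This is a sketch, not the tool the message's author used: approx_nameprep is a hypothetical helper that applies Unicode case folding followed by NFKC, which covers the mapping and normalization steps of nameprep for these inputs but omits its prohibited-character and bidi checks.

```python
import unicodedata

# The four family names from the telephone-book example, with the accented
# letters written as escapes so the source file stays ASCII-only.
NAMES = ["Cote B", "C\u00f4te B", "Cot\u00e9 B", "C\u00f4t\u00e9 B"]


def hex_bytes(text: str) -> str:
    # UTF-8 bytes as space-separated hex, matching the layout of the dumps.
    return text.encode("utf-8").hex(" ")


def approx_nameprep(text: str) -> str:
    # Assumption: case folding then NFKC, without nameprep's prohibition checks.
    return unicodedata.normalize("NFKC", text.casefold())


normalized = [unicodedata.normalize("NFKC", name) for name in NAMES]
prepared = [approx_nameprep(name) for name in NAMES]

for name, norm, prep in zip(NAMES, normalized, prepared):
    print("Initial data:       ", hex_bytes(name))
    print("After normalization:", hex_bytes(norm))
    print("After nameprep:     ", hex_bytes(prep))
```

For these inputs NFKC alone leaves every string unchanged, and the folded forms remain four distinct strings, which is the point the message makes: the accents survive, only the case is lost.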
From owner-ietf-imaa Tue Feb 11 07:08:35 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BF8Ze27118 for ietf-imaa-bks; Tue, 11 Feb 2003 07:08:35 -0800 (PST) Received: from m3001.hostcentric.net (m3001.hostcentric.net [216.157.79.237]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1BF8Yd27114 for ; Tue, 11 Feb 2003 07:08:34 -0800 (PST) Received: (qmail 6928 invoked by alias); 11 Feb 2003 15:08:35 -0000 Received: from unknown (HELO DAVIS1) (12.234.226.61) by 0 with SMTP; 11 Feb 2003 15:08:35 -0000 Message-ID: <003d01c2d1df$6cedef40$7300a8c0@DAVIS1> From: "Mark Davis" To: "Simon Josefsson" , "IETF IMAA list" , "Alain LaBonté" Cc: References: <20030211014413.GD16359@nicemice.net> <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> Subject: Re: Case sensitivity on the LHS Date: Tue, 11 Feb 2003 07:08:24 -0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: > I'm not sure what kind of mapping is done by NFKC... but I suspect the > kind of problems that may ocur. > > For example, if accents are removed from Latin letters once entered ... NFKC does not remove the accents from Latin letters. If you are going to comment on NFKC, pro or contra, you should comment on what it does, rather than on simple speculation as to what it does. The formal results are in UAX #15 on the Unicode site. There is a chart that shows the effects on characters on http://www.unicode.org/charts/normalization/. 
For example, if you look at Latin characters on http://www.unicode.org/charts/normalization/chart_Latin.html you will see that "ô" remains as "ô" in NFKC. Mark ________ mark.davis@jtcsv.com IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 ----- Original Message ----- From: "Alain LaBonté" To: "Simon Josefsson" ; "IETF IMAA list" Cc: Sent: Tuesday, February 11, 2003 05:58 Subject: Re: Case sensitivity on the LHS > > A 14:17 2003-02-11 +0100, Simon Josefsson a écrit : > > [non-quoted correspondent] > > > You claim that "many distinct names" of humans will get collapsed by > > > NFKC into the same name. So far you have provided zero examples. Could > > > you supply a few more please? > > > >[Simon] If you want to modify existing standards, I think you need to prove > >that it doesn't break things, not the other way around. > > [Alain] Could somebody summarize what is the actual behaviour of NFKC for me? > > I'm not sure what kind of mapping is done by NFKC... but I suspect the > kind of problems that may ocur. > > For example, if accents are removed from Latin letters once entered > (which in most cases would be convenient for those who can't enter accented > characters -- because I would like them to reach me if my email address > were to be « Alain.LaBonté@iquébec.com »), there might indeed be collapses, > and I will give an actual example, using a look-alike family-name cluster > from the city of Québec's telephone book: > > There are actual real-life collisions for example, in these cases: > > Cote B > Côte B > Coté B > Côté B > > This was already one of my favorites for dictionary ordering (the 4 > family names, are 4 disting French words, meaning respectively "quote", > "hill side", "quoted" and "side"). > > However this case of collapse is imho no different from cases of names > collapsing just because they are simply identical. One has, with the same > ISP, to find extra ways to distinguish them. 
Imho an email address > "B.Cote@..." should be able to reach "B.Côté@..." > > Alain LaBonté > Québec > > From owner-ietf-imaa Tue Feb 11 09:34:23 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BHYNK07149 for ietf-imaa-bks; Tue, 11 Feb 2003 09:34:23 -0800 (PST) Received: from m3001.hostcentric.net (m3001.hostcentric.net [216.157.79.237]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1BHYLd07141 for ; Tue, 11 Feb 2003 09:34:21 -0800 (PST) Received: (qmail 25074 invoked by alias); 11 Feb 2003 17:34:23 -0000 Received: from unknown (HELO DAVIS1) (12.234.226.61) by 0 with SMTP; 11 Feb 2003 17:34:23 -0000 Message-ID: <008a01c2d1f3$cb36f5b0$7300a8c0@DAVIS1> From: "Mark Davis" To: "IETF IMAA list" References: <20030211014413.GD16359@nicemice.net> <20030211025714.GF16359@nicemice.net> Subject: Re: Case sensitivity on the LHS Date: Tue, 11 Feb 2003 09:34:12 -0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: 1. From a practical point of view, I am personally very much in favor of case-insensitive names. While many programmers may be accustomed to (and use) case sensitivity, the average user is simply annoyed by it, or worse. Having mail bounce from john@foo.com because it had to be John@foo.com just doesn't make sense to most people. 2. In terms of data, the Lithuanian accent behavior exists, but the use of such accents is uncommon -- it is really for dictionary annotations. So from a practical standpoint it does not play a role. 3. Turkic spelling is the more important issue.
If the normal case folding is used, then one gets the following equivalence sets: {i, I} {İ} {ı} So kiz@foo.com and KIZ@foo.com are considered case variants, while kız@foo.com and KİZ@foo.com are different. Turkic languages, on the other hand, use: {i, İ} {ı, I} For them, kiz@foo.com and KİZ@foo.com should be considered case variants, and different from kız@foo.com and KIZ@foo.com (which should also be case variants). Where a system can have different case matching behavior for different languages, this is not a problem. So where a database client presents data sorted or selected for Turkish, this should be taken into account. Where a system needs a single uniform case matching behavior over all strings, such as for a case-insensitive file system, typically implementations use an inclusive case matching, since it covers the vast majority of the world. That is, they use the following equivalence set: {i, I, İ, ı} The one downside for Turkic languages is that it does not allow them to have two email addresses that only differ by the dots on the I(s). Mark ________ mark.davis@jtcsv.com IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 ----- Original Message ----- From: "Adam M. Costello" To: "IETF IMAA list" Sent: Monday, February 10, 2003 18:57 Subject: Re: Case sensitivity on the LHS > > John Cowan wrote: > > > > Turkish i is the only locale-dependent aspect of the Unicode > > > case-folding operation, according to the Unicode case-folding > > > table. I suppose it's possible that the table overlooks some other > > > locale-dependent things that ought to be in there, but can anyone > > > name any examples besides Turkish i? > > > > Lithuanian "I" with an accent above lowercases to "I" + DOT ABOVE + > > the main accent, because (unlike all other "i"s with accents) the i > > keeps its dot. For Unicode case-folding purposes, this discrepancy is > > ignored.
> > So if we consider case-mapping, rather than case-folding, then there > are three affected languages rather than just two, but it's still > all about the dot on the letter i. It still appears to me that this > is an isolated anomaly, not a typical example from a large class of > locale-dependent upper/lower case issues. > > Some people have raised a concern that case-folding causes damage (loss > of important information). It does cause very slight damage in rare > circumstances, but I think not doing it would cause annoyance (mail > bouncing or going to the wrong user) quite often. > > AMC > From owner-ietf-imaa Tue Feb 11 10:54:26 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BIsQ710199 for ietf-imaa-bks; Tue, 11 Feb 2003 10:54:26 -0800 (PST) Received: from mail.uni-bielefeld.de (IDENT:72@mail2.uni-bielefeld.de [129.70.4.90]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1BIsPd10195 for ; Tue, 11 Feb 2003 10:54:25 -0800 (PST) Received: from 192.168.0.17 (ppp36-56.hrz.uni-bielefeld.de [129.70.36.56]) by mail.uni-bielefeld.de (Sun Internet Mail Server sims.4.0.2000.10.12.16.25.p8) with ESMTP id <0HA500805R6KXS@mail.uni-bielefeld.de> for ietf-imaa@imc.org; Tue, 11 Feb 2003 19:54:22 +0100 (MET) Date: Tue, 11 Feb 2003 19:47:15 +0100 From: Marc Mutz Subject: Re: Case sensitivity on the LHS In-reply-to: To: ietf-imaa@imc.org Message-id: <200302111947.30700@sendmail.mutz.com> Organization: KDE MIME-version: 1.0 Content-type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Boundary-02=_CVUS+eOvYT6jMLA"; charset="iso-8859-1" Content-transfer-encoding: 7bit User-Agent: KMail/1.5.9 X-PGP-Key: 0xBDBFE838 References: Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --Boundary-02=_CVUS+eOvYT6jMLA Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Description: signed data Content-Disposition: inline On 
Sunday 09 February 2003 20:16, Paul Hoffman / IMC wrote: > RFC 2821 and RFC 2822 make it clear that the left-hand side (LHS) of > email addresses are opaque, which in turn means they are > case-sensitive. The -00 draft of IMAA preserves this. It is of course nice to be able to use one function for the domain, as well as the local-part, but since we already consider different field delimiters in addition to DOT, this single-function argument probably won't fly in practice anyway. So we might as well define another profile. And IMO, it should _not_ include case folding. For the simple reason that it's too aggressive. As an example (in addition to the ones already given by others), consider German "Maße" (measures). The sz ligature is a character that only exists in lower case form. I don't know what led IDNA to fold that to "ss", but I think that this is a bug in IDNA and should be avoided by all means in IMAA. That's because Masse is German for "mass". :-o The easy way to avoid this is doing no case folding. The hard way (since incompatible with IDNA) would be to fix the ß-mapping in the IMAA tables...
Marc -- You can fool some people sometimes But you can't fool all the people all the time -- Bob Marley --Boundary-02=_CVUS+eOvYT6jMLA Content-Type: application/pgp-signature Content-Description: signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQA+SUVB3oWD+L2/6DgRAsqSAKDFBpXKUqHeiArs+tmXwM+SoeYUbACeJP01 R/A8rthSmBiQMxNZk4ZHV+M= =zmxv -----END PGP SIGNATURE----- --Boundary-02=_CVUS+eOvYT6jMLA-- From owner-ietf-imaa Tue Feb 11 10:58:33 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BIwXL10282 for ietf-imaa-bks; Tue, 11 Feb 2003 10:58:33 -0800 (PST) Received: from bs.jck.com (ns.jck.com [209.187.148.211]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1BIwWd10277 for ; Tue, 11 Feb 2003 10:58:32 -0800 (PST) Received: from [209.187.148.215] (helo=p3.JCK.COM) by bs.jck.com with esmtp (Exim 4.10) id 18ifbr-0006gf-00; Tue, 11 Feb 2003 13:58:23 -0500 Date: Tue, 11 Feb 2003 13:58:23 -0500 From: John C Klensin To: "Hollenbeck, Scott" , "'IETF IMAA list'" Subject: IMAA (or alternative) and 2821/2822 (was: RE: Case sensitivity on the LHS) Message-ID: <152208734.1044971903@p3.JCK.COM> In-Reply-To: <3CD14E451751BD42BA48AAA50B07BAD603370662@vsvapostal3.prod.netsol.com> References: <3CD14E451751BD42BA48AAA50B07BAD603370662@vsvapostal3 .prod.netsol.com> X-Mailer: Mulberry/3.0.0 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Scott, The listing of 2821/2822 as obsoleting 821/822 was basically a mistake on someone's part -- Proposed Standards can't replace full Standards. Questions about why the RFC-Index has not been corrected should be addressed to the RFC Editor and/or IESG... I don't know.
However, while the process is moving very slowly (anyone who went through the DRUMS end-game can probably guess at many of the reasons), Pete and I are working on revisions/updates to 2821/2822 that fix the errors and incorporate changes suggested on various lists. IMO, it would border on the insane to produce new versions of those documents that omit mention of internationalization work. But it would probably be unwise to make those mentions/references normative, since doing so would tie the maturity level of 2821/2822 to that of the internationalization work. E.g., it would force the revisions to recycle at Proposed, rather than being candidates for processing at Draft. There is another twist in this, best seen when one examines the question of updating 1034/1035. For better or worse, a key property of IDNA was that it not modify 1034/1035 in any way. I don't see any way that it can be meaningful to claim that IDNA doesn't modify 1034/1035 in any way and that it "updates" them. regards, john --On Tuesday, 11 February, 2003 07:27 -0500 "Hollenbeck, Scott" wrote: > >> "Hollenbeck, Scott" wrote: >> >> > Is this document intended to be a formal update to 2821 and >> > 2822? >> >> Is IDNA a formal update to RFCs 1034 and 1035? IMAA would >> bear a similar relation to RFCs 821, 822, 2821, 2822. > > Maybe yours is a rhetorical question since you probably know > the answer (I don't because IDNA hasn't yet been published as > an RFC), but here's why I asked mine: someone implementing > 1034/1035 or 2821/2822 (which, by the way, obsolete 821 and > 822) might not necessarily know of the new features defined by > IDNA and IMAA since there are no "update" references > maintained by the RFC Editor. I suspect that it would > probably be a good idea to have IDNA update 1034/1035, and > likewise it's probably a good idea to have IMAA update > 2821/2822 so that implementers see a clear relationship > between the specifications. 
> >> > Both (2821 section 4.1.2 and 2822 section 3.4.1) contain >> > formal definitions of the local part of an email address. >> >> IMAA does not redefine local part, but instead defines a new >> term, internationalized local part. Just as IDNA does not >> redefine domain label, but instead defines a new term, >> internationalized domain label. > > These new terms won't necessarily be known to implementers of > the earlier specifications that IDNA and IMAA build upon. If > new features with new processing rules are being defined, why > not give RFC readers a clear pointer to the specifications > that describe the new features? > > Asking a different way: would these new features (or something > similar) have been included in 1034/1035 or 2821/2822 if > internationalization was considered when the earlier > specifications were being written? While we'll never know for > sure, I'm suggesting that the possibility of a "yes" answer > implies that IDNA and IMAA _should_ update the earlier > specifications. 
> > -Scott- From owner-ietf-imaa Tue Feb 11 11:07:11 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BJ7BH10532 for ietf-imaa-bks; Tue, 11 Feb 2003 11:07:11 -0800 (PST) Received: from crow.verisign.com (crow.verisign.com [216.168.237.103]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1BJ79d10528 for ; Tue, 11 Feb 2003 11:07:09 -0800 (PST) Received: from VSVAPOSTALGW1.prod.netsol.com (vsvapostalgw1.prod.netsol.com [10.170.12.38]) by crow.verisign.com (nsi_0.1/8.9.1) with ESMTP id OAA07691; Tue, 11 Feb 2003 14:07:04 -0500 (EST) Received: by VSVAPOSTALGW1.prod.netsol.com with Internet Mail Service (5.5.2653.19) id <1V7PBABM>; Tue, 11 Feb 2003 14:03:14 -0500 Message-ID: <3CD14E451751BD42BA48AAA50B07BAD603370674@vsvapostal3.prod.netsol.com> From: "Hollenbeck, Scott" To: "'John C Klensin'" , "'IETF IMAA list'" Subject: RE: IMAA (or alternative) and 2821/2822 (was: RE: Case sensitivit y on the LHS) Date: Tue, 11 Feb 2003 14:03:05 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: > There is another twist in this, best seen when one examines the > question of updating 1034/1035. For better or worse, a key > property of IDNA was that it not modify 1034/1035 in any way. I > don't see any way that it can be meaningful to claim that IDNA > doesn't modify 1034/1035 in any way and that it "updates" them. Thanks, I'd forgotten that point. As a practical matter I can see the value in not tying the specifications together. 
-Scott- From owner-ietf-imaa Tue Feb 11 11:24:26 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BJOQS11588 for ietf-imaa-bks; Tue, 11 Feb 2003 11:24:26 -0800 (PST) Received: from m3001.hostcentric.net (m3001.hostcentric.net [216.157.79.237]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1BJOOd11582 for ; Tue, 11 Feb 2003 11:24:24 -0800 (PST) Received: (qmail 17253 invoked by alias); 11 Feb 2003 19:24:27 -0000 Received: from unknown (HELO DAVIS1) (12.234.226.61) by 0 with SMTP; 11 Feb 2003 19:24:27 -0000 Message-ID: <00a401c2d203$2ad8a090$7300a8c0@DAVIS1> From: "Mark Davis" To: "Marc Mutz" , References: <200302111947.30700@sendmail.mutz.com> Subject: Re: Case sensitivity on the LHS Date: Tue, 11 Feb 2003 11:24:14 -0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: The reason that those are case folded is that the uppercase of Maße is MASSE, which is the same as the uppercase of Masse. So for it to be a well-defined equivalence relation, {ss, SS, ß} have to be in the same class. 
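The equivalence argument above can be checked with a small sketch. It uses Python's built-in Unicode case operations as a stand-in; that is an assumption for illustration, since Nameprep defines its own mapping tables, and the helper name same_class is hypothetical.

```python
# Illustration only: Python's built-in case mappings, not the Nameprep tables.

def same_class(a: str, b: str) -> bool:
    """Return True if a and b compare equal after Unicode case folding."""
    return a.casefold() == b.casefold()

# The full uppercase mapping of sharp s is "SS", so both words uppercase alike.
print("Maße".upper())                    # MASSE
print("Maße".upper() == "Masse".upper())  # True

# A case-insensitive equivalence must therefore put ss, SS and ß together.
print(same_class("Maße", "Masse"))        # True
print(same_class("Masse", "MASSE"))       # True
```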
Mark ________ mark.davis@jtcsv.com IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 ----- Original Message ----- From: "Marc Mutz" To: Sent: Tuesday, February 11, 2003 10:47 Subject: Re: Case sensitivity on the LHS From owner-ietf-imaa Tue Feb 11 11:32:19 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BJWJb12246 for ietf-imaa-bks; Tue, 11 Feb 2003 11:32:19 -0800 (PST) Received: from mail.reutershealth.com ([65.246.141.36]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1BJWId12241 for ; Tue, 11 Feb 2003 11:32:18 -0800 (PST) Received: from skunk.reutershealth.com (mail [65.246.141.36]) by mail.reutershealth.com (Pro-8.9.3/Pro-8.9.3) with SMTP id OAA02525; Tue, 11 Feb 2003 14:29:24 -0500 (EST) Message-Id: <200302111929.OAA02525@mail.reutershealth.com> Received: by skunk.reutershealth.com (sSMTP sendmail emulation); Tue, 11 Feb 2003 14:31:50 -0500 From: John Cowan Subject: Re: Case sensitivity on the LHS To: mark.davis@jtcsv.com (Mark Davis) Date: Tue, 11 Feb 2003 14:31:50 -0500 (EST) Cc: ietf-imaa@imc.org (IETF IMAA list) In-Reply-To: <008a01c2d1f3$cb36f5b0$7300a8c0@DAVIS1> from "Mark Davis" at Feb 11, 2003 09:34:12 AM X-Mailer: ELM [version 2.5 PL6] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Mark Davis scripsit: > The one downside for Turkic languages is that it does not allow them to have > two email addresses that only differ by the dots on the I(s). This is not much of a downside, at least for Turkish, because due to the vowel harmony rules you can usually tell whether an i is dotted or dotless. 
-- After fixing the Y2K bug in an application: John Cowan WELCOME TO jcowan@reutershealth.com DATE: SUNDAK, JANUARK 1, 2000 http://www.ccil.org/~cowan From owner-ietf-imaa Tue Feb 11 13:40:17 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BLeHP17213 for ietf-imaa-bks; Tue, 11 Feb 2003 13:40:17 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1BLeFd17207 for ; Tue, 11 Feb 2003 13:40:16 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id QAA12518; Tue, 11 Feb 2003 16:40:11 -0500 Message-Id: <4.2.0.58.J.20030211162348.0479ea38@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Tue, 11 Feb 2003 16:27:57 -0500 To: "Mark Davis" , "Marc Mutz" , From: Martin Duerst Subject: Re: Case sensitivity on the LHS In-Reply-To: <00a401c2d203$2ad8a090$7300a8c0@DAVIS1> References: <200302111947.30700@sendmail.mutz.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: [using sz for the German sharp-s] At 11:24 03/02/11 -0800, Mark Davis wrote: >The reason that those are case folded is that the uppercase of Masze is >MASSE, which is the same as the uppercase of Masse. So for it to be a >well-defined equivalence relation, {ss, SS, sz} have to be in the same class. Well, this is only the case if you want a full "round-trip" equivalence. For the purpose of IDNA and IMAA, it would have been possible to define an equivalence with two classes ({ss, SS}, {sz}) without actual problems, because IDNA maps to lowercase. (The only exception being for people trying to input words with sz in all-uppercase, which they won't do anyway.) 
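A minimal sketch of the two-class alternative described above, assuming a mapping that only lowercases (Python's str.lower is used as a stand-in, and lower_only_key is a hypothetical name; this is not what IDNA actually does):

```python
# Illustration only: a lowercase-only mapping, unlike IDNA's case folding.

def lower_only_key(s: str) -> str:
    """Map a string to lowercase without folding sharp s to "ss"."""
    return s.lower()

# ss and SS land in one class ...
print(lower_only_key("Masse") == lower_only_key("MASSE"))  # True
# ... while the sharp s stays in a class of its own.
print(lower_only_key("Maße") == lower_only_key("Masse"))   # False
```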
But given that IDNA has taken the decision it has, I don't think it's worth creating a whole new table with all the associated confusion for IMAA just for this little tweak. Regards, Martin. From owner-ietf-imaa Tue Feb 11 13:49:11 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BLnBk17486 for ietf-imaa-bks; Tue, 11 Feb 2003 13:49:11 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1BLn8d17479 for ; Tue, 11 Feb 2003 13:49:09 -0800 (PST) Received: (qmail 2792 invoked by uid 66); 11 Feb 2003 21:49:04 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 11 Feb 2003 21:49:04 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-11-0944d); 11 Feb 2003 22:48:30 +0100 Date: 11 Feb 2003 12:17:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8fd5$Jh3cDD@3247.org> In-Reply-To: <20030211023401.GE16359@nicemice.net> Subject: Re: Compatibility with IDNA User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-11-0944d MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Adam M. Costello schrieb/wrote: > Claus Färber wrote: >> This has several advantages: >> >> . You can put a local-part, a domain, or a complete email address >> into the same encoding/decoding function and the results are >> correct. > Currently, we don't even have a single encoding/decoding function for > domain names. We have functions for domain labels. IDNA does not > define functions for whole domain names because the delimiting and > quoting conventions vary. For example, DNS master files use dots as > delimiters, but DNS protocol messages don't. 
DNS master files use > backslashes to quote dots that are not delimiters, but in DNS protocol > messages there is nothing special about dots or backslashes. Well, any quoting in config files is not part of the address. The next generation of software will take UTF-8 as input and encode the labels automatically anyway. > The message header and SMTP protocols add still more delimiters and > quoting mechanisms. Any quoting, whitespace, etc. allowed by RFC 2822 is just extra noise used to embed an email address in higher-level protocols. The real question is how to deal with the minimum quoting required by RFC 2821. Is that considered part of the email address? For example, how is that quoting handled if such an email address is included as a DNS label? Do MTAs match the email address ``"joe user"@example.com'' against the login name ``joe user'' or against ``"joe user"''? This basically defines how much of the encoding to undo before doing the encoding (and to reapply after doing the encoding to the extent necessary). Well, even if this means that you have to encode the local-part and the RHS separately, it would still be a benefit to be able to use the same function on both sides (which will still encode valid domain names embedded in the local part the same way as IDNA). > Various config file formats have their own ad-hoc quoting mechanisms, > different from the ones used in message headers and DNS master files. Again, this is just extra noise used to embed an email address in higher-level protocols. > In IDNA, each label is encoded as a unit, never subdivided, even if > it contains dots or other ASCII punctuation. It's far too late to > consider altering that fundamental architecture of IDNA. All of the characters that could make a difference are not valid within a domain label that is part of a domain host name (which uses ``UseSTD3ASCIIRules''). So even though IDNA is being deployed now, there will be no domain names where it would make any difference. 
I think that changes that don't make a difference for deployed IDNs should still be possible. Or, what describes the idea better: It should be possible to define IMAA in such a way that its encoding function just happens to work with *valid* domain names too. > Maybe, if we had co-designed internationalized domain names, mail any > At this point, the most we could try for is to use the exact same > encoding for local-parts (or subparts). >> . You can have a domain name embedded in a local-part and it is >> encoded the same way as a domain on the right hand side if it is >> delimited by one of the delimiters listed above (useful for the >> so-called percent hack, MIXER, etc.) > We might be able to do that if we subdivide local parts. >> . The reverse is also true: You can have an email address converted to >> a domain name (as seen in SOA DNS records, for example). > We might be able to do that if we *don't* subdivide local parts. Oh, right. I missed the point that c\.faerber.example.com and c.faerber.example.com are different domain names. It is clear that the string used in the DNS must be identical to the one produced by IMAA so that IDNA-unaware and IMAA-unaware software can handle these addresses. I wonder if anything would break if DNS server software encoded labels containing dots according to an IDNA-compatible IMAA (and not IDNA). IDNAs are only used for domain names, which have a very restricted subset of characters. Binary data in the DNS does not use IDNA anyway (and must bypass any ACE if it contains any octets above 0x80). Non-binary data that does make use of non-ASCII characters is currently limited to domain names, which use UseSTD3ASCIIRules. Mandating an encoding different from IDNA that will produce the same output for domain names should not hurt anyone. For example, a DNS server implementation could do this: . Parse the config file, treat non-ASCII chars as opaque. . For each label found, do this: . 
Check whether it's binary or made up of characters (for example, it could assume that anything that contains octets above 0x80 encoded as '\OOO' is binary and everything else is character data [you could quote Unicode chars as '\x{XXXX}', for example]). . For binary data, just undo the quoting and be done. Print an error if (quoted) Unicode characters or unencoded octets above 0x80 are found. . For character data, convert from the local charset (e.g. UTF-8) to Unicode, undo the quoting of characters and encode the resulting name using the IDNA-compatible IMAA. Unless the zone file writer really means to include *invalid* domain names (that contain characters not from ['A'..'Z','a'..'z','0'..'9','-'] plus the separators) as domain names encoded according to IDNA, this would work. If he really means to do that (some people just like to break things), s/he can still do the IDNA-encoding manually. Claus -- ------------------------ http://www.faerber.muc.de/ ------------------------ OpenPGP: DSS 1024/639680F0 E7A8 AADB 6C8A 2450 67EA AF68 48A5 0E63 6396 80F0 From owner-ietf-imaa Tue Feb 11 13:55:08 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1BLt8q17629 for ietf-imaa-bks; Tue, 11 Feb 2003 13:55:08 -0800 (PST) Received: from bs.jck.com (ns.jck.com [209.187.148.211]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1BLt6d17624 for ; Tue, 11 Feb 2003 13:55:07 -0800 (PST) Received: from [209.187.148.215] (helo=p3.JCK.COM) by bs.jck.com with esmtp (Exim 4.10) id 18iiMv-00073D-00 for ietf-imaa@imc.org; Tue, 11 Feb 2003 16:55:09 -0500 Date: Tue, 11 Feb 2003 16:55:09 -0500 From: John C Klensin To: IETF IMAA list Subject: Re: Sound mapping Message-ID: <162814004.1044982509@p3.JCK.COM> In-Reply-To: <20030211051729.GG16359@nicemice.net> References: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol. 
com> <20030211014413.GD16359@nicemice.net> <20030211040519.99895.qmail@cr.yp.to> <20030211051729.GG16359@nicemice.net> X-Mailer: Mulberry/3.0.0 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --On Tuesday, 11 February, 2003 05:17 +0000 "Adam M. Costello" wrote: > Because users are already accustomed to not bothering to > remember and type the proper case of letters in mail > addresses, because in practice it doesn't matter. By doing > case-folding we can avoid surprising them. Users are not > accustomed to mail addresses being sound-insensitive, so > there's no point in us working harder to meet nonexistent > expectations. And because there is a fairly large experimental and observational literature in human factors in computing (as well as some other areas) that indicates that ordinary people, unprompted, will assume that upper and lower case strings that look like names or words in most languages that use Latin-based alphabets will be treated as equivalent*. One corollary to those results (or an independent result, depending whose work one reads) is that, when people spell words out loud, they do not give indications of case unless there is some special clue that they should. By contrast, the expectation that things that sounded alike would be treated as equivalent largely disappeared with the standardization of spelling in the 19th century (at least for English -- I haven't researched other languages and would not be surprised if some standardized spellings earlier and some did so later). Of course, "simplified spelling" efforts show up every decade or two, and most of them start from the notion that identically-pronounced words should match and be spelled identically, but they have failed to gain any traction (and their advocates have mostly been dismissed as nut cases). 
john * Note that the statement above is fairly conservative. I haven't followed developments in the relevant literature for 15 or 20 years and have never studied any work that might exist on languages using characters that are not Latin-derived. "Ordinary people" is an informal category that does not include specialists who have learned about, and gotten used to, case distinctions. Two more disclaimers: (i) I have not seen any research that would establish whether people would expect strange-case constructions to be case-independent. E.g., those expectations about "names or words" would predict the assumption that "JOHN", "John", and "john" would be equivalent. But I don't know what assumptions would be made about the equivalence (or lack thereof) of "john" and "jOhn" and "jOhN" -- those are sufficiently strange-looking that I would be unsurprised if the reader/observer at least wondered. (ii) There could easily be some cultural differences that might cause people to suspect that case distinctions were important if their languages, e.g., capitalized all nouns (and not just "proper nouns", as in English). Seeing a noun written entirely in lower case might trigger sufficient alarms for them to turn it into a variation on the first disclaimer. I suspect there has been research on that topic, but I haven't seen it. From owner-ietf-imaa Tue Feb 11 16:57:44 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1C0vi122761 for ietf-imaa-bks; Tue, 11 Feb 2003 16:57:44 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1C0vfd22757 for ; Tue, 11 Feb 2003 16:57:43 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18ilDb-0007ZU-00 for ; Tue, 11 Feb 2003 16:57:43 -0800 Date: Wed, 12 Feb 2003 00:57:43 +0000 From: "Adam M. 
Costello" To: IETF IMAA list Subject: Re: Case sensitivity on the LHS Message-ID: <20030212005743.GA27754@nicemice.net> Reply-To: IETF IMAA list References: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <3CD14E451751BD42BA48AAA50B07BAD603370662@vsvapostal3.prod.netsol.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <4.2.0.58.J.20030211162348.0479ea38@localhost> <200302111947.30700@sendmail.mutz.com> <3CD14E451751BD42BA48AAA50B07BAD603370662@vsvapostal3.prod.netsol.com> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: This message responds to Scott Hollenbeck, Simon Josefsson, Marc Mutz, and Marcos Sanz/Denic. "Hollenbeck, Scott" wrote: > > > Is this document intended to be a formal update to 2821 and 2822? > > > > Is IDNA a formal update to RFCs 1034 and 1035? IMAA would bear a > > similar relation to RFCs 821, 822, 2821, 2822. > > Maybe yours is a rhetorical question since you probably know the > answer Actually, I don't know the answer, and it wasn't a rhetorical question. All I know is that the answer ought to be the same for IDNA and IMAA. I do know that IDNA does not require any implementations of RFC 1034/1035 to be changed, ever. I don't know exactly what it means for one RFC to "update" another. People interested in RFCs 1034/1035 are very likely to be interested in IDNA too. > 2821/2822 (which, by the way, obsolete 821 and 822) I hope John is correct that this is an error, because RFC 2822 neglects to mention that mail user agents need to preserve case in local parts in message headers, an important requirement that is stated in RFC 822. Simon Josefsson wrote: > Not if it makes changes to those RFCs, which seems to be (part of) > what this discussion is about. 
This discussion is not about making changes to the definitions of local part in RFCs 821, 822, 2821, 2822. Those RFCs define local parts as ASCII strings conforming to a certain syntax, which may be case-sensitive at the whim of the mail server for the domain where the local parts have meaning, and which therefore must have their case preserved by all other agents. IMAA would not change any of that. It would define a new concept, internationalized local parts, which unlike local parts can contain non-ASCII characters. The case-sensitivity rules for ASCII local parts would stay the same. But non-ASCII internationalized local parts already break the first rule of local parts (that they contain only ASCII characters), so there's no reason to think that they necessarily obey the other rules (that they may be case-sensitive and must be case-preserving). Of course, we would like internationalized local parts to resemble local parts as much as possible. But is it possible to use the exact same case-sensitivity rules (may be case-sensitive, must be case-preserving)? I can think of only two ways to achieve that. One way is to require all IMAA-aware mail user agents to support mixed-case annotations. I've already argued that that would have too great a cost for too little benefit. The other way is to expect existing mail servers that currently do case-insensitive ASCII comparisons of local parts (which is pretty much all of them) to upgrade to IMAA-awareness and do case-insensitive comparisons of internationalized local parts. But a key goal of IMAA, like IDNA, is not asking for upgrades of the infrastructure. We only want to ask for upgrades of end-user applications. IMAA is supposed to affect mail user agents, not mail transport agents. The mail server at yahoo.com is merely an intermediary between the sender of a message and the recipient of a message. 
Once the sender and recipient have upgraded their mail user agents, they shouldn't need to wait for yahoo.com to take action before they gain full use of IMAA. Therefore, I think it is undesirable for non-ASCII internationalized local parts to try to exactly duplicate the case-sensitivity rules of ASCII local parts (may be case-sensitive, must be case-preserving). Using different rules for non-ASCII internationalized local parts is not a change in existing standards, for reasons given above. That leaves at least three choices for handling case-sensitivity in non-ASCII internationalized local parts: 1) must be case-sensitive, case must be preserved 2) must be case-insensitive, case cannot be preserved 3) lowercase must be accepted, non-lowercase may be accepted (in which case it must be considered equivalent), case may be preserved Option 1 would not use case-folding. Options 2 and 3 would use case-folding. Option 3 is a new idea I have brewing, more on that later. > > You claim that "many distinct names" of humans will get collapsed by > > NFKC into the same name. So far you have provided zero examples. > > If you want to modify existing standards, I think you need to prove > that it doesn't break things, not the other way around. I don't want to modify existing standards, and I'm not breaking anything. If the use of Nameprep means that IMAA cannot distinguish two strings (like Maße and Masse), that may be unfortunate, but the existing mail standards can't distinguish them either, because they don't allow non-ASCII characters (like ß). IMAA would not cause anything to stop working that used to work, so it wouldn't "break" anything. Marc Mutz wrote: > It is of course nice to be able to use one function for the domain, as > well as the local-part, but since we already consider different field > delimiters in addition to DOT, this single-function argument probably > won't fly in practice anyway. 
We can't use a single function for entire domain names, or entire mail addresses, but we might be able to use the same function for domain labels and local parts (or subparts). > So we might as well define another profile. I have no objection to defining another profile if Nameprep isn't appropriate. But I think Nameprep is appropriate. I think case-folding is appropriate. > And IMO, it should _not_ include case folding. For the simple reason > that it's too aggressive. > > As an example (in addition to the ones already given by others), > consider German "Maße" (measures). The sz ligature is a character > that only exists in lower case form. I don't know what lead IDNA > to fold that to "ss", but I think that this is a bug in IDNA and > should be avoided by all means in IMAA. That's b/c Masse is German for > "mass". Please estimate the amount of annoyance that German speakers would suffer if ß matches ss, and the amount of annoyance that Turkish/Azeri/Lithuanian speakers would suffer regarding dots on i's, and weigh that against the amount of annoyance that all Latin/Cyrillic/Greek/etc users would suffer if uppercase non-ASCII letters do not match lowercase non-ASCII letters. As for why IDNA folds ß into ss, it's because that's what the Unicode case-folding operation does. We decided that trying to fix various perceived problems in Unicode's case-folding and normalization tables was a rat-hole (it would never end, someone would always find another problem in need of fixing), and was outside our scope of expertise, and it was better to simply include those Unicode-defined operations as-is. Marcos Sanz/Denic wrote: > As far as I remember, Nameprep states to be a profile for preparing > domain names and it should not be used for any other purpose. Actually, it does not state that. Here's what it states: Nameprep is used by the IDNA [IDNA] protocol for preparing domain names; it is not designed for any other purpose. 
It is explicitly not designed for processing arbitrary free text and SHOULD NOT be used for that purpose. That's all true. Nameprep was not designed for preparing local parts. If we design a profile for preparing local parts, and notice that it is identical to (or nearly identical to) Nameprep, then we should consider reusing Nameprep. There's no recommendation against that. The recommendation is against using Nameprep for arbitrary free text, which is not what we're doing. > Wouldn't it be a better idea to make anyway a new profile for IMAA? Maybe, maybe not. That's what we're trying to figure out now. I'm leaning toward "maybe not". AMC From owner-ietf-imaa Tue Feb 11 17:27:38 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1C1Rcb23404 for ietf-imaa-bks; Tue, 11 Feb 2003 17:27:38 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1C1Rbd23400 for ; Tue, 11 Feb 2003 17:27:37 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18ilga-0007dM-00 for ; Tue, 11 Feb 2003 17:27:40 -0800 Date: Wed, 12 Feb 2003 01:27:40 +0000 From: "Adam M. Costello" To: IETF IMAA list Subject: Re: Case sensitivity on the LHS Message-ID: <20030212012740.GB27754@nicemice.net> Reply-To: IETF IMAA list References: <20030211014413.GD16359@nicemice.net> <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Alain LaBonté wrote: > Could somebody summarize what is the actual behaviour of NFKC for me? 
NFKC normalizes the representation of strings that are "compatibly equivalent". There are two kinds of equivalence: canonical equivalence, and compatible equivalence. Any strings that are canonically equivalent are also compatibly equivalent.

Typical examples of canonical equivalence:

    Latin small letter a with dot above
    Latin small letter a, combining dot above

    Latin small letter a, combining dot above, combining dot below
    Latin small letter a, combining dot below, combining dot above

    Kelvin sign
    Latin capital letter K

Compatible equivalence adds equivalences between compatibility characters and other characters. Compatibility characters are generally characters that the Unicode consortium would not have included at all if it had not been necessary to support round-trip lossless conversions to/from character sets that make the distinction. Here are some typical examples (compatibility character first, followed by the regular character(s)):

    vulgar fraction one half          digit one, fraction slash, digit two
    Latin small ligature ij           Latin small letter i, Latin small letter j
    dot above                         space, combining dot above
    em space                          space
    double prime                      prime, prime
    superscript two                   digit two
    roman numeral one                 Latin capital letter I
    circled digit one                 digit one
    Arabic letter Beeh final form     Arabic letter Beeh
    fullwidth Latin small letter a    Latin small letter a

AMC

From owner-ietf-imaa Tue Feb 11 18:30:28 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1C2USd24655 for ietf-imaa-bks; Tue, 11 Feb 2003 18:30:28 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1C2UQd24651 for ; Tue, 11 Feb 2003 18:30:26 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18imfO-0007n5-00 for ; Tue, 11 Feb 2003 18:30:30 -0800 Date: Wed, 12 Feb 2003 02:30:30 +0000 From: "Adam M. 
Costello" To: ietf-imaa@imc.org Subject: Re: Compatibility with IDNA Message-ID: <20030212023030.GA28984@nicemice.net> Reply-To: IETF IMAA list References: <20030211023401.GE16359@nicemice.net> <8fd5$Jh3cDD@3247.org> <8f$3A5e3cDD@3247.org> <20030211023401.GE16359@nicemice.net> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <8fd5$Jh3cDD@3247.org> <20030211023401.GE16359@nicemice.net> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: I wrote: > We should use the same ACE prefix [for IMAA and IDNA] if and only if > the ToASCII and ToUnicode operations are identical. Oops, that's a little stronger than we need. We should use the same ACE prefix if and only if the ToUnicode operations are identical. The ToASCII operations could be different if the difference has no effect on the observable ToUnicode behavior (remember that ToUnicode invokes ToASCII). > If two different sets of ToASCII/ToUnicode operations were to use the > same prefix, that would invite errors where a string gets encoded by > one ToASCII and decoded by the wrong ToUnicode, which would probably > cause the original string to get converted into a non-equivalent > string. Notice that this is a concern only if the ToUnicode operations differ. Claus Färber wrote: > > IDNA does not define functions for whole domain names because the > > delimiting and quoting conventions vary. For example, DNS master > > files use dots as delimiters, but DNS protocol messages don't. DNS > > master files use backslashes to quote dots that are not delimiters, > > but in DNS protocol messages there is nothing special about dots or > > backslashes. > > Well, any quoting in config files is not part of the address. The next > generation of software will take UTF-8 as input and encode the labels > automatically anyway. 
Okay, but how does that help me define a single function that takes entire domain names (or even entire mail addresses) as input? Consider the domain name whose first label is foo.bar and whose second and third labels are example and org. In a DNS protocol message, the domain name would contain only one dot, and no backslashes. In a DNS master file, the domain name would be a string looking like this: foo\.bar.example.org. Or actually, it might gratuitously look like this: \f\o\o\.\b\a\r.\e\x\am\p\l\e.\o\r\g. Anyway, what am I supposed to pass to this single function that takes entire domain names? It needs to be either a list of labels (in which case we have failed to factor out the parsing and left it up to the application, same as in IDNA), or it needs to be a string with some quoting mechanism for the embedded dot, in which case applications are still going to have to parse everything anyway in order to undo the various quoting styles (DNS master file, message header, SMTP, other config files) before applying the single quoting style used by this single function. If applications are doomed to parse everything anyway, we might as well stick with the model where the ACE conversions are applied independently to the individual pieces. The most we can hope for is to reuse the same conversion operations for pieces found on either side of the at-sign. > The real question is how to deal with the minimum quoting required by > RFC 2821. Is that considered part of the email address? For example, > how is that quoting handled if such an email address is included as a > DNS label? Do MTAs match the email address ``"joe user"@example.com'' > against the login name ``joe user'' or agains ``"joe user"''? I was wondering the same thing myself this morning. You can also ask the question in the other direction. If I find "foo".example.org. in an SOA record, and I want to send mail there, do I need to compose the To: field like this: "\"foo\""@example.org ? 
The various RFCs (1034, 1035, 822, 2822) are not clear about this. My best guess is that the RFC 822 and SMTP quotes and backslashes are not really part of the local-part, and should be removed before the local part is inserted into some other context, like a DNS master file (and therefore, if a quote character appears in the DNS master file, it really is part of the local part, and needs to be quoted in the To: field and the SMTP RCPT command). But I wouldn't rely on that guess. I'd avoid using any special characters in domain-mapped mail addresses until/unless an official clarification is published. > Well, even if this means that you have to encode the local-part and > the RHS separately, it would still be a benefit to be able to use the > same function on both sides Agreed. I have some ideas for how to achieve that, which I'll describe in an upcoming message. > > > You can have an email address converted to a domain name (as seen > > > in SOA DNS records, for example). > > > We might be able to do that if we *don't* subdivide local parts. > > Oh, right. I missed the point that c\.faerber.example.com and > c.faerber.example.com are different domain names. > > It is clear that the string used in the DNS must be identical to the > one produced by IMAA so that IDNA-unaware and IMAA-unaware software > can handle these addresses. > > I wonder if anything would break if a DNS server software would encode > labels containing dots according to an IDNA-compatible IMAA (and not > IDNA). > > IDNAs are only used for domain names, which have a very restricted > subset of characters. Binary data in the DNS does not use IDNA anyway > (and must bypass any ACE if it contains any octets above 0x80). > > Non-binary data that does make use of non-ASCII characters is > currently limited to domain names, which UseSTD3ASCIIRules. Mandating > an encoding different from IDNA that will produce the same output for > domain names should not hurt anyone. 
That's a clever idea, but you're assuming that all textual domain names use the STD-3 ASCII rules (LDH restrictions). Actually, that restricted syntax is "preferred" for domain names in general, but required only for names of hosts and mail domains. Domain names used to name things other than hosts and mail domains are not obligated to use the preferred syntax. An example is SRV names, like _ldap._tcp.example.org [RFC-2782]. Another example is classless in-addr.arpa delegations, like 0/25.2.0.192.in-addr.arpa [RFC-2317 = BCP-20]. Of course these particular examples (the only two I know of) have no need for non-ASCII characters, but I'd still be very hesitant to contemplate backtracking on IDNA's applicability by declaring "IDNA doesn't apply to all textual domain labels like we said it did, it applies only to labels conforming to the STD-3 ASCII rules". I have a hard time imagining how we could put that cat back in the bag. Getting consensus on such a change would take months (or forever), judging from the history of the IDN working group. 
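The distinction above can be checked mechanically. The following Python sketch tests a single ASCII label against the STD-3 ASCII rules as IDNA applies them (letters, digits and hyphen only, no leading or trailing hyphen). The name `is_ldh_label` is hypothetical, and the 63-octet limit is the general DNS label limit rather than part of the STD-3 rules themselves.

```python
import re

# Letters, digits, hyphen; no hyphen at either end; 1 to 63 characters.
_LDH = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label):
    """Return True if the label satisfies the LDH restrictions."""
    return _LDH.fullmatch(label) is not None
```

With this check, the labels `example` and `org` pass, while `_ldap`, `_tcp` and `0/25` from the two examples above fail, which is exactly why a rule that applied only to LDH labels would exclude them.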
AMC From owner-ietf-imaa Tue Feb 11 18:41:37 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1C2fb424880 for ietf-imaa-bks; Tue, 11 Feb 2003 18:41:37 -0800 (PST) Received: from pie1.i-dns.net ([203.81.44.31]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1C2fZd24873 for ; Tue, 11 Feb 2003 18:41:35 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by pie1.i-dns.net (Postfix) with ESMTP id B1992789A7 for ; Wed, 12 Feb 2003 02:41:37 +0000 (GMT) Received: from pie1.i-dns.net ([127.0.0.1]) by localhost (pie1.i-dns.net [127.0.0.1:10024]) (amavisd-new) with SMTP id 56549-06 for ; Wed, 12 Feb 2003 02:41:35 +0000 (GMT) Received: from jeffreyibm (unknown [211.104.147.95]) by pie1.i-dns.net (Postfix) with SMTP id A75317886F for ; Wed, 12 Feb 2003 02:41:30 +0000 (GMT) Message-ID: <032801c2d240$89cb09c0$fc00a8c0@jeffreyibm> From: "jeffrey" To: "'IETF IMAA list'" References: <3CD14E451751BD42BA48AAA50B07BAD603370662@vsvapostal3.prod.netsol.com> Subject: Re: Case sensitivity on the LHS Date: Wed, 12 Feb 2003 11:43:30 +0900 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 X-Virus-Scanned: by amavisd-new Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: IDNA and IMAA do not redefine the local or the domain labels of 1034/1035 & 2821/2822. They are meant to be definitions of internationalised labels that have an ASCII equivalent, hence no updates need to be made. 
I do a cut and paste from a previous AMC post on the idn wg: +----------------------------+ | internationalized labels | | | | +----------------+ | | | ASCII labels | | | | | | | | +--------+ | | | | | ACE | | | | | | labels | | | | | +--------+ | | | +----------------+ | +----------------------------+ > Asking a different way: would these new features (or something similar) have > been included in 1034/1035 or 2821/2822 if internationalization was > considered when the earlier specifications were being written? I'm under the impression that the result of the current idn efforts are meant to be transitory ( ? ), in keeping with the timeliness criteria of the wg. I suspect if these rfcs were written with i18n in mind, the results would be different. jeffrey ----- Original Message ----- From: "Hollenbeck, Scott" To: "'IETF IMAA list'" Sent: Tuesday, February 11, 2003 9:27 PM Subject: RE: Case sensitivity on the LHS > > > "Hollenbeck, Scott" wrote: > > > > > Is this document intended to be a formal update to 2821 and 2822? > > > > Is IDNA a formal update to RFCs 1034 and 1035? IMAA would bear a > > similar relation to RFCs 821, 822, 2821, 2822. > > Maybe yours is a rhetorical question since you probably know the answer (I > don't because IDNA hasn't yet been published as an RFC), but here's why I > asked mine: someone implementing 1034/1035 or 2821/2822 (which, by the way, > obsolete 821 and 822) might not necessarily know of the new features defined > by IDNA and IMAA since there are no "update" references maintained by the > RFC Editor. I suspect that it would probably be a good idea to have IDNA > update 1034/1035, and likewise it's probably a good idea to have IMAA update > 2821/2822 so that implementers see a clear relationship between the > specifications. > > > > Both (2821 section 4.1.2 and 2822 section 3.4.1) contain formal > > > definitions of the local part of an email address. 
> > > > IMAA does not redefine local part, but instead defines a new term, > > internationalized local part. Just as IDNA does not redefine domain > > label, but instead defines a new term, internationalized domain label. > > These new terms won't necessarily be known to implementers of the earlier > specifications that IDNA and IMAA build upon. If new features with new > processing rules are being defined, why not give RFC readers a clear pointer > to the specifications that describe the new features? > > Asking a different way: would these new features (or something similar) have > been included in 1034/1035 or 2821/2822 if internationalization was > considered when the earlier specifications were being written? While we'll > never know for sure, I'm suggesting that the possibility of a "yes" answer > implies that IDNA and IMAA _should_ update the earlier specifications. > > -Scott- > From owner-ietf-imaa Wed Feb 12 00:38:20 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1C8cK717926 for ietf-imaa-bks; Wed, 12 Feb 2003 00:38:20 -0800 (PST) Received: from bs.jck.com (ns.jck.com [209.187.148.211]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1C8cId17918 for ; Wed, 12 Feb 2003 00:38:19 -0800 (PST) Received: from [209.187.148.215] (helo=p3.JCK.COM) by bs.jck.com with esmtp (Exim 4.10) id 18isPK-0008QN-00 for ietf-imaa@imc.org; Wed, 12 Feb 2003 03:38:18 -0500 Date: Wed, 12 Feb 2003 03:38:18 -0500 From: John C Klensin To: IETF IMAA list Subject: Re: Compatibility with IDNA Message-ID: <201402821.1045021098@p3.JCK.COM> In-Reply-To: <20030212023030.GA28984@nicemice.net> References: <20030211023401.GE16359@nicemice.net> <8fd5$Jh3cDD@3247.org> <8f$3A5e3cDD@3247.org> <20030211023401.GE16359@nicemice.net> <20030212023030.GA28984@nicemice.net> X-Mailer: Mulberry/3.0.0 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: 
owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --On Wednesday, 12 February, 2003 02:30 +0000 "Adam M. Costello" wrote: >> The real question is how to deal with the minimum quoting >> required by RFC 2821. Is that considered part of the email >> address? For example, how is that quoting handled if such an >> email address is included as a DNS label? Do MTAs match the >> email address ``"joe user"@example.com'' against the login >> name ``joe user'' or against ``"joe user"''? > > I was wondering the same thing myself this morning. You can > also ask the question in the other direction. If I find > > "foo".example.org. > > in an SOA record, and I want to send mail there, do I need to > compose the To: field like this: > > "\"foo\""@example.org Adam, The "only the receiving MTA gets to mess with the local-part" rule has been historically interpreted _very_ strictly and bad things have happened when it isn't. The general intent is that ''joe user'' and ''"joe user"'' be treated as equal and that ''foo'' and ''\"foo\"'' be equivalent as well, although, in the ''\"foo\"'' case, the minimal quoting rule is violated. However, the specifications very carefully avoid the assumption that a mailbox name bears any relationship to a login name. Some users, systems, and administrators find that relationship convenient. At the other extreme, some believe that having a mailbox name match the user name is an unnecessary and undesirable disclosure of information that puts important information into the hands of potential crackers and they simply won't permit it. So one answer would be that the question "which form matches the user name" is irrelevant; the only important question is "which form the receiving/delivery MTA will interpret as matching the internal mailbox (or maildrop) name". There is a second principle, which is that mailbox names, unlike most traditional DNS strings, get really close to user command-level interfaces. 
And command interfaces have a history of mucking up quoting conventions in a big way. Different operating systems foul up things in different ways, just to make things interesting. People who write code for the Internet email environment have discovered, after years and years of abuse of the system, a need to get really conservative about anything they actually want to have delivered. Smart email administrators tend to avoid configuring "joe user" as a mailbox name, or make sure that "joe.user", or something else that doesn't require quoting, is supported as a recommended alias. Similarly, despite the fact that the SOA record mailbox form joe\.user.some.domain is perfectly well defined as equivalent to joe.user@some.domain, folks who are more interested in making sure that the domain admin mailbox can be contacted than they are in demonstrating how much they know about the DNS usually set up names or aliases to avoid having to deal with periods in the local part. And receiving/delivery MTAs (or the associated alias mechanisms) written by people with a strong "the mail must go through if I can possibly figure out what was intended" mentality are usually configured so that joe user "joe user" joe\ user "joe\ user" and even 'joe user' and maybe even 'joe user" """"joe user" and "\"joe user" and all of their case variants, end up pointing to the same maildrop. That is either the robustness principle carried to one of its extremes or just good sense. But nothing requires that all of those cases be treated the same, any more than anything requires case-matching. Consequently, a sending/originating MUA that makes strong assumptions about how the delivery MTA is going to interpret local-parts will, at best, violate the protocols and periodically end up with undeliverable mail or, at worst, do fairly severe violence to the email environment. 
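As a rough illustration of the lenient matching described above, a delivery-side alias lookup might first strip one level of quoting from the local-part. The Python sketch below is an assumption about how such a step could look, not something the specifications require; the name `unquote_local_part` is hypothetical, and malformed input (for example an escaped closing quote) is not handled.

```python
def unquote_local_part(local_part):
    """Strip one level of surrounding quotes and backslash escapes.

    Sketch only: meant for the receiving side, which alone decides
    how a local-part maps to a maildrop.
    """
    if len(local_part) >= 2 and local_part[0] == '"' and local_part[-1] == '"':
        inner = local_part[1:-1]
    else:
        inner = local_part
    out, i = [], 0
    while i < len(inner):
        if inner[i] == "\\" and i + 1 < len(inner):
            # A quoted-pair stands for the character that follows.
            out.append(inner[i + 1])
            i += 2
        else:
            out.append(inner[i])
            i += 1
    return "".join(out)
```

Under this reading, the forms joe\ user, "joe user" and "joe\ user" all reduce to the same string. A sending MUA should not rely on any receiver behaving this way, which is the point of the warning above.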
john From owner-ietf-imaa Wed Feb 12 01:09:01 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1C991424946 for ietf-imaa-bks; Wed, 12 Feb 2003 01:09:01 -0800 (PST) Received: from smtp.denic.de (smtp.denic.de [194.246.96.22]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1C98xd24938 for ; Wed, 12 Feb 2003 01:08:59 -0800 (PST) Received: from notes.denic.de (denics15.denic.de [194.246.96.18]) by smtp.denic.de with esmtp id 18ist0-0006Gc-00; Wed, 12 Feb 2003 10:08:59 +0100 Subject: Re: Re: Case sensitivity on the LHS To: Marc Mutz Cc: ietf-imaa@imc.org X-Mailer: Lotus Notes Release 5.0.6a January 17, 2001 Message-ID: From: "Marcos Sanz/Denic" Date: Wed, 12 Feb 2003 10:09:59 +0100 X-MIMETrack: Serialize by Router on notes/Denic(Release 5.0.11 |July 24, 2002) at 12.02.2003 10:08:59 MIME-Version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by above.proper.com id h1C990d24941 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: On 11.02.2003 19:47 Marc Mutz wrote: > > As an example (in addition to the ones already given by others), > consider German "Maße" (measures). The sz ligature is a character that > only exists in lower case form. I don't know what led IDNA to fold > that to "ss", but I think that this is a bug in IDNA and should be > avoided by all means in IMAA. That is completely right. Maybe we should create a new profile for stringprep that overrides the mapping tables of RFC3454? 
Regards, Marcos Sanz From owner-ietf-imaa Wed Feb 12 02:32:51 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CAWp505075 for ietf-imaa-bks; Wed, 12 Feb 2003 02:32:51 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1CAWod05071 for ; Wed, 12 Feb 2003 02:32:50 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18iuCB-0000Od-00 for ; Wed, 12 Feb 2003 02:32:51 -0800 Date: Wed, 12 Feb 2003 10:32:51 +0000 From: "Adam M. Costello" To: IETF IMAA list Subject: Re: Compatibility with IDNA Message-ID: <20030212103251.GC1140@nicemice.net> Reply-To: IETF IMAA list References: <20030211023401.GE16359@nicemice.net> <8fd5$Jh3cDD@3247.org> <8f$3A5e3cDD@3247.org> <20030211023401.GE16359@nicemice.net> <20030212023030.GA28984@nicemice.net> <201402821.1045021098@p3.JCK.COM> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201402821.1045021098@p3.JCK.COM> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: John C Klensin wrote: > The "only the receiving MTA gets to mess with the local-part" > rule has been historically interpreted _very_ strictly and bad > things have happened when it isn't. The general intent is that > > ''joe user'' and ''"joe user"'' > be treated as equal and that > ''foo'' and ''\"foo\"'' > be equivalent as well, although, in the ''\"foo\"'' case, the > minimal quoting rule is violated. Okay, but I don't know how to use that principle to answer my questions. Let me ask them again: Suppose I am told to create an SOA record that will cause people to send mail like so: To: "joe:user"@example.org What is the most correct thing for me to put in the DNS master file? joe:user.example.org. "joe:user".example.org. \"joe:user\".example.org. 
The first two mean exactly the same thing (they cause the same DNS protocol message to be sent), so they are equally right or wrong. In the other direction, suppose I encounter an SOA record containing \"joe:user\".example.org. What is the most correct To: field I should construct to send mail there? To: "joe:user"@example.org To: "\"joe:user\""@example.org I don't dispute that it would be foolish to actually put such addresses in SOA records, I'm just trying to understand the intention of relevant standards. As far as I can tell, the issue wasn't really addressed. AMC From owner-ietf-imaa Wed Feb 12 11:23:12 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CJNCN02980 for ietf-imaa-bks; Wed, 12 Feb 2003 11:23:12 -0800 (PST) Received: from m3001.hostcentric.net (m3001.hostcentric.net [216.157.79.237]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1CJNBd02976 for ; Wed, 12 Feb 2003 11:23:11 -0800 (PST) Received: (qmail 23985 invoked by alias); 12 Feb 2003 19:22:56 -0000 Received: from unknown (HELO DAVIS1) (32.97.110.142) by 0 with SMTP; 12 Feb 2003 19:22:56 -0000 Message-ID: <002a01c2d2cc$1f565c70$6dde2b09@DAVIS1> From: "Mark Davis" To: "Marc Mutz" , , "Martin Duerst" References: <200302111947.30700@sendmail.mutz.com> <4.2.0.58.J.20030211162348.0479ea38@localhost> Subject: Re: Case sensitivity on the LHS Date: Wed, 12 Feb 2003 11:22:44 -0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: [using sz for the German sharp-s] But the uppercase of "Masze" is "MASSE". So for case-insensitivity, you need to treat these equivalently. If someone types in either Masze@foo.com or MASSE@foo.com, they should get the same result. 
However, the same is true for "Masse" and "MASSE", so that tosses Masse@foo.com into the hopper. So for it to be a well-defined equivalence relation, {ss, SS, sz} have to be in the same class. Mark ________ mark.davis@jtcsv.com IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 ----- Original Message ----- From: "Martin Duerst" To: "Mark Davis" ; "Marc Mutz" ; Sent: Tuesday, February 11, 2003 13:27 Subject: Re: Case sensitivity on the LHS > > [using sz for the German sharp-s] > > At 11:24 03/02/11 -0800, Mark Davis wrote: > > >The reason that those are case folded is that the uppercase of Masze is > >MASSE, which is the same as the uppercase of Masse. So for it to be a > >well-defined equivalence relation, {ss, SS, sz} have to be in the same class. > > Well, this is only the case if you want a full "round-trip" equivalence. > For the purpose of IDNA and IMAA, it would have been possible to define > an equivalence with two classes ({ss, SS}, {sz}) without actual problems, > because IDNA maps to lowercase. > (The only exception being for people trying to input words with sz > in all-uppercase, which they won't do anyway.) > > But given that IDNA has taken the decision it has, I don't think it's > worth to create a whole new table with all the associated confusion > for IMAA just for this little tweak. > > Regards, Martin. 
> From owner-ietf-imaa Wed Feb 12 11:42:29 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CJgTe03470 for ietf-imaa-bks; Wed, 12 Feb 2003 11:42:29 -0800 (PST) Received: from mailgen2.internet.gouv.qc.ca (courrier4.internet.gouv.qc.ca [192.197.162.9] (may be forged)) by above.proper.com (8.11.6/8.11.3) with SMTP id h1CJgSd03465 for ; Wed, 12 Feb 2003 11:42:28 -0800 (PST) Received: (qmail 23962 invoked from network); 12 Feb 2003 19:42:22 -0000 Received: from unknown (HELO p295.sct1.gouv.qc.ca) (142.213.85.49) by mailgen2.internet.gouv.qc.ca with SMTP; 12 Feb 2003 19:42:22 -0000 Message-Id: <5.0.2.1.2.20030212143718.00a96fa8@entree.sct1.gouv.qc.ca> X-Sender: alabonte@entree.sct1.gouv.qc.ca X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Wed, 12 Feb 2003 14:42:24 -0500 To: "Mark Davis" , "Simon Josefsson" , "IETF IMAA list" From: =?iso-8859-1?Q?Alain_LaBont=E9?= Subject: Re: Case sensitivity on the LHS Cc: In-Reply-To: <003d01c2d1df$6cedef40$7300a8c0@DAVIS1> References: <20030211014413.GD16359@nicemice.net> <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: A 07:08 2003-02-11 -0800, Mark Davis a écrit : > > I'm not sure what kind of mapping is done by NFKC... but I suspect the > > kind of problems that may ocur. > > > > For example, if accents are removed from Latin letters once entered >... > >NFKC does not remove the accents from Latin letters. If you are going to >comment on NFKC, pro or contra, you should comment on what it does, rather >than on simple speculation as to what it does. The formal results are in UAX >#15 on the Unicode site. 
There is a chart that shows the effects on >characters on http://www.unicode.org/charts/normalization/. For example, if >you look at Latin characters on >http://www.unicode.org/charts/normalization/chart_Latin.html you will see >that "ô" remains as "ô" in NFKC. [Alain] Good (and thanks for the reference). But I wanted to say that even if having email addresses including accented Latin letters is a "must have" (I look forward to seeing my address as Alain.LaBonté@abc.com), we need to have a way to allow those who can't enter accented letters to be able to access the same email address (on, say, a US keyboard [or a Japanese one with Romaji support], by typing Alain.LaBonte@abc.com). One way would be to have email aliases, but is it the best way? Will those with accented names have the burden of being sure to have aliases all the time to do this? Alain LaBonté Québec From owner-ietf-imaa Wed Feb 12 11:52:53 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CJqrl04670 for ietf-imaa-bks; Wed, 12 Feb 2003 11:52:53 -0800 (PST) Received: from mailgen2.internet.gouv.qc.ca (inet-cou2.gouv.qc.ca [192.197.162.9]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1CJqqd04665 for ; Wed, 12 Feb 2003 11:52:52 -0800 (PST) Received: (qmail 3593 invoked from network); 12 Feb 2003 19:52:46 -0000 Received: from unknown (HELO p295.sct1.gouv.qc.ca) (142.213.85.49) by mailgen2.internet.gouv.qc.ca with SMTP; 12 Feb 2003 19:52:46 -0000 Message-Id: <5.0.2.1.2.20030212144247.05326e38@entree.sct1.gouv.qc.ca> X-Sender: alabonte@entree.sct1.gouv.qc.ca X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Wed, 12 Feb 2003 14:52:49 -0500 To: "Mark Davis" , "IETF IMAA list" From: =?iso-8859-1?Q?Alain_LaBont=E9?= Subject: Re: Case sensitivity on the LHS In-Reply-To: <008a01c2d1f3$cb36f5b0$7300a8c0@DAVIS1> References: <20030211014413.GD16359@nicemice.net> <20030211025714.GF16359@nicemice.net> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; 
format=flowed Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: A 09:34 2003-02-11 -0800, Mark Davis a écrit : >1. From a practical point of view, I personally very much in favor of >case-insensitive names. Me too. And I would add that ideally, the behaviour in reaching an email address would be optimum with accent-insensitivity as well. A bit like searching with Google or Altavista. Unaccented-affected-Latin-letter keyword search requests will reach all accented targets. So should email addresses behave. Of course this makes a special case for the Latin alphabet, but this is a ransom of having had the Latin alphabet implemented on all keyboards of the world, but only implemented for the basic historical letters, unaccented. That situation is realistically going to last for a while. Alain LaBonté Québec From owner-ietf-imaa Wed Feb 12 11:58:58 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CJwwh04792 for ietf-imaa-bks; Wed, 12 Feb 2003 11:58:58 -0800 (PST) Received: from mailgen2.internet.gouv.qc.ca (inet-cou2.gouv.qc.ca [192.197.162.9]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1CJwvd04788 for ; Wed, 12 Feb 2003 11:58:57 -0800 (PST) Received: (qmail 443 invoked from network); 12 Feb 2003 19:58:52 -0000 Received: from unknown (HELO p295.sct1.gouv.qc.ca) (142.213.85.49) by mailgen2.internet.gouv.qc.ca with SMTP; 12 Feb 2003 19:58:52 -0000 Message-Id: <5.0.2.1.2.20030212145756.06d781c0@entree.sct1.gouv.qc.ca> X-Sender: alabonte@entree.sct1.gouv.qc.ca X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Wed, 12 Feb 2003 14:58:54 -0500 To: "Adam M. 
Costello" , IETF IMAA list From: =?iso-8859-1?Q?Alain_LaBont=E9?= Subject: Re: Case sensitivity on the LHS In-Reply-To: <20030212012740.GB27754@nicemice.net> References: <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> <20030211014413.GD16359@nicemice.net> <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: A 01:27 2003-02-12 +0000, Adam M. Costello a écrit : >Alain LaBonté wrote: > > > Could somebody summarize what is the actual behaviour of NFKC for me? > >NFKC normalizes the representation of strings that are "compatibly >equivalent". There are two kinds of equivalence: canonical equivalence, >and compatible equivalence. Any strings that are canonically equivalent >are also compatibly equivalent. [Alain] Thanks very much, I appreciate such an explanatory answer, it is what is required so that everybody understands the same thing. 
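The two kinds of equivalence can be observed with Python's standard `unicodedata` module. This is a small sketch using the general-purpose normalization forms; it shows NFKC alone and does not implement Nameprep or any IMAA profile. The helper names `nfc` and `nfkc` are only conveniences introduced here.

```python
import unicodedata


def nfc(s):
    """Normalize canonical equivalents only."""
    return unicodedata.normalize("NFC", s)


def nfkc(s):
    """Normalize canonical and compatibility equivalents."""
    return unicodedata.normalize("NFKC", s)


# Canonical equivalence: "o" + COMBINING CIRCUMFLEX ACCENT (U+0302)
# and the precomposed U+00F4 are unified by both forms.
composed = "\u00f4"
decomposed = "o\u0302"

# Compatibility equivalence: LATIN SMALL LIGATURE FI (U+FB01) is
# replaced by the two letters "fi" under NFKC, but kept under NFC.
ligature = "\ufb01"
```

Here `nfc(decomposed)` and `nfkc(decomposed)` both give the precomposed letter, `nfkc(ligature)` gives "fi" while `nfc(ligature)` leaves the ligature alone, and `nfkc(composed)` still carries its accent, so normalization by itself does not make addresses accent-insensitive.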
Alain LaBonté Québec From owner-ietf-imaa Wed Feb 12 12:09:10 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CK9AT05137 for ietf-imaa-bks; Wed, 12 Feb 2003 12:09:10 -0800 (PST) Received: from mail.uni-bielefeld.de (IDENT:72@mail2.uni-bielefeld.de [129.70.4.90]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1CK98d05133 for ; Wed, 12 Feb 2003 12:09:08 -0800 (PST) Received: from 192.168.0.17 (ppp36-228.hrz.uni-bielefeld.de [129.70.36.228]) by mail.uni-bielefeld.de (Sun Internet Mail Server sims.4.0.2000.10.12.16.25.p8) with ESMTP id <0HA700J0IPB0SH@mail.uni-bielefeld.de> for ietf-imaa@imc.org; Wed, 12 Feb 2003 21:09:07 +0100 (MET) Date: Wed, 12 Feb 2003 20:59:14 +0100 From: Marc Mutz Subject: Re: Case sensitivity on the LHS In-reply-to: <002a01c2d2cc$1f565c70$6dde2b09@DAVIS1> To: ietf-imaa@imc.org Cc: Mark Davis Message-id: <200302122059.41792@sendmail.mutz.com> Organization: KDE MIME-version: 1.0 Content-type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Boundary-02=_teqS+CWcnNzGiNa"; charset="iso-8859-1" Content-transfer-encoding: 7bit User-Agent: KMail/1.5.9 X-PGP-Key: 0xBDBFE838 References: <4.2.0.58.J.20030211162348.0479ea38@localhost> <002a01c2d2cc$1f565c70$6dde2b09@DAVIS1> Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --Boundary-02=_teqS+CWcnNzGiNa Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Description: signed data Content-Disposition: inline On Wednesday 12 February 2003 20:22, Mark Davis wrote: > But the uppercase of "Masze" is "MASSE". So for case-insensitivity, > you need to treat these equivalently. No, not at all. If you fold to the lower-case versions of characters, then SS->ss, not SS->ß, so ß could/should be a class of its own. > If someone types in either > Masze@foo.com or MASSE@foo.com, they should get the same result. 
Nobody would enter "MASSE" if asked to enter "Maße" and ß was available on the keyboard. Even with caps lock enabled, they'd enter "MAßE". Mapping ß->ss is like mapping ä->ae. As "Maße" vs. "Masse" shows, ß is a letter in its own right in German, it just happens to have the tradition of never appearing at the beginning of a word, so no-one ever bothered to define what its upper-case version should look like... Also, the uppercase-is-SS trick works reasonably well, but not perfectly. E.g., if I write NUSS, then there's no ambiguity. It's uppercase "Nuß" (nut). [ In this particular instance, the usage of ß was a bug in the language that was fixed with the new orthographic rules a few years back. Nuss is pronounced with a short u, while Nuß would be pronounced with a long u. So basically, with the modern orthographic rules, the only uses of ß that are left are those where "ss" would not fit the pronunciation, which is another argument for keeping "ss" and ß separate. ] > However, the same is true for "Masse" and "MASSE", so that tosses > Masse@foo.com into the hopper. So for it to be a well-defined > equivalence relation, {ss, SS, sz} have to be in the same class. That ambiguity is probably why, in Austria, some people use(d?) SZ as the upper-case version of ß (enough so that a mid-80's dictionary I have here mentions it). Why was the class then not defined to be { ss, SS, ß, sz, SZ }? Marc -- The illegal we do immediately. The unconstitutional takes a bit longer. 
-- Henry Kissinger --Boundary-02=_teqS+CWcnNzGiNa Content-Type: application/pgp-signature Content-Description: signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQA+Sqet3oWD+L2/6DgRAlAsAJ0XNcdb94vDJ720S1MWq2xkfy3bNwCgkdFe AuaDWfHL3q/d/Pd4Sg3LZ1A= =mPEX -----END PGP SIGNATURE----- --Boundary-02=_teqS+CWcnNzGiNa-- From owner-ietf-imaa Wed Feb 12 12:21:23 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CKLNS05605 for ietf-imaa-bks; Wed, 12 Feb 2003 12:21:23 -0800 (PST) Received: from mail.reutershealth.com ([65.246.141.36]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1CKLLd05601 for ; Wed, 12 Feb 2003 12:21:21 -0800 (PST) Received: from skunk.reutershealth.com (mail [65.246.141.36]) by mail.reutershealth.com (Pro-8.9.3/Pro-8.9.3) with SMTP id PAA13404; Wed, 12 Feb 2003 15:18:33 -0500 (EST) Message-Id: <200302122018.PAA13404@mail.reutershealth.com> Received: by skunk.reutershealth.com (sSMTP sendmail emulation); Wed, 12 Feb 2003 15:20:56 -0500 From: John Cowan Subject: Re: Case sensitivity on the LHS To: mutz@kde.org (Marc Mutz) Date: Wed, 12 Feb 2003 15:20:56 -0500 (EST) Cc: ietf-imaa@imc.org, mark.davis@jtcsv.com (Mark Davis) In-Reply-To: <200302122059.41792@sendmail.mutz.com> from "Marc Mutz" at Feb 12, 2003 08:59:14 PM X-Mailer: ELM [version 2.5 PL6] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Marc Mutz scripsit: > That ambiguity is probably why, in Austria, some people use(d?) SZ as > the upper-case version of ß (enough so that a mid-80's dictionary I > have here mentions it). Why was the class then not defined to be > { ss, SS, ß, sz, SZ }? Indeed, when learning to type in the mid-70s, I was taught to type not only "SZ" but "sz" as well when using an English-language typewriter. -- Values of beeta will give rise to dom! 
John Cowan --mv, Unix 6th edition jcowan@reutershealth.com (http://cm.bell-labs.com/cm/cs/who/dmr/odd.html) From owner-ietf-imaa Wed Feb 12 12:33:29 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CKXTt05829 for ietf-imaa-bks; Wed, 12 Feb 2003 12:33:29 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1CKXRd05821 for ; Wed, 12 Feb 2003 12:33:27 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id PAA05210; Wed, 12 Feb 2003 15:33:24 -0500 Message-Id: <4.2.0.58.J.20030212143823.05074340@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Wed, 12 Feb 2003 14:50:53 -0500 To: "Mark Davis" , "Marc Mutz" , From: Martin Duerst Subject: Re: Case sensitivity on the LHS In-Reply-To: <002a01c2d2cc$1f565c70$6dde2b09@DAVIS1> References: <200302111947.30700@sendmail.mutz.com> <4.2.0.58.J.20030211162348.0479ea38@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 11:22 03/02/12 -0800, Mark Davis wrote: >[using sz for the German sharp-s] > >But the uppercase of "Masze" is "MASSE". So for case-insensitivity, you need >to treat these equivalently. If someone types in either Masze@foo.com or >MASSE@foo.com, they should get the same result. However, the same is true >for "Masse" and "MASSE", so that tosses Masse@foo.com into the hopper. So >for it to be a well-defined equivalence relation, {ss, SS, sz} have to be in >the same class. One could use a very similar argument for getting rid of the accents in French. The upper case of "e'te'" is (for many purposes) "ETE". So that would then put {e, e', E, E'} all in the same class. [There are of course some differences, such as that the upper case E' actually exists and is sometimes used.] 
The question, from a user perspective, is "Is it more important to have MASSE be the same as masze, or is it more important to have masse and masze being different?". The answer to this question, in the case of things such as domain names (mostly lower case, false positives not allowed), would be "the latter". The answer is different in a typical search-engine context (false positives allowed). Applying upper-case and lower-case operations until one gets a result that doesn't change anymore is not the only way to obtain (well-defined) equivalence classes. And stringprep/nameprep doesn't actually need equivalence classes, it only needs a 'toLower' operation. What is most important is to make sure that user expectations within the application at hand are met as well as possible. Anyway, because IDNA is now defined the way it is, I don't think it is worth doing something different for IMAA. For German users to have to learn that they can use sz on the left hand side, but not on the right hand side, is not really a good solution. Regards, Martin.
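To make the distinction between a plain 'toLower' operation and full case folding concrete, here is a minimal sketch. It assumes Python's built-in str.lower() and str.casefold() as stand-ins for the two operations; neither is the stringprep/nameprep mapping itself, and the helper names are hypothetical.

```python
# Illustration only: str.lower() stands in for a simple 'toLower',
# str.casefold() for full Unicode case folding. Neither is nameprep.
MASSE_SHARP_S = "Ma\u00dfe"  # "Masze" written with the sharp s (U+00DF)


def lower_key(s: str) -> str:
    """Simple lowercasing: the sharp s is left untouched."""
    return s.lower()


def fold_key(s: str) -> str:
    """Full case folding: the sharp s is folded to 'ss'."""
    return s.casefold()


# Under simple lowercasing the sharp-s spelling stays in a class of its own;
# under case folding it lands in the same class as "Masse" and "MASSE".
lower_classes = {w: lower_key(w) for w in (MASSE_SHARP_S, "Masse", "MASSE")}
fold_classes = {w: fold_key(w) for w in (MASSE_SHARP_S, "Masse", "MASSE")}
```

With lowercasing, the three spellings yield two distinct keys; with case folding they yield one. That difference is the trade-off between false positives and matching uppercase input described above.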
From owner-ietf-imaa Wed Feb 12 12:33:26 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CKXQU05817 for ietf-imaa-bks; Wed, 12 Feb 2003 12:33:26 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1CKXOd05812 for ; Wed, 12 Feb 2003 12:33:25 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id PAA05218; Wed, 12 Feb 2003 15:33:25 -0500 Message-Id: <4.2.0.58.J.20030212145155.05a0dd50@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Wed, 12 Feb 2003 14:55:08 -0500 To: Alain LaBonte , "IETF IMAA list" From: Martin Duerst Subject: Re: Case sensitivity on the LHS Cc: In-Reply-To: <5.0.2.1.2.20030212143718.00a96fa8@entree.sct1.gouv.qc.ca> References: <003d01c2d1df$6cedef40$7300a8c0@DAVIS1> <20030211014413.GD16359@nicemice.net> <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 14:42 03/02/12 -0500, Alain LaBonte wrote: >[Alain] Good (and thanks for the reference). But I wanted to say that >even if having email addresses including accented Latin letters is a "must >have" (I look forward to seeing my address as Alain.LaBonte'@abc.com), we >need to have a way to allow those who can't enter accented letters to be >able to access the same email address (on, say, a US keyboard [or a >Japanese one with Romaji support], in typing Alain.LaBonte@abc.com). > >One way would be to have email aliases, but is it the best way? Will those >with accented names have the burden of being sure to have aliases all the >time to do this?
I would have to say, unfortunately, yes. ASCII-only equivalents are not something that can be generated mechanically. For some languages, that may be possible, but not for others (in particular languages not written with the Latin script), and not for the collection of all languages together. Regards, Martin. From owner-ietf-imaa Wed Feb 12 12:48:46 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CKmkB06081 for ietf-imaa-bks; Wed, 12 Feb 2003 12:48:46 -0800 (PST) Received: from relay-1m.club-internet.fr (relay-1m.club-internet.fr [194.158.104.40]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1CKmjd06077 for ; Wed, 12 Feb 2003 12:48:45 -0800 (PST) Received: from mine.club-internet.fr (f06a-4-36.d1.club-internet.fr [212.194.123.36]) by relay-1m.club-internet.fr (Postfix) with ESMTP id 083811755 for ; Wed, 12 Feb 2003 21:49:04 +0100 (CET) Message-Id: <5.2.0.9.0.20030212215301.02cc21b0@mail.club-internet.fr> X-Sender: jefsey@mail.club-internet.fr X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Wed, 12 Feb 2003 21:54:29 +0100 To: "IETF IMAA list" From: "J-F C. (Jefsey) Morfin" Subject: Re: Case sensitivity on the LHS In-Reply-To: <5.0.2.1.2.20030212144247.05326e38@entree.sct1.gouv.qc.ca> References: <008a01c2d1f3$cb36f5b0$7300a8c0@DAVIS1> <20030211014413.GD16359@nicemice.net> <20030211025714.GF16359@nicemice.net> Mime-Version: 1.0 Content-Type: multipart/mixed; x-avg-checked=avg-ok-51C62536; boundary="=======592A1CA4=======" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --=======592A1CA4======= Content-Type: text/plain; x-avg-checked=avg-ok-51C62536; charset=iso-8859-1; format=flowed Content-Transfer-Encoding: 8bit At 20:52 12/02/03, Alain LaBonté wrote: >A 09:34 2003-02-11 -0800, Mark Davis a écrit : >>1. From a practical point of view, I personally very much in favor of >>case-insensitive names. >Me too. 
>And I would add that ideally, the behaviour in reaching an email address >would be optimum with accent-insensitivity as well. >A bit like searching with Google or Altavista. >Unaccented-affected-Latin-letter keyword search requests will reach all >accented targets. So should email addresses behave. From reading all the mails in here it appears that - from a user point of view and seemingly to respect the RFCs - [optional] case sensitivity is a need. If we put ourselves in a user perspective we will want consistent support of the whole URL on the left of the "@" and on the right of the "?". Case insensitivity can only be used when the receiving end has accepted it (the case of small devices as I noted before). Otherwise it seems there is a real need for sensitivity. We are here considering only *existing* cases, but if languages are using upper and lower cases it is to extend their possibilities. A *new* development should not *reduce* the existing possibilities even if they are not *yet* used. The more people use the network, the more they will want what they have everywhere else. --=======592A1CA4======= Content-Type: text/plain; charset=us-ascii; x-avg=cert; x-avg-checked=avg-ok-51C62536 Content-Disposition: inline --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.449 / Virus Database: 251 - Release Date: 27/01/03 --=======592A1CA4=======-- From owner-ietf-imaa Wed Feb 12 13:37:38 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CLbcS07211 for ietf-imaa-bks; Wed, 12 Feb 2003 13:37:38 -0800 (PST) Received: from m3001.hostcentric.net (m3001.hostcentric.net [216.157.79.237]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1CLbad07205 for ; Wed, 12 Feb 2003 13:37:37 -0800 (PST) Received: (qmail 609 invoked by alias); 12 Feb 2003 21:37:40 -0000 Received: from unknown (HELO DAVIS1) (32.97.110.142) by 0 with SMTP; 12 Feb 2003 21:37:40 -0000 Message-ID: <00b901c2d2de$f1472720$6dde2b09@DAVIS1> From: "Mark Davis" To: "Marc Mutz" , Cc: References: <4.2.0.58.J.20030211162348.0479ea38@localhost> <002a01c2d2cc$1f565c70$6dde2b09@DAVIS1> <200302122059.41792@sendmail.mutz.com> Subject: Re: Case sensitivity on the LHS Date: Wed, 12 Feb 2003 13:37:27 -0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: > No, not at all. If you fold to the lower-case versions of characters, > then SS->ss, not SS->ß, so ß could/should be a class of it's own. But the *purpose* of case folding is to do a caseless match; in other words, the caseless matching drives the folding operation. You look at what a caseless match would produce, then drive the case folding from that. > Mapping ß->ss is like mapping ä->ae. Now it is unfortunate that there is no uppercase version of ess-zed; sometimes you see the character preserved in all uppercase words, more typically it is converted to SS. But it is not like ä vs ae; you would not normally see uppercased words converted so that the umlauts became E. 
> That ambiguity is probably why, in Austria, some people use(d?) SZ This would not be normal German orthography. Mark =================== Marc Mutz wrote: On Wednesday 12 February 2003 20:22, Mark Davis wrote: > But the uppercase of "Masze" is "MASSE". So for case-insensitivity, > you need to treat these equivalently. No, not at all. If you fold to the lower-case versions of characters, then SS->ss, not SS->ß, so ß could/should be a class of its own. > If someone types in either > Masze@foo.com or MASSE@foo.com, they should get the same result. Nobody would enter "MASSE" if asked to enter "Maße" and ß was available on the keyboard. Even with caps lock enabled, they'd enter "MAßE". Mapping ß->ss is like mapping ä->ae. As "Maße" vs. "Masse" shows, ß is a letter in its own right in German, it just happens to have the tradition of never appearing at the beginning of a word, so no-one ever bothered to define what its upper-case version should look like... Also, the uppercase-is-SS trick works reasonably well, but not perfectly. E.g., if I write NUSS, then there's no ambiguity. It's uppercase "Nuß" (nut). [ In this particular instance, the usage of ß was a bug in the language that was fixed with the new orthographic rules a few years back. Nuss is pronounced with a short u, while Nuß would be pronounced with a long u. So basically, with the modern orthographic rules, the only uses of ß that are left are those where "ss" would not fit the pronunciation, which is another argument for keeping "ss" and ß separate. ] > However, the same is true for "Masse" and "MASSE", so that tosses > Masse@foo.com into the hopper. So for it to be a well-defined > equivalence relation, {ss, SS, sz} have to be in the same class. That ambiguity is probably why, in Austria, some people use(d?) SZ as the upper-case version of ß (enough so that a mid-80's dictionary I have here mentions it). Why was the class then not defined to be { ss, SS, ß, sz, SZ }? Marc -- The illegal we do immediately.
The unconstitutional takes a bit longer. -- Henry Kissinger Mark ________ mark.davis@jtcsv.com IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 ----- Original Message ----- From: "Marc Mutz" To: Cc: "Mark Davis" Sent: Wednesday, February 12, 2003 11:59 Subject: Re: Case sensitivity on the LHS From owner-ietf-imaa Wed Feb 12 15:09:11 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1CN9Be09877 for ietf-imaa-bks; Wed, 12 Feb 2003 15:09:11 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1CN9Ad09872 for ; Wed, 12 Feb 2003 15:09:10 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18j609-000252-00 for ; Wed, 12 Feb 2003 15:09:13 -0800 Date: Wed, 12 Feb 2003 23:09:13 +0000 From: "Adam M. Costello" To: IETF IMAA list Subject: Re: Case sensitivity on the LHS Message-ID: <20030212230913.GA7477@nicemice.net> Reply-To: IETF IMAA list References: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <5.0.2.1.2.20030211083406.00a95b50@entree.sct1.gouv.qc.ca> <5.0.2.1.2.20030212143718.00a96fa8@entree.sct1.gouv.qc.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <4.2.0.58.J.20030212145155.05a0dd50@localhost> <4.2.0.58.J.20030212143823.05074340@localhost> <5.0.2.1.2.20030212144247.05326e38@entree.sct1.gouv.qc.ca> <5.0.2.1.2.20030212143718.00a96fa8@entree.sct1.gouv.qc.ca> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Alain LaBonté wrote: > even if having email addresses including accented latin letters > is a "must have" (I look forward to seeing my address as > Alain.LaBonté@abc.com), we need to have a way to allow those who can't > enter accented letters to be 
able to access the same email address > (on, say, a US keyboard [or a Japanese one with Romaji support], in > typing Alain.LaBonte@abc.com). > > One way would be to have email aliases, but is it the best way? There are two ways. Either use an alias Alain.LaBonte --> Alain.LaBonté, or tell the sender the ACE form, which would look something like xn##Alain.LaBont-meb (of course we have yet to work out the details). > ideally, the behaviour in reaching an email address would be optimum > with accent-insensitivity as well. > > A bit like searching with Google or Altavista. Email addresses are not a directory or a search service. They are identifiers. We had the same discussion for IDNs. If you want a domain that's easy and intuitive to access for both accent-enabled users and accent-challenged users, you'll need to have two domains (one of which could be an alias for the other). The same is true of email addresses. Martin Duerst wrote: > stringprep/nameprep doesn't actually need equivalence classes, it only > needs a 'toLower' operation. Well, UTR#21 says "Caseless matching is implemented using case-folding", and we wanted to do case-insensitive comparisons, which we assumed is a synonym for "caseless matching", so we used case-folding. > Anyway, because IDNA is now defined the way it is, I don't think it is > worth doing something different for IMAA. For German users to have to > learn that they can use sz on the left hand side, but not on the right > hand is not really a good solution. Definitely.
AMC From owner-ietf-imaa Wed Feb 12 17:49:39 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1D1ndt13456 for ietf-imaa-bks; Wed, 12 Feb 2003 17:49:39 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1D1nbd13452 for ; Wed, 12 Feb 2003 17:49:38 -0800 (PST) Received: (qmail 26059 invoked by uid 66); 13 Feb 2003 01:49:36 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 13 Feb 2003 01:49:36 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-0123d); 13 Feb 2003 02:49:23 +0100 Date: 13 Feb 2003 01:45:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8flEF$rJcDD@3247.org> In-Reply-To: <4.2.0.58.J.20030212145155.05a0dd50@localhost> Subject: Re: Case sensitivity on the LHS User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-0123d MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Martin Duerst schrieb/wrote: > I would have to say, unfortunately, yes. ASCII-only equivalents are > not something that can be generated mechanically. For some languages, > that may be possible, but not for others (in particular languages > not written with the Latin script), and not for the collection of > all languages together. Which means... let the user register multiple names/aliases if they want to have two versions point to them. 
Claus -- ------------------------ http://www.faerber.muc.de/ ------------------------ OpenPGP: DSS 1024/639680F0 E7A8 AADB 6C8A 2450 67EA AF68 48A5 0E63 6396 80F0 From owner-ietf-imaa Wed Feb 12 18:03:22 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1D23Ma13701 for ietf-imaa-bks; Wed, 12 Feb 2003 18:03:22 -0800 (PST) Received: from mailgen2.internet.gouv.qc.ca (courrier4.internet.gouv.qc.ca [192.197.162.9] (may be forged)) by above.proper.com (8.11.6/8.11.3) with SMTP id h1D23Ld13697 for ; Wed, 12 Feb 2003 18:03:21 -0800 (PST) Received: (qmail 13652 invoked from network); 13 Feb 2003 02:03:16 -0000 Received: from unknown (HELO p295.sct1.gouv.qc.ca) (142.213.85.49) by mailgen2.internet.gouv.qc.ca with SMTP; 13 Feb 2003 02:03:16 -0000 Message-Id: <5.0.2.1.2.20030212210030.00a97200@entree.sct1.gouv.qc.ca> X-Sender: alabonte@entree.sct1.gouv.qc.ca X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Wed, 12 Feb 2003 21:03:19 -0500 To: list-ietf-i18n-imaa@faerber.muc.de (Claus Färber), ietf-imaa@imc.org From: =?iso-8859-1?Q?Alain_LaBont=E9?= Subject: Re: Case sensitivity on the LHS In-Reply-To: <8flEF$rJcDD@3247.org> References: <4.2.0.58.J.20030212145155.05a0dd50@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: A 01:45 2003-02-13 +0100, Claus Färber a écrit : >Martin Duerst schrieb/wrote: > > I would have to say, unfortunately, yes. ASCII-only equivalents are > > not something that can be generated mechanically. For some languages, > > that may be possible, but not for others (in particular languages > > not written with the Latin script), and not for the collection of > > all languages together. > >Which means... let the user register multiple names/aliases if they want >to have two versions point to them. [Alain] Fair enough. 
That's at least a reasonable way to solve the problem, if it can't be done mechanically in a universal way. This automaton could be localized at the ISP's level for a language group, but that is of course not relevant at this point, I guess... Alain LaBonté Québec From owner-ietf-imaa Wed Feb 12 18:56:16 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1D2uGq14890 for ietf-imaa-bks; Wed, 12 Feb 2003 18:56:16 -0800 (PST) Received: from [63.202.92.149] (adsl-63-202-92-149.dsl.snfc21.pacbell.net [63.202.92.149]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1D2uEd14884 for ; Wed, 12 Feb 2003 18:56:14 -0800 (PST) Mime-Version: 1.0 X-Sender: phoffman@mail.imc.org Message-Id: X-Habeas-SWE-1: winter into spring X-Habeas-SWE-2: brightly anticipated X-Habeas-SWE-3: like Habeas SWE (tm) X-Habeas-SWE-4: Copyright 2002 Habeas (tm) X-Habeas-SWE-5: Sender Warranted Email (SWE) (tm). The sender of this X-Habeas-SWE-6: email in exchange for a license for this Habeas X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this X-Habeas-SWE-9: mark in spam to . Date: Wed, 12 Feb 2003 18:56:26 -0800 To: ietf-imaa@imc.org From: Paul Hoffman / IMC Subject: There are other open topics, folks... Content-Type: text/plain; charset="us-ascii" ; format="flowed" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Adam and I will digest the thread on case-sensitivity and make some changes in the -01 draft based on it. There are a lot of other open issues; please pick your favorite and start a thread on it! 
--Paul Hoffman, Director --Internet Mail Consortium From owner-ietf-imaa Wed Feb 12 19:08:03 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1D383g15219 for ietf-imaa-bks; Wed, 12 Feb 2003 19:08:03 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1D382d15215 for ; Wed, 12 Feb 2003 19:08:02 -0800 (PST) Received: (qmail 44294 invoked by uid 1016); 13 Feb 2003 03:08:31 -0000 Date: 13 Feb 2003 03:08:31 -0000 Message-ID: <20030213030831.44293.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: ietf-imaa@imc.org Subject: The typing issue References: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <20030211040519.99895.qmail@cr.yp.to> <001c01c2d195$b1370320$f57812ac@camus> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Maynard Kang writes: > To require compulsory 14755 input for e-mail addresses is plain > ridiculous, if you ask me. Keyboard interfaces have to support ISO 14755. For users, ISO 14755 is simply an extra option---sometimes the only option that works. Let me put it this way. Someone gives you a business card. The card has an email address. The email address has (say) Japanese characters that you've never seen before. How do you type those characters? Answer: The card shows you, on the next line, what to type, thanks to a universal keyboard standard for Unicode, namely ISO 14755. Done. The only alternative proposal I've seen is forcing every international user to set up a second email address---an ASCII address. 
Why waste all that effort to work around the typing issue, imposing extra costs on billions of users, when we can simply have keyboard interfaces support a perfectly straightforward standard that allows everything to be typed? ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-ietf-imaa Wed Feb 12 19:48:42 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1D3mg216276 for ietf-imaa-bks; Wed, 12 Feb 2003 19:48:42 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1D3mfd16272 for ; Wed, 12 Feb 2003 19:48:41 -0800 (PST) Received: (qmail 50739 invoked by uid 1016); 13 Feb 2003 03:49:11 -0000 Date: 13 Feb 2003 03:49:10 -0000 Message-ID: <20030213034910.50738.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: ietf-imaa@imc.org Subject: Re: Case sensitivity on the LHS References: <20030210142105.GG12186@bodin.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: tedd writes: > making the LHS case-sensitive Mailbox names _are_ case-sensitive. Any message sender that tries to convert mailbox names to lowercase will break interoperability. > Thus, I believe that whatever method is adapted for character > consideration should be consistent throughout the address. Sorry, but the IDNA proponents repeatedly refused to think beyond domain names. They said that mailbox names (and login names and so on) were ``out of scope.'' The IDNA proponents also refused to take the conservative approach of prohibiting uppercase---which allows uppercase to be safely added later if it's necessary. 
They insisted on case-insensitivity---which becomes an irrevocable decision if users start relying on uppercase addresses. The IDNA proponents also didn't stop and think ``Gee, maybe we're doing something wrong'' when they received public objections from more than THREE HUNDRED PEOPLE. They simply went ahead and declared ``consensus.'' ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-ietf-imaa Wed Feb 12 20:00:27 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1D40R916473 for ietf-imaa-bks; Wed, 12 Feb 2003 20:00:27 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1D40Qd16469 for ; Wed, 12 Feb 2003 20:00:26 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jAY1-0002wq-00 for ; Wed, 12 Feb 2003 20:00:29 -0800 Date: Thu, 13 Feb 2003 04:00:29 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Subject: Re: Compatibility with IDNA Message-ID: <20030213040028.GA9630@nicemice.net> Reply-To: IETF IMAA list References: <8f$3A5e3cDD@3247.org> <20030211023401.GE16359@nicemice.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030211023401.GE16359@nicemice.net> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: I wrote: > At this point, the most [reuse of IDNA] we could try for is to use > the exact same encoding for local-parts (or subparts) as is used for > domain labels. Let's explore this idea and see where it leads. As I argued earlier, IDNA and IMAA can use the same ACE prefix only if they use the exact same ToUnicode operation. ToUnicode invokes ToASCII, so it might appear that we would therefore also need the exact same ToASCII operation. 
However, the only thing ToUnicode does with the result of ToASCII is perform a case-insensitive ASCII comparison on it. Therefore, the IMAA ToASCII and the IDNA ToASCII can differ, provided that the differences are confined to the case of letters in the output. IDNA leaves a lot of implementation freedom regarding the case of the letters output from ToASCII, because the output of ToASCII is an ASCII domain label, which is case-insensitive. Therefore IDNA & Punycode do not specify whether Punycode uses uppercase or lowercase letters to encode the deltas, and IDNA does not specify whether ToASCII prepends a lowercase prefix or an uppercase prefix (or a mixed-case prefix). The main difficulty with reusing ToUnicode in IMAA is that it accepts both lowercase ACEs and uppercase ACEs (and mixed-case ACEs). For example, if xn--blahblah is an ACE local part, then ToUnicode will convert xn--blahblah and XN--BLAHBLAH into the exact same Unicode string. But if we then apply ToASCII to those two identical Unicode strings, we obviously get two identical ASCII strings. So the two local parts xn--blahblah and XN--BLAHBLAH, which "may" refer to distinct mailboxes according to the standards, have been collapsed into a single local part after a round trip through ToUnicode and ToASCII. Not good! ToUnicode and ToASCII need to be lossless. I can think of one way out of this trap: Impose a new administrative requirement that non-lowercase ACE local parts must not be created unless they refer to the same mailbox as the corresponding all-lowercase ACE local part. So if xn--blahblah exists at all, the all-lowercase form is the one guaranteed to work. All other capitalizations are either equivalent to xn--blahblah, or don't exist. This new requirement is automatically obeyed by all mail servers that do case-insensitive ASCII comparisons on local parts (which is virtually all of them in practice). 
For case-sensitive mail servers, administrators will need to avoid creating non-lowercase ACE local parts. There is theoretically a chance that there exists a local part today that violates the new requirement. But it would have to be a valid ACE (which is very rare), and be non-lowercase (which is atypical), and be served by a case-sensitive mail server (which is very rare). I would be extremely surprised if the intersection of those sets is not empty. Getting back to the round-trip problem, we're not done solving it yet. For a case-sensitive mail server, it might be that xn--blahblah exists, but all other capitalizations don't exist. Therefore users need to be able to reliably type something that ToASCII will convert to xn--blahblah and not some other capitalization. The most obvious modification of ToASCII that would accomplish this would be to have Punycode always use lowercase letters for encoding deltas, and have ToASCII always use the lowercase form of the ACE prefix when prepending it. That's a perfectly reasonable way to implement it, but it would be overkill to specify such a strong constraint. All we really need is: If the input of ToASCII contains no uppercase characters, then the output of ToASCII must contain no uppercase characters. The simple implementation (always use lowercase when you have a choice) obviously satisfies the constraint, but the door is also left open for more complex optional behavior (like mixed-case annotations for preserving case). If a user types a non-ASCII local part without using uppercase characters, it will definitely work; if the user capitalizes any of the characters, it will still work if the mail server performs case-insensitive ASCII comparisons on local parts. 
That's the same situation as today: If a user types an ASCII local part in the original correct case, it will definitely work; if the user types the local part using some other capitalization, it will still work if the mail server performs case-insensitive ASCII comparisons on local parts. That's it. The open issue of whether to subdivide local parts is orthogonal. Whatever pieces we obtain before the at-sign (a whole local part or subparts of it), we can use the very same ToASCII and ToUnicode implementation that we use for domain labels, including the same ACE prefix and the same profile. (But we can't necessarily use any off-the-shelf IDNA implementation; we need to make sure ToASCII satisfies IMAA's additional constraint.) Here's a summary of what this would mean for case sensitivity: For domains whose mail server is already case-insensitive for ASCII local parts, non-ASCII local parts would likewise be case-insensitive, automatically. For domains whose mail server is case-sensitive for ASCII local parts, it is possible for two ASCII case-variants to refer to different mailboxes, but this would not be possible for non-ASCII case variants. Only the all-lowercase version could exist. The non-lowercase non-ASCII variants would either work by accident or bounce, depending on the exact implementation of the sender's ToASCII. Whereas most mail user agents preserve case for ASCII local parts, most probably would not preserve case for non-ASCII local parts, because mixed-case annotations require considerable additional effort. But it would be possible. If you know that your own mail address is case-insensitive, then you can use mixed-case annotations in your outgoing From: fields, and recipients whose mail user agents make the extra effort will display it in mixed-case form. You know they can reply because you know your address is case-insensitive.
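The proposed constraint (no uppercase in the input of ToASCII implies no uppercase in its output) can be checked mechanically. The following is a rough sketch under stated assumptions: to_ace() is a hypothetical helper, not the IDNA or IMAA ToASCII operation; it skips the stringprep step entirely, borrows the "xn--" prefix from the examples above, and relies on Python's built-in "punycode" codec, which happens to emit lowercase letters for the encoded deltas.

```python
ACE_PREFIX = "xn--"  # prefix used in the examples above; an assumption here


def to_ace(part: str) -> str:
    """Hypothetical stand-in for ToASCII: Punycode plus ACE prefix.

    No stringprep profile is applied, so this is not a full ToASCII.
    """
    if part.isascii():
        return part  # already ASCII, nothing to encode
    return ACE_PREFIX + part.encode("punycode").decode("ascii")


def satisfies_lowercase_constraint(part: str) -> bool:
    """Check: input without uppercase must give output without uppercase."""
    if any(ch.isupper() for ch in part):
        return True  # the constraint says nothing about such inputs
    return not any(ch.isupper() for ch in to_ace(part))


example = to_ace("b\u00fccher")  # "buecher" spelled with u-umlaut
```

An encoder that chose uppercase letters for the deltas or prepended "XN--" would fail this check for lowercase input, which is the flexibility the constraint is meant to remove.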
AMC From owner-ietf-imaa Wed Feb 12 21:49:56 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1D5nua18752 for ietf-imaa-bks; Wed, 12 Feb 2003 21:49:56 -0800 (PST) Received: from exchange.ad.skymv.com (66-120-210-136.ded.pacbell.net [66.120.210.136]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1D5ntd18745 for ; Wed, 12 Feb 2003 21:49:55 -0800 (PST) Received: from exchange.ad.skymv.com ([192.168.1.71]) by exchange.ad.skymv.com with Microsoft SMTPSVC(5.0.2195.5329); Wed, 12 Feb 2003 21:49:41 -0800 content-class: urn:content-classes:message Subject: RE: The typing issue MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Wed, 12 Feb 2003 21:49:41 -0800 X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Message-ID: <138AA78F80DCE84B8EE424399FFBF9C904FAA1@exchange.ad.skymv.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: The typing issue Thread-Index: AcLTDVPpjzxKc7xkRPeVQU+sL7WzqAAFB+Pg From: "Dan Kohn" To: X-OriginalArrivalTime: 13 Feb 2003 05:49:41.0765 (UTC) FILETIME=[B409C350:01C2D323] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by above.proper.com id h1D5ntd18746 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: D. J. Bernstein wrote: > Let me put it this way. Someone gives you a business card. The card > has an email address. The email address has (say) Japanese characters > that you've never seen before. How do you type those characters? > Answer: The card shows you, on the next line, what to type, thanks to > a universal keyboard standard for Unicode, namely ISO 14755. Done. > The only alternative proposal I've seen is forcing every international > user to set up a second email address---an ASCII address. 
Why waste > all that effort to work around the typing issue, imposing extra costs > on billions of users, when we can simply have keyboard interfaces > support a perfectly straightforward standard that allows everything > to be typed? There is an alternative to registering an ASCII domain for each IDN: instead, you can print the punycode on the business card below the IMAA/IDN email address. Compared to ISO 14755 [1], it seems to me that punycode is more universal (it works wherever ASCII is available), more compact (it supports LDH rather than hex), and no more ugly than ISO 14755. Take an email address on a business card of @example.com (where the bracketed characters would be shown as kanji). Not knowing Japanese, I'd rather see (and type) xn--d9juau41awczczp@example.com than u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067@example.com where for each u+ I need to hold down ctrl-alt. Of course, we both agree that they could also set up sonosupiidode@example.com to forward to the same mailbox. Of course, I know how much Dan hates punycode and so he'll hate this suggestion. [1] http://www-rocq.inria.fr/qui/Philippe.Deschamp/divers/ALB-CD.html - dan -- Dan Kohn From owner-ietf-imaa Thu Feb 13 00:08:00 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1D880W03379 for ietf-imaa-bks; Thu, 13 Feb 2003 00:08:00 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1D87xd03375 for ; Thu, 13 Feb 2003 00:07:59 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jEPX-0004mI-00 for ; Thu, 13 Feb 2003 00:07:59 -0800 Date: Thu, 13 Feb 2003 08:07:59 +0000 From: "Adam M. 
Costello" To: ietf-imaa@imc.org Subject: Re: Compatibility with IDNA Message-ID: <20030213080759.GA18181@nicemice.net> Reply-To: IETF IMAA list References: <8f$3A5e3cDD@3247.org> <20030211023401.GE16359@nicemice.net> <20030213040028.GA9630@nicemice.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030213040028.GA9630@nicemice.net> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: I wrote: > If the input of ToASCII contains no uppercase characters, then the > output of ToASCII must contain no uppercase characters. Actually, IMAA would need to impose this constraint on both ToASCII and ToUnicode. The ToUnicode operation, as written in the IDNA spec, automatically satisfies the constraint, whereas ToASCII, as written, has some flexibility that IMAA would need to limit. But just as it's okay for various ToASCII implementations to output slightly different strings, provided they are all equivalent, it would also be harmless for various ToUnicode implementations to output slightly different strings, provided they are all equivalent. (IDNA defines equivalence between X and Y as: ToASCII(X) matches ToASCII(Y) using a case-insensitive ASCII comparison.) It is this flexibility in both ToASCII and ToUnicode that makes possible case preservation via mixed-case annotations. So in order to guarantee that the lowercase form of internationalized local parts always works and survives round-trips through ToASCII and ToUnicode, we need to impose the above constraint on both operations. Note that I'm not suggesting that we mention mixed-case annotations or case preservation in the IMAA spec. IDNA doesn't mention it, so IMAA probably won't either. As long as it's possible, I'm content. 
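The equivalence rule and the proposed constraint can be spot-checked against a concrete implementation. The following is a minimal sketch using the ToASCII and ToUnicode functions in Python's standard library; the helper names equivalent, has_uppercase and satisfies_constraint are hypothetical, and testing a few inputs only illustrates the rule rather than proving that an implementation meets it.

```python
from encodings.idna import ToASCII, ToUnicode  # stdlib IDNA label operations


def equivalent(x, y):
    """IDNA equivalence: ToASCII outputs match, ignoring ASCII case."""
    return ToASCII(x).lower() == ToASCII(y).lower()


def has_uppercase(text):
    """True if any character in text is an uppercase letter."""
    return any(ch.isupper() for ch in text)


def satisfies_constraint(operation, label):
    """Hypothetical check of the proposed rule for one input:
    an input without uppercase must give an output without uppercase."""
    result = operation(label)
    if isinstance(result, bytes):  # this library's ToASCII returns bytes
        result = result.decode("ascii")
    return has_uppercase(label) or not has_uppercase(result)


lower = "b\u00fccher"  # "\u00fc" is LATIN SMALL LETTER U WITH DIAERESIS
print(equivalent("B\u00fccher", lower))
print(satisfies_constraint(ToASCII, lower))
print(satisfies_constraint(ToUnicode, "xn--bcher-kva"))
print(ToUnicode(ToASCII(lower)) == lower)  # lowercase form survives a round trip
```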
AMC From owner-ietf-imaa Thu Feb 13 00:52:37 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1D8qba11707 for ietf-imaa-bks; Thu, 13 Feb 2003 00:52:37 -0800 (PST) Received: from leonis.nus.edu.sg (leonis.nus.edu.sg [137.132.1.18]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1D8qUd11664 for ; Thu, 13 Feb 2003 00:52:31 -0800 (PST) Received: from bic.nus.edu.sg (12-49.priv.nus.edu.sg [172.18.12.49]) by leonis.nus.edu.sg (8.12.1/8.12.1) with ESMTP id h1D8rRwE015282; Thu, 13 Feb 2003 16:53:33 +0800 (SGT) Message-ID: <3E4B5CAD.9090304@bic.nus.edu.sg> Date: Thu, 13 Feb 2003 16:51:57 +0800 From: Tan Tin Wee User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Dan Kohn CC: ietf-imaa@imc.org Subject: Re: The typing issue References: <138AA78F80DCE84B8EE424399FFBF9C904FAA1@exchange.ad.skymv.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: I always thought that a Japanese name card had Japanese characters on one side for those who can read and write Japanese, and English (Romaji) characters on the other side of the card, for those who can't read or write Japanese. So in the same vein, the IDN user, say Japanese, has his/her Japanese domain name or email address, not for the sake of the non-Japanese reader/writer. If it were intended for this guy, then the IDN user would have sent the email using the ASCII character address. This means that anyone not knowing Japanese is not expected to be able to type or read Japanese on his/her computer, and if they sent an email address to you in Japanese, chances are that the content included Japanese characters which you can't read either. 
So the issue of whether you or I prefer to type "xn--d9juau41awczczp@example.com than u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067@example.com " is a near non-issue except in the following kind of circumstance, for instance, say, "I know both Japanese and Romaji, and this Japanese (who can handle English too) guy wrote email to me in Japanese, which I typically reply to on my Japanese-enabled notebook, but I was flying into San Francisco, and had to reply to him using my yahoo account (which is by then IDN'ised), and the computer at the airport business center, doesn't have the keyboard input system for Japanese, I had to figure out the punycode xn-- version of his email address in order to reply to him urgently, and cannot possible have the time to hold down the Control Alt key to do the U+ thing." The issue IMHO is also not "forcing every international user to set up a second email address". The point is that if every IDN user is also an international person, he would have had his namecards printed one side in say, Japanese, for Japanese meetings, and the other side in English, for international meetings which he is having. In the same way, he would naturally have both an ASCII email address and the Japanese email address. And in his common usage, he will be using his Japanese language email address when mailing stuff in Japanese to his Japanese friends and if any of the stuff leaks out to the International friends, he may well include his ASCII email address, and if not, he's not expecting you to respond anyway. In fact, if all of us English speaking folks put ourselves in the same shoes as the IDN guy, we might be asking the opposite. "I would rather type email addresses in Japanese characters rather than type ASCII because it is not natural and I find it rather difficult to recognise the terribly confusing ASCII characters on the keyboard simply so that I can write a completely Japanese email to my friend who is also Japanese down the next block. 
So the Internet stuff of sending email in ASCII, an alien character set is pretty broken for me." So take home message here is that all of us should be aware that we are arguing from the world view of the English-enabled person. The whole purpose of having IDNs and IDN email addresses, in my opinion, is for the sake of the un-ASCII'ed masses in the world who take a long time, (as long as you take to key in funny IDN characters or U+whatever characters), to use the Web or the Email to read stuff in their own language. So long as they're ok with it, I'm ok with it. To them, U+whatever, or xn-whateverpunycode, or even tinwee@pobox.org.sg are all just as bad for them, as much as U+whatever, or xn-whatever or @example.com is equally bad for me as a non-Japanese user. Sorry to take so long to put across a small point, but I am not a true native English user. -- tin wee Dan Kohn wrote: >D. J. Bernstein wrote: > > > >>Let me put it this way. Someone gives you a business card. The card >>has an email address. The email address has (say) Japanese characters >>that you've never seen before. How do you type those characters? >> >> >>Answer: The card shows you, on the next line, what to type, thanks to >>a universal keyboard standard for Unicode, namely ISO 14755. Done. >> >> >>The only alternative proposal I've seen is forcing every international >>user to set up a second email address---an ASCII address. Why waste >>all that effort to work around the typing issue, imposing extra costs >>on billions of users, when we can simply have keyboard interfaces >>support a perfectly straightforward standard that allows everything >>to be typed? >> >> > >There is an alternative to registering an ASCII domain for each IDN: >instead, you can print the punycode on the business card below the >IMAA/IDN email address. 
> >Compared to ISO 14755 [1], it seems to me that punycode is more >universal (it works wherever ASCII is available), more compact (it >supports LDH rather than hex), and no more ugly than ISO 14755. > >Take an email address on a business card of >@example.com (where the bracketed characters would be >shown as kanji). > >Not knowing Japanese, I'd rather see (and type) >xn--d9juau41awczczp@example.com than u+305D u+306E u+30B9 u+30D4 u+30FC >u+30C9 u+3067@example.com where for each u+ I need to hold down >ctrl-alt. Of course, we both agree that they could also set up >sonosupiidode@example.com to forward to the same mailbox. > >Of course, I know how much Dan hates punycode and so he'll hate this >suggestion. > >[1] http://www-rocq.inria.fr/qui/Philippe.Deschamp/divers/ALB-CD.html > > > - dan >-- >Dan Kohn > > > > > From owner-ietf-imaa Thu Feb 13 03:30:02 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DBU2o29007 for ietf-imaa-bks; Thu, 13 Feb 2003 03:30:02 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DBU1d29003 for ; Thu, 13 Feb 2003 03:30:01 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1DBUDKA015006 for ; Thu, 13 Feb 2003 11:30:14 GMT To: ietf-imaa@imc.org Subject: Re: Case sensitivity on the LHS Date: Tue, 11 Feb 2003 22:53:04 +0000 From: Roy Badami Message-ID: <1045135813.15005.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: What is the current user expectation? My guess is that case-insensitive is more widespread. In any case, one or the other expectation will be disappointed (if they ever happen to notice). 
Do we have any idea which systems are more numerous (the only sample I have at the moment is my own email address, which is case-insensitive)? In terms of traditional Unix MTAs, sendmail is case insensitive by default, and this would have been a good starting point to answer the question five or ten years ago. If you want to know what's the norm now, you have to ask the questions: Are hotmail addresses case sensitive? Are e-mail addresses on Exchange servers case sensitive (by default)? I'm pretty sure the answer to both of these questions is that they're case-insensitive. There's no doubt that case-sensitive local parts are unusual; the important question is whether they exist to any significant degree at all... -roy From owner-ietf-imaa Thu Feb 13 03:29:09 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DBT9028976 for ietf-imaa-bks; Thu, 13 Feb 2003 03:29:09 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DBT7d28972 for ; Thu, 13 Feb 2003 03:29:07 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1DBTKKA014977 for ; Thu, 13 Feb 2003 11:29:20 GMT To: ietf-imaa@imc.org Subject: Re: Case sensitivity on the LHS Date: Tue, 11 Feb 2003 22:25:13 +0000 From: Roy Badami Message-ID: <1045135760.14976.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: I have never encountered a case-sensitive local-part. Has anyone here ever encountered a case-sensitive local-part? I have never encountered one myself, though I have heard rumours of their existence. 
I think it's important to be aware of the fact that Internet e-mail is often gatewayed into non-Internet systems such as UUCP or FidoNET in parts of the world where IP connectivity is not commonplace. Is anyone familiar with the state of current deployment of modern UUCP and FidoNET networks, and in a position to comment on the case issues that arise there? -roy From owner-ietf-imaa Thu Feb 13 03:28:12 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DBSCF28929 for ietf-imaa-bks; Thu, 13 Feb 2003 03:28:12 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DBSAd28925 for ; Thu, 13 Feb 2003 03:28:10 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1DBSMKA014972 for ; Thu, 13 Feb 2003 11:28:23 GMT To: ietf-imaa@imc.org Subject: Re: John Cowan on IMAA draft Date: Tue, 11 Feb 2003 22:10:53 +0000 From: Roy Badami Message-ID: <1045135702.14971.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Recognizing fullwidth @ is important, because it's context dependent whether people are using halfwidth or fullwidth characters, and they may not even be conscious of it in double-width environments. Seconded. It's not unusual to see our Japanese customers accidentally put full width characters into their English language e-mail messages to the company I work for. Disallowing full-width-at will almost certainly cause confusion. The argument for consistency with the IDNA approach to full-width-dot is also a strong one. 
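To make the fullwidth point concrete, here is a minimal sketch of an address splitter that accepts either the ordinary at-sign or FULLWIDTH COMMERCIAL AT (U+FF20), and maps the dot variants that IDNA treats as label separators to an ASCII dot in the domain. The helper name split_address is hypothetical, and how (or whether) the draft would normalize separators inside the local part is left open here.

```python
# Dot variants that IDNA recognizes as label separators, mapped to ".".
DOT_MAP = str.maketrans({
    "\u3002": ".",  # IDEOGRAPHIC FULL STOP
    "\uff0e": ".",  # FULLWIDTH FULL STOP
    "\uff61": ".",  # HALFWIDTH IDEOGRAPHIC FULL STOP
})

FULLWIDTH_AT = "\uff20"  # FULLWIDTH COMMERCIAL AT


def split_address(address):
    """Hypothetical helper: split at the last at-sign, halfwidth or fullwidth."""
    at = max(address.rfind("@"), address.rfind(FULLWIDTH_AT))
    if at < 0:
        raise ValueError("no at-sign found")
    local_part, domain = address[:at], address[at + 1:]
    # Only the domain's dots are normalized in this sketch.
    return local_part, domain.translate(DOT_MAP)


print(split_address("user\uff20example\uff0ecom"))
print(split_address("user@example.com"))
```

Both calls yield the same pair, so a user who slips into fullwidth input mid-address still reaches the intended mailbox.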
-roy From owner-ietf-imaa Thu Feb 13 03:29:34 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DBTYu28990 for ietf-imaa-bks; Thu, 13 Feb 2003 03:29:34 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DBTVd28986 for ; Thu, 13 Feb 2003 03:29:32 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1DBTiKA014982 for ; Thu, 13 Feb 2003 11:29:44 GMT To: ietf-imaa@imc.org Subject: Re: Case sensitivity on the LHS Date: Tue, 11 Feb 2003 22:42:21 +0000 From: Roy Badami Message-ID: <1045135784.14981.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Lithuanian "I" with an accent above lowercases to "I" + DOT ABOVE + the main accent, because (unlike all other "i"s with accents) the i keeps its dot. For Unicode case-folding purposes, this discrepancy is ignored. It's a font issue :) Evidently in a Lithuanian font, the glyph for 'dotless I' incorporates a dot... -roy From owner-ietf-imaa Thu Feb 13 03:27:47 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DBRlq28912 for ietf-imaa-bks; Thu, 13 Feb 2003 03:27:47 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DBRjd28908 for ; Thu, 13 Feb 2003 03:27:45 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1DBRlKA014954 for ; Thu, 13 Feb 2003 11:27:55 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: A couple of comments on the open issues... 
Date: Tue, 11 Feb 2003 21:55:45 +0000 From: Roy Badami Message-ID: <1045135666.14953.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: [I'm not a subscriber to the list, so it would be helpful to cc me on any reply, though I do intend to review the archives periodically] By way of background, I should say that I became interested in IDNs late on in the process. I followed much of the later part of the process on the idn list via the archives, but never really felt the need to post, since my opinions tended to be well represented by many other long-standing members of the WG. However, I hope you won't mind my voicing my opinions on a couple of the open issues raised in the IMAA document. I shan't get into the obvious big issue of case (in)sensitivity, since this is not an issue I have strong opinions on, and I have confidence that this forum (and any subsequent WG) will do something sensible in that respect. But a couple of other issues in the document seemed worthy of comment: Rather than transform the entire local part as a single unit, another approach is to pick out smaller pieces of the local part, and transform each piece independently, analogous to the way labels are picked out of a domain name and transformed independently. The tradeoff is complexity versus compatibility with various unofficial conventions for structured local parts, like owner-listname, user+tag, sublocal.local, path!user, etc. I'm particularly keen that the use of a tag or suffix with a local username is not broken by IMAA. Many MTAs provide the functionality that all mail addressed to will be delivered to , either by default, or as a configuration option. (Delimiter is to my knowledge typically either '+' or '-', though it's possible that there are others in use). 
This effectively allows a user of such an MTA (that has been suitably configured) to have multiple e-mail addresses without requiring any action on the part of the mail administrator, and allows the user to run scripts that process their mail according to the suffix. There are several software packages available to users of Unix and Unix-like machines that make use of this functionality, and I think it is important that users of IMAs (if that's the right term) are not excluded or seriously inconvenienced in the use of such packages. Secondly: Should we consider using punctuation other than hyphens in the ACE prefix? Then we could use the same letters as IDNA. For example, if the IDNA ACE prefix were bq--, the IMAA ACE prefix could be bq== or bq## I'm not going to voice an opinion on the specific question posed, but I would like to counsel against using either the equals sign or the hash sign for this purpose. In particular, I would urge the list to ensure that as far as possible local parts generated by any IMAA specification adhere to a far more conservative character set than that mandated by RFC822/2822, namely the set of characters that has historically been commonplace in local parts. I would personally regard this as being alphanumerics, period, hyphen and underscore. It is unfortunately still the case that there are systems in widespread use which have difficulty in accommodating RFC822 addresses that contain valid but unusual characters. In general, I suspect that this is most likely to be the case when a local non-RFC822 mail system is gatewaying into the RFC822 world. For instance, I am aware that Lotus Notes systems running release 4 have problems sending mail to RFC822 addresses containing plus signs, at least in the default configuration. 
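The conservative character set suggested above (alphanumerics, period, hyphen and underscore) is easy to express as a check. The sketch below is only an illustration: the helper name is_conservative is hypothetical, and the sample strings that follow the bq== and bq## prefixes are invented placeholders rather than real encodings.

```python
import re

# The conservative set: ASCII alphanumerics, period, hyphen and underscore.
CONSERVATIVE = re.compile(r"[A-Za-z0-9._-]+")


def is_conservative(local_part):
    """Hypothetical check: True if every character is in the conservative set."""
    return CONSERVATIVE.fullmatch(local_part) is not None


for candidate in ("xn--bcher-kva", "bq==example", "bq##example", "user+tag"):
    print(candidate, is_conservative(candidate))
```

A hyphen-based ACE prefix passes the check, whereas prefixes built on the equals sign or the hash sign, and plus-sign tags, fall outside the set.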
Whilst such restrictions in Internet e-mail addressing are clearly undesirable, and whilst some such gateways may have mechanisms for escaping unusual characters in order to represent them within local constraints, it remains a fact of life that systems such as these are currently deployed on the Internet in significant numbers, and I think it is desirable that IMAA does not exacerbate the deficiencies of such legacy systems to the point of rendering them incapable of communicating with users of IMAs. A second argument for not straying outside the traditional character set I define above is the danger that there may be MTAs in use which ascribe special meaning to unusual characters in a way which is not easily configurable (if it is configurable at all). We all know that there are deployed systems that ascribe special meaning to the characters '%' and '!', despite the fact that these characters have no special meaning in RFC822. There may be systems out there that ascribe special meaning to other unusual characters, too, and use of these characters may make adoption of IMAA problematic for sites which depend on such systems. These are just my initial thoughts on a couple of the issues that seemed important. -roy From owner-ietf-imaa Thu Feb 13 03:30:50 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DBUow29038 for ietf-imaa-bks; Thu, 13 Feb 2003 03:30:50 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DBUmd29034 for ; Thu, 13 Feb 2003 03:30:48 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1DBV1KA015011 for ; Thu, 13 Feb 2003 11:31:01 GMT Date: Tue, 11 Feb 2003 23:59:08 +0000 to: ietf-imaa@imc.org Subject: Re: A couple of comments on the open issues... 
From: Roy Badami Message-ID: <1045135861.15010.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: A second argument for not straying outside the traditional character set I define above is the danger that there may be MTAs in use which ascribe special meaning to unusual characters in a way which is not easily configurable (if it is configurable at all). We all know that there are deployed systems that ascribe special meaning to the characters '%' and '!', despite the fact that these characters have no special meaning in RFC822. There may be systems out there that ascribe special meaning to other unusual characters, too, and use of these characters may make adoption of IMAA problematic for sites which depend on such systems. I'd like to reflect further on this particular thought of mine, if I may... The main issue here, I think, is that there may be sites that wish to allow the creation of internationalized mailboxes without being in a position to upgrade to IMAA-compliant software. In some cases, it may be adequate simply to create mailboxes within the legacy system corresponding to the ACE-encoded local part. Now, we don't know in general what restrictions these legacy systems might place on mailbox names. We know there are systems that will not permit '%' or '!' or many other punctuation characters in mailbox names, but for all we know there may be systems that don't permit dot, hyphen, or underscore in local names. There may even be systems that don't permit digits in mailbox names. There are almost certainly still systems on the Internet that don't allow the creation of mailbox names longer than eight (or even six) characters -- clearly they will have major difficulties if they attempt to support the creation of internationalized mailboxes, and there's nothing we can do about that. 
So in the end even my defined set of alphanumerics, dot, hyphen and underscore is somewhat arbitrary. Although all these characters have been historically commonplace in local parts, that's not to say that all systems have historically allowed the creation of mailboxes containing these characters. There's little we can do to help those systems that don't allow the creation of arbitrary mailbox names, but restricting the character set to a conservative one can only help. (The other argument in my previous post is probably the more important one, though. There are legacy systems connected to the Internet that are incapable of sending e-mail to addresses containing certain unusual punctuation characters, so we should be careful to avoid using those characters in our ACE encoding.) -roy From owner-ietf-imaa Thu Feb 13 04:28:10 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DCSA329937 for ietf-imaa-bks; Thu, 13 Feb 2003 04:28:10 -0800 (PST) Received: from mercury.ccil.org (mail@[192.190.237.100]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DCS9d29933 for ; Thu, 13 Feb 2003 04:28:09 -0800 (PST) Received: from cowan by mercury.ccil.org with local (Exim 3.35 #1 (Debian)) id 18jIRz-0007hy-00; Thu, 13 Feb 2003 07:26:47 -0500 Subject: Re: Case sensitivity on the LHS In-Reply-To: <1045135760.14976.TMDA@moriarty.gnomon.org.uk> from Roy Badami at "Feb 11, 2003 10:25:13 pm" To: Roy Badami Date: Thu, 13 Feb 2003 07:26:47 -0500 (EST) CC: ietf-imaa@imc.org X-Mailer: ELM [version 2.4ME+ PL66 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-Id: From: John Cowan Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami scripsit: > I think it's important to be aware of the fact that Internet e-mail is > often gatewayed into non-Internet systems such as UUCP or FidoNET in > parts of the world where IP connectivity is not commonplace. 
UUCP has case-sensitive host names, but the local-part is subject to the same ambiguous rule as Internet local-parts: it's case-insensitive iff the recipient MUA decides it is. Fidonet names are case-insensitive. -- John Cowan http://www.ccil.org/~cowan cowan@ccil.org To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_ From owner-ietf-imaa Thu Feb 13 05:06:02 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DD62901360 for ietf-imaa-bks; Thu, 13 Feb 2003 05:06:02 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1DD60d01356 for ; Thu, 13 Feb 2003 05:06:00 -0800 (PST) Received: (qmail 27071 invoked by uid 66); 13 Feb 2003 13:05:59 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 13 Feb 2003 13:05:59 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-1304d); 13 Feb 2003 14:05:53 +0100 Date: 13 Feb 2003 14:01:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8flF09g3cDD@3247.org> In-Reply-To: <5.0.2.1.2.20030212143718.00a96fa8@entree.sct1.gouv.qc.ca> Subject: Re: Case sensitivity on the LHS User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-1304d MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Alain LaBonté schrieb/wrote: > [Alain] Good (and thanks for the reference). 
But I wanted to say that even > if having email addresses including accented latin letters is a "must have" > (I look forward to seeing my address as Alain.LaBonté@abc.com), we need to > have a way to allow those who can't enter accented letters to be able to > access the same email address (on, say, a US keyboard [or a Japanese one > with Romaji support], in typing Alain.LaBonte@abc.com). They can view/enter the address as Alain.xn--LaBont-gva@abc.com (for example). Claus -- http://www.faerber.muc.de/ From owner-ietf-imaa Thu Feb 13 05:19:11 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DDJBs02582 for ietf-imaa-bks; Thu, 13 Feb 2003 05:19:11 -0800 (PST) Received: from mailgen2.internet.gouv.qc.ca (inet-cou2.gouv.qc.ca [192.197.162.9]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1DDJAd02575 for ; Thu, 13 Feb 2003 05:19:10 -0800 (PST) Received: (qmail 6535 invoked from network); 13 Feb 2003 13:19:02 -0000 Received: from unknown (HELO p295.sct1.gouv.qc.ca) (142.213.85.49) by mailgen2.internet.gouv.qc.ca with SMTP; 13 Feb 2003 13:19:02 -0000 Message-Id: <5.0.2.1.2.20030213080513.00a97200@entree.sct1.gouv.qc.ca> X-Sender: alabonte@entree.sct1.gouv.qc.ca X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Thu, 13 Feb 2003 08:19:08 -0500 To: "D. J. Bernstein" , ietf-imaa@imc.org From: =?iso-8859-1?Q?Alain_LaBont=E9?= Subject: Re: The typing issue In-Reply-To: <20030213030831.44293.qmail@cr.yp.to> References: <8f$3A$+JcDD@3247.org> <3CD14E451751BD42BA48AAA50B07BAD60337064E@vsvapostal3.prod.netsol.com> <20030211014413.GD16359@nicemice.net> <20030211040519.99895.qmail@cr.yp.to> <001c01c2d195$b1370320$f57812ac@camus> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: A 03:08 2003-02-13 +0000, D. J. 
Bernstein a écrit : >Maynard Kang writes: > > To require compulsory 14755 input for e-mail addresses is plain > > ridiculous, if you ask me. [DJB] >Keyboard interfaces have to support ISO 14755. For users, ISO 14755 is >simply an extra option---sometimes the only option that works. > >Let me put it this way. Someone gives you a business card. The card has >an email address. The email address has (say) Japanese characters that >you've never seen before. How do you type those characters? > >Answer: The card shows you, on the next line, what to type, thanks to a >universal keyboard standard for Unicode, namely ISO 14755. Done. > >The only alternative proposal I've seen is forcing every international >user to set up a second email address---an ASCII address. Why waste all >that effort to work around the typing issue, imposing extra costs on >billions of users, when we can simply have keyboard interfaces support a >perfectly straightforward standard that allows everything to be typed? > >---D. J. Bernstein, Associate Professor, Department of Mathematics, >Statistics, and Computer Science, University of Illinois at Chicago [ALB] As the instigator and project editor of ISO/IEC 14755, I am but pleased with this promotion of that standard that ought to be on any computer which has a keyboard. However there is a trap with some characters. A lot of similar glyphs refer to different characters. With Latin letters and Unicode normalization as a case in point one can probably get out of this trap most of the times (unless someone, say, uses a Greek upper case Alpha letter or a Cyrillic letter A instead of Latin A, although one must expect that this reasonably won't happen). But with some other scripts there might be an issue with this multi-character-one-glyph problem (not talking about the issue of compatibility characters which could be normalized too). 
ISO/IEC 14755 has a feedback option that allows to know exactly what UCS id a glyph on the screen corresponds to, for future entry. But on a business card, this feedback does not exist. I don't know a fool-proof solution to this problem, except for a yet-to-be-developed intelligent interface, dedicated to glyph-character correspondence display, after optical reading (coupled with an ISO/IEC 14755 implementation). That said, I agree that ISO/IEC 14755 should be on every computer, that would allow people to get out of trouble for rare but familiar characters which people occasionally have to enter. Alain LaBonté Project editor, ISO/IEC 14755 (developed by ISO/IEC JTC1/SC35/WG1) Québec From owner-ietf-imaa Thu Feb 13 08:16:17 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DGGHC11736 for ietf-imaa-bks; Thu, 13 Feb 2003 08:16:17 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DGGFd11732 for ; Thu, 13 Feb 2003 08:16:15 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id LAA14526; Thu, 13 Feb 2003 11:15:42 -0500 Message-Id: <4.2.0.58.J.20030213085558.05a4b9c8@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Thu, 13 Feb 2003 11:15:14 -0500 To: Tan Tin Wee , Dan Kohn From: Martin Duerst Subject: Re: The typing issue Cc: ietf-imaa@imc.org In-Reply-To: <3E4B5CAD.9090304@bic.nus.edu.sg> References: <138AA78F80DCE84B8EE424399FFBF9C904FAA1@exchange.ad.skymv.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: I fully agree with Tan Tin Wee. 
To summarize, the following options have been proposed:

1) Print ISO 14755 for your email address on your namecard
2) Print Punycode for your email address on your namecard
3) Print an ASCII-only email address on your namecard
4) Do nothing

We don't have to do anything on this issue here; we can just leave it to the user. However, for the record, I predict here that we will mostly see 3), and there will be lots of 4) (but you probably won't see that very much). 1) and 2) will only be used very rarely.

Regards, Martin.

At 16:51 03/02/13 +0800, Tan Tin Wee wrote:
>I always thought that a Japanese name card had
>Japanese characters on one side for those who can read and write Japanese,
>and English (Romaji) characters on the other side of the card,
>for those who can't read or write Japanese.
>
>So in the same vein, the IDN user, say Japanese, has his/her Japanese
>domain name or email address, not for the sake of the non-Japanese
>reader/writer. If it were intended for this guy,
>then the IDN user would have sent the email using
>the ASCII character address.
>
>This means that anyone not knowing Japanese is not expected
>to be able to type or read Japanese on his/her computer, and
>if they sent an email address to you in Japanese, chances are
>that the content included Japanese characters which you can't
>read either.
> >So the issue of whether you or I prefer to type > >"xn--d9juau41awczczp@example.com than u+305D u+306E u+30B9 u+30D4 u+30FC >u+30C9 u+3067@example.com " > >is a near non-issue except in the following kind of circumstance, >for instance, say, > >"I know both Japanese and Romaji, and this Japanese (who can handle >English too) guy wrote email to me in Japanese, which I typically reply to >on my Japanese-enabled notebook, but I was flying into San Francisco, and >had to reply to him using my yahoo account (which is by then IDN'ised), >and the computer at the airport business center, doesn't have the keyboard >input system for Japanese, I had to figure out the punycode xn-- version >of his email address in order to reply to him >urgently, and cannot possible have the time to hold down >the Control Alt key to do the U+ thing." > >The issue IMHO is also not "forcing every international user >to set up a second email address". The point is that >if every IDN user is also an international person, he would >have had his namecards printed one side in say, Japanese, >for Japanese meetings, and the other side in English, >for international meetings which he is having. In the >same way, he would naturally have both an ASCII email >address and the Japanese email address. > >And in his common usage, he will be using his Japanese >language email address when mailing stuff in Japanese >to his Japanese friends and if any of the stuff leaks >out to the International friends, he may well >include his ASCII email address, and if not, he's not expecting you to >respond anyway. >In fact, if all of us English speaking folks put >ourselves in the same shoes as the IDN guy, we might be asking the opposite. 
>"I would rather type email addresses in Japanese characters rather than >type ASCII because it is not natural and I find >it rather difficult to recognise the terribly confusing >ASCII characters on the keyboard simply so that I can >write a completely Japanese email to my friend >who is also Japanese down the next block. So the >Internet stuff of sending email in ASCII, an alien >character set is pretty broken for me." > >So take home message here is that all of us should be aware that >we are arguing from the world view of the English-enabled person. >The whole purpose of having IDNs and IDN email addresses, in >my opinion, is for the sake of the un-ASCII'ed masses in the world >who take a long time, (as long as you take to key in funny IDN >characters or U+whatever characters), to use the Web or the >Email to read stuff in their own language. So long as they're >ok with it, I'm ok with it. To them, U+whatever, or >xn-whateverpunycode, or even tinwee@pobox.org.sg >are all just as bad for them, as much as U+whatever, or >xn-whatever or @example.com >is equally bad for me as a non-Japanese user. > >Sorry to take so long to put across a small point, >but I am not a true native English user. > >-- >tin wee > > > >Dan Kohn wrote: > >>D. J. Bernstein wrote: >> >> >> >>>Let me put it this way. Someone gives you a business card. The card >>>has an email address. The email address has (say) Japanese characters >>>that you've never seen before. How do you type those characters? >>> >>> >>>Answer: The card shows you, on the next line, what to type, thanks to >>>a universal keyboard standard for Unicode, namely ISO 14755. Done. >>> >>> >>>The only alternative proposal I've seen is forcing every international >>>user to set up a second email address---an ASCII address. 
Why waste >>>all that effort to work around the typing issue, imposing extra costs >>>on billions of users, when we can simply have keyboard interfaces >>>support a perfectly straightforward standard that allows everything >>>to be typed? >> >>There is an alternative to registering an ASCII domain for each IDN: >>instead, you can print the punycode on the business card below the >>IMAA/IDN email address. >> >>Compared to ISO 14755 [1], it seems to me that punycode is more >>universal (it works wherever ASCII is available), more compact (it >>supports LDH rather than hex), and no more ugly than ISO 14755. >> >>Take an email address on a business card of >>@example.com (where the bracketed characters would be >>shown as kanji). >> >>Not knowing Japanese, I'd rather see (and type) >>xn--d9juau41awczczp@example.com than u+305D u+306E u+30B9 u+30D4 u+30FC >>u+30C9 u+3067@example.com where for each u+ I need to hold down >>ctrl-alt. Of course, we both agree that they could also set up >>sonosupiidode@example.com to forward to the same mailbox. >> >>Of course, I know how much Dan hates punycode and so he'll hate this >>suggestion. 
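The comparison between the two printed forms can be reproduced with a short sketch. It assumes Python's built-in `punycode` codec (an implementation of RFC 3492); the function names are invented for this illustration, and the `xn--` prefix is simply the one used in the message. Which prefix an IMAA local-part would actually carry is not something this sketch settles.

```python
# Code points listed in the message for the kana local-part.
CODE_POINTS = [0x305D, 0x306E, 0x30B9, 0x30D4, 0x30FC, 0x30C9, 0x3067]


def to_iso14755_style(code_points):
    """Spell each character as a 'u+XXXX' token, as written in the message."""
    return " ".join(f"u+{cp:04X}" for cp in code_points)


def to_punycode_label(code_points, prefix="xn--"):
    """Encode the characters with the 'punycode' codec and prepend a prefix.

    The prefix is an assumption taken from the message, not from the draft.
    """
    text = "".join(chr(cp) for cp in code_points)
    return prefix + text.encode("punycode").decode("ascii")


hex_form = to_iso14755_style(CODE_POINTS)
ace_form = to_punycode_label(CODE_POINTS)
print(hex_form)              # u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067
print(ace_form)              # xn--d9juau41awczczp
print(len(hex_form), len(ace_form))
```

The length difference is the "more compact" point made in the message; whether either form is pleasant to type is a separate question that code cannot answer.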
>>
>>[1] http://www-rocq.inria.fr/qui/Philippe.Deschamp/divers/ALB-CD.html
>>
>>
>> - dan
>>--
>>Dan Kohn
>>
>>
>>
>>

From owner-ietf-imaa Thu Feb 13 09:29:00 2003
Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DHT0V16853 for ietf-imaa-bks; Thu, 13 Feb 2003 09:29:00 -0800 (PST) Received: from relay-2m.club-internet.fr (relay-2m.club-internet.fr [194.158.104.41]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DHSrd16833 for ; Thu, 13 Feb 2003 09:28:53 -0800 (PST) Received: from mine.club-internet.fr (f10v-8-217.d1.club-internet.fr [213.44.235.217]) by relay-2m.club-internet.fr (Postfix) with ESMTP id CF44D169C for ; Thu, 13 Feb 2003 18:28:44 +0100 (CET) Message-Id: <5.2.0.9.0.20030213181053.040e2a00@mail.club-internet.fr> X-Sender: jefsey@mail.club-internet.fr X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Thu, 13 Feb 2003 18:25:04 +0100 To: ietf-imaa@imc.org From: "J-F C. (Jefsey) Morfin" Subject: Re: The typing issue In-Reply-To: <4.2.0.58.J.20030213085558.05a4b9c8@localhost> References: <3E4B5CAD.9090304@bic.nus.edu.sg> <138AA78F80DCE84B8EE424399FFBF9C904FAA1@exchange.ad.skymv.com> Mime-Version: 1.0 Content-Type: multipart/mixed; x-avg-checked=avg-ok-6CE14C30; boundary="=======68DF64EB=======" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

--=======68DF64EB======= Content-Type: text/plain; x-avg-checked=avg-ok-6CE14C30; charset=iso-8859-1; format=flowed Content-Transfer-Encoding: 8bit

A WSIS discussion of this issue, on cultural and societal grounds, shows a clear demand that

To: M.le.Président@République.fr

if sent as such, be received and read as such by the addressee.

The rationale is: as long as the character set in use is technically limited, politeness accepts being constrained by those technical limitations. When the character set becomes the standard, everyday character set, standard everyday social conventions must resume.
jfc

At 17:15 13/02/03, Martin Duerst wrote:
>I fully agree with Tan Tin Wee. To summarize, the following options
>have been proposed:
>
>1) Print ISO 14755 for your email address on your namecard
>2) Print Punycode for your email address on your namecard
>3) Print an ASCII-only email address on your namecard
>4) Do nothing
>
>We don't have to do anything on this issue here, we can just leave
>it to the user. However, for the record, I predict here that we
>will mostly see 3), and there will be lots of 4) (but you
>probably won't see that very much). 1) and 2) will only be used
>very rarely.
>
>Regards, Martin.

--=======68DF64EB======= Content-Type: text/plain; charset=us-ascii; x-avg=cert; x-avg-checked=avg-ok-6CE14C30 Content-Disposition: inline --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.454 / Virus Database: 253 - Release Date: 10/02/03 --=======68DF64EB=======--

From owner-ietf-imaa Thu Feb 13 09:40:38 2003
Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DHecH18225 for ietf-imaa-bks; Thu, 13 Feb 2003 09:40:38 -0800 (PST) Received: from relay-5v.club-internet.fr (relay-5v.club-internet.fr [194.158.96.110]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DHebd18220 for ; Thu, 13 Feb 2003 09:40:37 -0800 (PST) Received: from mine.club-internet.fr (f10v-8-217.d1.club-internet.fr [213.44.235.217]) by relay-5v.club-internet.fr (Postfix) with ESMTP id 6605B1771 for ; Thu, 13 Feb 2003 18:40:57 +0100 (CET) Message-Id: <5.2.0.9.0.20030213182505.040e5570@pop.online.fr> X-Sender: jefsey@mail.club-internet.fr X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Thu, 13 Feb 2003 18:32:12 +0100 To: ietf-imaa@imc.org From: "J-F C. (Jefsey) Morfin" Subject: how should I address this? Mime-Version: 1.0 Content-Type: multipart/mixed; x-avg-checked=avg-ok-6CE14C30; boundary="=======353C7706=======" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

--=======353C7706======= Content-Type: text/plain; x-avg-checked=avg-ok-6CE14C30; charset=us-ascii; format=flowed Content-Transfer-Encoding: 8bit

It has been explained to me why case-sensitivity is necessary to a worldwide business project that could be of real magnitude. This is obviously key IP for its developers. They claim they are entitled to have existing RFCs respected (I am not competent enough to judge that).

They also understand the concern of this WG and are ready to disclose their application under NDA to whoever is the final decision maker. How should I address this?

jfc

--=======353C7706======= Content-Type: text/plain; charset=us-ascii; x-avg=cert; x-avg-checked=avg-ok-6CE14C30 Content-Disposition: inline --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.454 / Virus Database: 253 - Release Date: 10/02/03 --=======353C7706=======-- From owner-ietf-imaa Thu Feb 13 09:58:53 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DHwr618860 for ietf-imaa-bks; Thu, 13 Feb 2003 09:58:53 -0800 (PST) Received: from [63.202.92.157] (adsl-63-202-92-157.dsl.snfc21.pacbell.net [63.202.92.157]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DHwod18853; Thu, 13 Feb 2003 09:58:50 -0800 (PST) Mime-Version: 1.0 X-Sender: phoffman@mail.imc.org Message-Id: In-Reply-To: <5.2.0.9.0.20030213182505.040e5570@pop.online.fr> References: <5.2.0.9.0.20030213182505.040e5570@pop.online.fr> X-Habeas-SWE-1: winter into spring X-Habeas-SWE-2: brightly anticipated X-Habeas-SWE-3: like Habeas SWE (tm) X-Habeas-SWE-4: Copyright 2002 Habeas (tm) X-Habeas-SWE-5: Sender Warranted Email (SWE) (tm). The sender of this X-Habeas-SWE-6: email in exchange for a license for this Habeas X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this X-Habeas-SWE-9: mark in spam to . Date: Thu, 13 Feb 2003 09:58:49 -0800 To: "J-F C. (Jefsey) Morfin" , ietf-imaa@imc.org From: Paul Hoffman / IMC Subject: Re: how should I address this? Content-Type: text/plain; charset="us-ascii" ; format="flowed" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 6:32 PM +0100 2/13/03, J-F C. (Jefsey) Morfin wrote: >They also understand the concern of this WG and are ready to >disclose their application under NDA to who ever is the final >decision maker. How should I address this? This is not a WG. It never was, and never pretended to be. If you are unclear on this concept, please read the Tao of the IETF document. There is an undisclosed number of people and archivers who subscribe to this list. No one should send any information to this list that could be considered confidential. 
There is no "final decision maker" for this document. The authors will strive to take the reasonable technical concerns of others into account when crafting the protocol. We are quite sure that we cannot make everyone happy in the end. --Paul Hoffman, Director --Internet Mail Consortium From owner-ietf-imaa Thu Feb 13 10:15:47 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DIFlN19434 for ietf-imaa-bks; Thu, 13 Feb 2003 10:15:47 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1DIFkd19430 for ; Thu, 13 Feb 2003 10:15:46 -0800 (PST) Received: (qmail 8486 invoked by uid 1016); 13 Feb 2003 18:16:14 -0000 Date: 13 Feb 2003 18:16:14 -0000 Message-ID: <20030213181614.8485.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: ietf-imaa@imc.org Subject: facts about the real world, part 1 References: <4.2.0.58.J.20030209173037.05a45ca0@localhost> <20030210114016.GA9872@nicemice.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Adam M. Costello writes: > I have never encountered a case-sensitive local-part. Has anyone here > ever encountered a case-sensitive local-part? Certainly. Here's the qmail situation, for example: * Case-insensitive: ASCII-lowercased mailbox names are compared to case-sensitive names in /etc/passwd and .qmail-*. * Can go either way: The original mailbox name is passed to mail delivery agents. * Case-sensitive: ezmlm, mailing-list-management software running under qmail, pays attention to case in its mailbox names. All of this is very widely deployed. ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-ietf-imaa Thu Feb 13 10:21:46 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DILkd19635 for ietf-imaa-bks; Thu, 13 Feb 2003 10:21:46 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1DILjd19631 for ; Thu, 13 Feb 2003 10:21:45 -0800 (PST) Received: (qmail 10863 invoked by uid 1016); 13 Feb 2003 18:22:13 -0000 Date: 13 Feb 2003 18:22:13 -0000 Message-ID: <20030213182213.10862.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: ietf-imaa@imc.org Subject: facts about the real world, part 2 References: <1045135666.14953.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami writes: > Delimiter is to my knowledge typically either '+' or '-', though it's > possible that there are others in use The Andrew mail system---still in use, and historically the first system to provide convenient subaddressing to users---has = as its standard separator; the same separator has some uses in ezmlm. Although the default qmail separator is -, qmail makes it very easy for the system administrator to choose any byte as a separator. (With a bit more work, you can even have different separators for different users.) ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-ietf-imaa Thu Feb 13 10:29:58 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DITwV19790 for ietf-imaa-bks; Thu, 13 Feb 2003 10:29:58 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1DITud19780 for ; Thu, 13 Feb 2003 10:29:56 -0800 (PST) Received: (qmail 14214 invoked by uid 1016); 13 Feb 2003 18:30:24 -0000 Date: 13 Feb 2003 18:30:24 -0000 Message-ID: <20030213183024.14213.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: ietf-imaa@imc.org Subject: Re: The typing issue References: <138AA78F80DCE84B8EE424399FFBF9C904FAA1@exchange.ad.skymv.com> <3E4B5CAD.9090304@bic.nus.edu.sg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Tan Tin Wee writes: > he would naturally have both an ASCII email > address and the Japanese email address As I said, that's forcing every international user to set up a second email address in ASCII. With ISO 14755, the second email address disappears. There's still ASCII information on the business card---but it isn't a second address; it's a universal explanation of how to type the first address. ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago

From owner-ietf-imaa Thu Feb 13 11:05:40 2003
Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DJ5eu20634 for ietf-imaa-bks; Thu, 13 Feb 2003 11:05:40 -0800 (PST) Received: from mail.uni-bielefeld.de (IDENT:72@mail2.uni-bielefeld.de [129.70.4.90]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DJ5Yd20626 for ; Thu, 13 Feb 2003 11:05:34 -0800 (PST) Received: from 192.168.0.17 (ppp36-99.hrz.uni-bielefeld.de [129.70.36.99]) by mail.uni-bielefeld.de (Sun Internet Mail Server sims.4.0.2000.10.12.16.25.p8) with ESMTP id <0HA9002W5H0ZB4@mail.uni-bielefeld.de> for ietf-imaa@imc.org; Thu, 13 Feb 2003 20:05:32 +0100 (MET) Date: Thu, 13 Feb 2003 19:50:55 +0100 From: Marc Mutz Subject: Open Issue: Stored strings vs. queries. To: ietf-imaa@imc.org Message-id: <200302131950.55716@sendmail.mutz.com> Organization: KDE MIME-version: 1.0 Content-type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Boundary-02=_Pk+S+FS5Koy9ryY"; charset="iso-8859-1" Content-transfer-encoding: 7bit User-Agent: KMail/1.5.9 X-PGP-Key: 0xBDBFE838 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

--Boundary-02=_Pk+S+FS5Koy9ryY Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Description: signed data Content-Disposition: inline

Hi!

An interesting issue is what is to be considered a stored local-part and what is a queried local-part.

Obvious stored local-parts:
- MTA config
- address books (e.g. LDAP)

Obvious queries:
- address lookups (e.g. LDAP queries)
- SMTP commands

Non-obvious:
- Mail headers

One might very well say that a mail header is a request (in that the user enters it and the SMTP server for the given domain needs to look it up to return success or failure), so that its slots would fall into the "query" category. And naturally, one would like to have the content of the message headers and that of the SMTP commands be subject to the same rules, now that RFCs 2821 and 2822 agree on the mailbox definition.

However, the point of view that the message is stored and its addresses will be subject to queries (e.g. by user filtering or searching) isn't far off either. There's also the argument that what the server looks up is the argument given to SMTP's RCPT TO command, not what's in the header fields.

So the main question I see is whether header field slots (and, consequently[1], SMTP command slots) are queries or stored.

Marc

[1] I think that whatever category those belong to, they should both belong to the same class.
-- 
Never is there so much lying as before an election, during a war,
and after a hunt -- Otto von Bismarck

--Boundary-02=_Pk+S+FS5Koy9ryY Content-Type: application/pgp-signature Content-Description: signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQA+S+kP3oWD+L2/6DgRAggyAJ0UiO7wNe8s1xQIt52SIXYoZQp81gCgv1KR
zbOiBJtKL6dooUQmIT+PWYw=
=Xdx2
-----END PGP SIGNATURE-----

--Boundary-02=_Pk+S+FS5Koy9ryY--

From owner-ietf-imaa Thu Feb 13 11:05:38 2003
Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DJ5cC20630 for ietf-imaa-bks; Thu, 13 Feb 2003 11:05:38 -0800 (PST) Received: from mail.uni-bielefeld.de (IDENT:72@mail2.uni-bielefeld.de [129.70.4.90]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DJ5Wd20624 for ; Thu, 13 Feb 2003 11:05:32 -0800 (PST) Received: from 192.168.0.17 (ppp36-99.hrz.uni-bielefeld.de [129.70.36.99]) by mail.uni-bielefeld.de (Sun Internet Mail Server sims.4.0.2000.10.12.16.25.p8) with ESMTP id <0HA9002W5H0ZB4@mail.uni-bielefeld.de> for ietf-imaa@imc.org; Thu, 13 Feb 2003 20:05:30 +0100 (MET) Date: Thu, 13 Feb 2003 19:48:11 +0100 From: Marc Mutz Subject: Open Issue: Splitting of local-part into labels and where? To: ietf-imaa@imc.org Message-id: <200302131948.20807@sendmail.mutz.com> Organization: KDE MIME-version: 1.0 Content-type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Boundary-02=_0h+S+kOHuL79s6j"; charset="us-ascii" Content-transfer-encoding: 7bit User-Agent: KMail/1.5.9 X-PGP-Key: 0xBDBFE838 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

--Boundary-02=_0h+S+kOHuL79s6j Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Content-Description: signed data Content-Disposition: inline

Hi!

The question of whether or not to split the dequoted local-part into labels and if so, at which delimiter characters, remains unanswered in the -00 draft.
I think that splitting into labels is needed to keep old software (esp. MTAs and filtering software) working. I can't add more to Roy's arguments here, and I feel there will not be much discussion about this particular issue.

The more interesting point is _where_ to split. There are some very obvious characters (mainly full stops and hyphens), but apart from that, the rest of the candidate chars is much less clear.

I'd include at least the following:
@ [1]
+ (used for subaddresses)

The draft also mentions '!'; Roy mentioned the underscore.

In addition, all such separators should be recognized in all their variants (for full-stops, see IDNA, for @ see draft, for others I admit to not knowing the Unicode repertoire by heart ;-)) and be replaced with their US-ASCII equivalents.

The draft also mentions the option of using all non-alphanumeric US-ASCII characters. If the number of equivalent Unicode code points is small, then this is certainly the best option, although we should then provide a mapping table for the to-US-ASCII mapping.

Marc

[1] Which reminds me that the draft specifies splitting local-part and domain at _the_ at-sign. It should probably read "at the _last_ at-sign", since local-parts may contain at-signs themselves.

-- 
It's one thing to accept a risk to your own data, but quite another to standardize on something that imposes that risk on others, no matter how unlikely you think it is that anything "really bad" will happen, and no matter how desirable the outcome.
-- Bart Schaefer, on ietf-822 --Boundary-02=_0h+S+kOHuL79s6j Content-Type: application/pgp-signature Content-Description: signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQA+S+h03oWD+L2/6DgRAnJRAJ4upfq3kM+jKGaGTEiN0WS2q6dY9gCfcTcG 23gpkEsxEw7trxbbnwvxiMI= =uGFj -----END PGP SIGNATURE----- --Boundary-02=_0h+S+kOHuL79s6j-- From owner-ietf-imaa Thu Feb 13 11:38:41 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DJcfQ22706 for ietf-imaa-bks; Thu, 13 Feb 2003 11:38:41 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DJced22701 for ; Thu, 13 Feb 2003 11:38:40 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id OAA09916; Thu, 13 Feb 2003 14:38:41 -0500 Message-Id: <4.2.0.58.J.20030213141928.050c46f0@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Thu, 13 Feb 2003 14:21:26 -0500 To: "D. J. Bernstein" , ietf-imaa@imc.org From: Martin Duerst Subject: Re: The typing issue In-Reply-To: <20030213183024.14213.qmail@cr.yp.to> References: <138AA78F80DCE84B8EE424399FFBF9C904FAA1@exchange.ad.skymv.com> <3E4B5CAD.9090304@bic.nus.edu.sg> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Hello Dan, As I said, nobody is forcing anybody to do anything. Addressees who think they like ISO 14755 better will go for that one. Addressees who think they like an ASCII equivalent email address better will go for that one. Let's just see how things develop, and talk again in a few years. Regards, Martin. At 18:30 03/02/13 +0000, D. J. 
Bernstein wrote: >Tan Tin Wee writes: > > he would naturally have both an ASCII email > > address and the Japanese email address > >As I said, that's forcing every international user to set up a second >email address in ASCII. > >With ISO 14755, the second email address disappears. There's still ASCII >information on the business card---but it isn't a second address; it's a >universal explanation of how to type the first address. > >---D. J. Bernstein, Associate Professor, Department of Mathematics, >Statistics, and Computer Science, University of Illinois at Chicago From owner-ietf-imaa Thu Feb 13 11:38:44 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DJciV22718 for ietf-imaa-bks; Thu, 13 Feb 2003 11:38:44 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DJchd22711 for ; Thu, 13 Feb 2003 11:38:43 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id OAA09919; Thu, 13 Feb 2003 14:38:41 -0500 Message-Id: <4.2.0.58.J.20030213142550.03349d90@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Thu, 13 Feb 2003 14:34:58 -0500 To: Marc Mutz , ietf-imaa@imc.org From: Martin Duerst Subject: Re: Open Issue: Splitting of local-part into labels and where? In-Reply-To: <200302131948.20807@sendmail.mutz.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 19:48 03/02/13 +0100, Marc Mutz wrote: >Hi! > >The question of whether or not to split the dequoted local-part into >labels and if so, at which delimiter characters, remains unanswered in >the -00 draft. >The more interesting point is _where_ to split. 
There are some very >obvious characters (mainly full stops and hyphens), but apart from >that, the rest of the candidate chars is much less clear. My understanding is that this is very much a slippery slope. So we better stop as soon as possible. I see some argument for making '.' a delimiter, because that would make it easier to apply the same function to a whole email address as to a domain name only. But that on the other hand prohibits us to make '-' a delimiter. >In addition, all such separators should be recognized in all their >variants (for full-stops, see IDNA, for @ see draft, for others I admit >to not know the Unicode repertoire by heart ;-)) and be replaced with >their US-ASCII equivalents. There are a lot of variants, and they very much depend on circumstances. IDNA recognizes the full-width and ideographic variants of the full stop, but as far as I remember, it does not recognize any other variants or equivalents. Making sure that a user enters a '@' in the right form can and should be a quality of implementation issue under the responsibility of the application. Such issues can easily be integrated into the input architecture e.g. for East Asian languages and scripts. If we treat all non-alphanum ASCII characters special, then what about non-alphabetic/syllabic/ideographic symbols,... in other scripts? Shouldn't they also be treated as special separators? Regards, Martin. 
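A minimal Python sketch of the full-stop handling mentioned above. It assumes the three dot variants that the IDNA specification (published as RFC 3490) requires to be recognized alongside U+002E; the function name normalize_dots is hypothetical and not part of any draft.

```python
# Dot variants that IDNA treats as equivalent to the ASCII full stop:
# U+3002 ideographic, U+FF0E fullwidth, U+FF61 halfwidth ideographic.
IDNA_DOT_VARIANTS = "\u3002\uff0e\uff61"

# Translation table mapping each variant to ".".
_DOT_TABLE = str.maketrans({ch: "." for ch in IDNA_DOT_VARIANTS})


def normalize_dots(text: str) -> str:
    """Replace every recognized full-stop variant with ASCII '.'."""
    return text.translate(_DOT_TABLE)
```

Whether a similar table should exist for '@' or other separators is exactly the open question in this thread; the sketch covers only the dots.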
From owner-ietf-imaa Thu Feb 13 11:49:14 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DJnE023016 for ietf-imaa-bks; Thu, 13 Feb 2003 11:49:14 -0800 (PST) Received: from pie1.i-dns.net ([203.81.44.31]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DJnDd23012 for ; Thu, 13 Feb 2003 11:49:13 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by pie1.i-dns.net (Postfix) with ESMTP id 6E60978A18 for ; Thu, 13 Feb 2003 19:49:14 +0000 (GMT) Received: from pie1.i-dns.net ([127.0.0.1]) by localhost (pie1.i-dns.net [127.0.0.1:10024]) (amavisd-new) with SMTP id 98275-07 for ; Thu, 13 Feb 2003 19:49:12 +0000 (GMT) Received: from jeffreyibm (unknown [211.219.53.139]) by pie1.i-dns.net (Postfix) with SMTP id 83C7F78A2A for ; Thu, 13 Feb 2003 19:49:08 +0000 (GMT) Message-ID: <008701c2d399$40ca9330$fc00a8c0@jeffreyibm> From: "Jeffrey J Zahari" To: References: <138AA78F80DCE84B8EE424399FFBF9C904FAA1@exchange.ad.skymv.com> <3E4B5CAD.9090304@bic.nus.edu.sg> <20030213183024.14213.qmail@cr.yp.to> Subject: Re: The typing issue Date: Fri, 14 Feb 2003 04:50:56 +0900 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 X-Virus-Scanned: by amavisd-new Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Just out of curiosity, are there any complete systems/applications that support ISO14755? Because implementing idnc3 phase 1 seems like it could be a long time a coming. jeffrey j zahari ----- Original Message ----- From: "D. J. 
Bernstein" To: Sent: Friday, February 14, 2003 3:30 AM Subject: Re: The typing issue > > Tan Tin Wee writes: > > he would naturally have both an ASCII email > > address and the Japanese email address > > As I said, that's forcing every international user to set up a second > email address in ASCII. > > With ISO 14755, the second email address disappears. There's still ASCII > information on the business card---but it isn't a second address; it's a > universal explanation of how to type the first address. > > ---D. J. Bernstein, Associate Professor, Department of Mathematics, > Statistics, and Computer Science, University of Illinois at Chicago > From owner-ietf-imaa Thu Feb 13 12:57:08 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DKv8K26394 for ietf-imaa-bks; Thu, 13 Feb 2003 12:57:08 -0800 (PST) Received: from mail.uni-bielefeld.de (IDENT:72@mail2.uni-bielefeld.de [129.70.4.90]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DKv2d26385 for ; Thu, 13 Feb 2003 12:57:02 -0800 (PST) Received: from 192.168.0.17 (ppp36-220.hrz.uni-bielefeld.de [129.70.36.220]) by mail.uni-bielefeld.de (Sun Internet Mail Server sims.4.0.2000.10.12.16.25.p8) with ESMTP id <0HA9008APM70QG@mail.uni-bielefeld.de> for ietf-imaa@imc.org; Thu, 13 Feb 2003 21:57:01 +0100 (MET) Date: Thu, 13 Feb 2003 21:43:28 +0100 From: Marc Mutz Subject: Re: Open Issue: Splitting of local-part into labels and where? 
In-reply-to: <4.2.0.58.J.20030213142550.03349d90@localhost> To: ietf-imaa@imc.org Message-id: <200302132143.49081@sendmail.mutz.com> Organization: KDE MIME-version: 1.0 Content-type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Boundary-02=_EOAT+UDYhC4polm"; charset="us-ascii" Content-transfer-encoding: 7bit User-Agent: KMail/1.5.9 X-PGP-Key: 0xBDBFE838 References: <4.2.0.58.J.20030213142550.03349d90@localhost> Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

--Boundary-02=_EOAT+UDYhC4polm
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-Description: signed data
Content-Disposition: inline

On Thursday 13 February 2003 20:34, Martin Duerst wrote:
> I see some argument
> for making '.' a delimiter, because that would make it easier
> to apply the same function to a whole email address as to
> a domain name only.

We will not be able to use the same function for the LHS as for the RHS anyway. That's simply b/c there are characters that are allowed in local-parts, but not in domains (compare the definition of quoted-string with that of dot-atom in RFC 2822).

> If we treat all non-alphanum ASCII characters special, then what
> about non-alphabetic/syllabic/ideographic symbols,... in other
> scripts? Shouldn't they also be treated as special separators?

No, b/c they were never used as separators (since they can't appear in local-parts, obviously).

OTOH, dot, plus, etc _are_ used as separators currently. And since no-one here can tell what separators are used in the wild (e.g. b/c the subaddress separator is config'able), it's only logical to split at non-alphanum ASCII characters. It also makes the ACE-ILP more readable for those that have to deal with it (users of non-imaa compliant MUAs, MTA admins).
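To make the splitting rule concrete, here is a small Python sketch under stated assumptions: every ASCII character that is not a letter or digit counts as a delimiter, delimiters are kept as their own pieces, and non-ASCII characters never split. The function name split_local_part is made up for illustration; the draft does not define such a function.

```python
import re

# One ASCII character (0x00-0x7F) that is neither a letter nor a digit.
# The capturing group makes re.split() keep the delimiters in the result.
_ASCII_NON_ALNUM = re.compile(r"([\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f])")


def split_local_part(local_part: str) -> list:
    """Split a dequoted local part at non-alphanumeric ASCII characters.

    Returns labels and delimiters in their original order; empty
    strings between adjacent delimiters are dropped.
    """
    return [piece for piece in _ASCII_NON_ALNUM.split(local_part) if piece]
```

For example, split_local_part("user+tag.name") gives ['user', '+', 'tag', '.', 'name'], so each label could be encoded separately while the separators that existing MTAs act on stay visible.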
We should, of course, exclude pathological cases, such as control characters, so a quick lookup in a charset table reveals the following candidates:

  HT CRLF SP (whitespace)
  ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

all of which are currently allowed in quoted-string and thus in local-part.

(Note: Only splitting at dots would e.g. mean that we mangle whitespace! I don't think that that's what we want, esp. b/c CRLF is a "multi-byte character", so to speak)

Marc

-- 
It has become fashionable in the post Cold War world to label opponents as terrorists [...]. By doing so, the authorities instill within society a culture of fear, leading people to accept that their rights (and the rights of others) be trampled on for the sake of the common good. In other words, it justifies the loss of privacy and a state of surveillance they would otherwise not accept. Both communism and fascism were examples of this technique used to perfection.
-- John Horvath: The Internet: A Terrorist Network?
Telepolis 2001/08/22 (#9350) --Boundary-02=_EOAT+UDYhC4polm Content-Type: application/pgp-signature Content-Description: signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQA+TAOE3oWD+L2/6DgRAqDGAKDe3PTP8PdgdVJtu88Q2zuKh4qtuwCbBMSV EJQ0g7Rk/4vI3u2yGVtRdUA= =5F5m -----END PGP SIGNATURE----- --Boundary-02=_EOAT+UDYhC4polm-- From owner-ietf-imaa Thu Feb 13 13:04:42 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DL4gh26691 for ietf-imaa-bks; Thu, 13 Feb 2003 13:04:42 -0800 (PST) Received: from [63.202.92.157] (adsl-63-202-92-157.dsl.snfc21.pacbell.net [63.202.92.157]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DL4ad26687; Thu, 13 Feb 2003 13:04:36 -0800 (PST) Mime-Version: 1.0 X-Sender: phoffman@mail.imc.org Message-Id: In-Reply-To: <200302131950.55716@sendmail.mutz.com> References: <200302131950.55716@sendmail.mutz.com> Date: Thu, 13 Feb 2003 13:04:33 -0800 To: Marc Mutz , ietf-imaa@imc.org From: Paul Hoffman / IMC Subject: Re: Open Issue: Stored strings vs. queries. Content-Type: text/plain; charset="us-ascii" ; format="flowed" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 7:50 PM +0100 2/13/03, Marc Mutz wrote: >One might very well say that a mail header is a request (in that the >user enters it and the SMTP server for the given domain needs to look >it up to return success or failure), so that it's slots would fall into >the "query" category.
And naturally, one would like to have the content >of the message headers and that of the smtp commands be subject to the >same rules, now that 282{1,2} agree on the mailbox definition. > >However, the POV that the message is stored and it's addresses will be >subject to queries (e.g. by user filtering or searching) isn't way off, >too. There's also the argument that what the server looks up is the >argument given to SMTP's RCPT TO command, not what's in the header >fields. This doesn't match the general definition of queries and stored in Stringprep. I cannot see how a header would be considered a query. It isn't asking for anything, it is a part of the message. --Paul Hoffman, Director --Internet Mail Consortium From owner-ietf-imaa Thu Feb 13 14:12:09 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DMC9X28252 for ietf-imaa-bks; Thu, 13 Feb 2003 14:12:09 -0800 (PST) Received: from patan.sun.com (patan.Sun.COM [192.18.98.43]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DMC7d28244 for ; Thu, 13 Feb 2003 14:12:07 -0800 (PST) Received: from esunmail ([129.147.58.120]) by patan.sun.com (8.9.3+Sun/8.9.3) with ESMTP id PAA10894 for ; Thu, 13 Feb 2003 15:12:09 -0700 (MST) Received: from xpa-fe2 (esunmail [129.147.58.120]) by edgemail1.Central.Sun.COM (iPlanet Messaging Server 5.2 HotFix 1.08 (built Dec 6 2002)) with ESMTP id <0HA9000SWPO9VA@edgemail1.Central.Sun.COM> for ietf-imaa@imc.org; Thu, 13 Feb 2003 15:12:09 -0700 (MST) Received: from nifty-jr.west.sun.com ([129.153.12.95]) by mail.sun.net (iPlanet Messaging Server 5.2 HotFix 1.08 (built Dec 6 2002)) with ESMTPSA id <0HA900JDTPO63G@mail.sun.net> for ietf-imaa@imc.org; Thu, 13 Feb 2003 15:12:08 -0700 (MST) Date: Thu, 13 Feb 2003 14:11:50 -0800 From: Chris Newman Subject: Re: facts about the real world, part 2 In-reply-to: <20030213182213.10862.qmail@cr.yp.to> To: "D. J. 
Bernstein" , ietf-imaa@imc.org Message-id: <2147483647.1045145510@nifty-jr.west.sun.com> MIME-version: 1.0 X-Mailer: Mulberry/3.0.0 (Mac OS X) Content-type: text/plain; charset=us-ascii; format=flowed Content-transfer-encoding: 7BIT Content-disposition: inline X-message-flag: Outlook: the best virus distribution system around References: <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <20030213182213.10862.qmail@cr.yp.to> Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: begin quotation by D. J. Bernstein on 2003/2/13 18:22 +0000: > Roy Badami writes: >> Delimiter is to my knowledge typically either '+' or '-', though it's >> possible that there are others in use > > The Andrew mail system---still in use, and historically the first system > to provide convenient subaddressing to users---has = as its standard > separator; the same separator has some uses in ezmlm. AMS at Carnegie Mellon University (where it was created) used '+'. Perhaps the AMS deployment you saw changed it from the default '+' to '='. But there are some other systems using "=" as a subaddress delimiter. - Chris From owner-ietf-imaa Thu Feb 13 14:32:53 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DMWrN28676 for ietf-imaa-bks; Thu, 13 Feb 2003 14:32:53 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1DMWqd28672 for ; Thu, 13 Feb 2003 14:32:52 -0800 (PST) Received: (qmail 53066 invoked by uid 1016); 13 Feb 2003 22:33:20 -0000 Date: 13 Feb 2003 22:33:20 -0000 Message-ID: <20030213223320.53054.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. 
Bernstein" To: ietf-imaa@imc.org Subject: Re: facts about the real world, part 2 References: <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <20030213182213.10862.qmail@cr.yp.to> <2147483647.1045145510@nifty-jr.west.sun.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Chris Newman writes: > AMS at Carnegie Mellon University (where it was created) used '+'. Perhaps > the AMS deployment you saw changed it from the default '+' to '='. My comment was, in fact, based on current = addresses at CMU. ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-ietf-imaa Thu Feb 13 14:31:39 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DMVdA28656 for ietf-imaa-bks; Thu, 13 Feb 2003 14:31:39 -0800 (PST) Received: from mail.uni-bielefeld.de (IDENT:72@mail2.uni-bielefeld.de [129.70.4.90]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DMVWd28650 for ; Thu, 13 Feb 2003 14:31:33 -0800 (PST) Received: from 192.168.0.17 (ppp36-244.hrz.uni-bielefeld.de [129.70.36.244]) by mail.uni-bielefeld.de (Sun Internet Mail Server sims.4.0.2000.10.12.16.25.p8) with ESMTP id <0HA900BS3QKEXX@mail.uni-bielefeld.de> for ietf-imaa@imc.org; Thu, 13 Feb 2003 23:31:32 +0100 (MET) Date: Thu, 13 Feb 2003 23:20:01 +0100 From: Marc Mutz Subject: Re: Open Issue: Stored strings vs. queries. 
In-reply-to: To: ietf-imaa@imc.org Message-id: <200302132320.20874@sendmail.mutz.com> Organization: KDE MIME-version: 1.0 Content-type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Boundary-02=_koBT+av/jSvKTA3"; charset="us-ascii" Content-transfer-encoding: 7bit User-Agent: KMail/1.5.9 X-PGP-Key: 0xBDBFE838 References: <200302131950.55716@sendmail.mutz.com> Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

--Boundary-02=_koBT+av/jSvKTA3
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-Description: signed data
Content-Disposition: inline

On Thursday 13 February 2003 22:04, Paul Hoffman / IMC wrote:
> I cannot see how a header would be considered a query.

OK, let's do a Gedankenexperiment:

I think we agree that the smtp slots have to be queries. (Else, a single intermediate MTA running oldVersion will break the transport of a newVersion-MUA-composed message to a newVersion-mailbox.)

Consider the case of stored mail header field slots.

I want to send a message with an oldVersion MUA to a newVersion mailbox whose local-part contains a code point from the set

  U(oldVersion) \cap AO(newVersion)

My MUA will rfc2822-serialize the message, thereby checking that any to/cc/bcc address meets the requirements of oldVersion-stored addresses. The address for the newVersion mailbox will fail that step, since it contains a code point from the oldVersion U set.

This way, only addresses meeting the requirements of stored addresses are permitted. This doesn't even change if the MUA keeps track of the user-entered address separately from the rfc2822 serialization (for later use in the smtp rcpt to command), since it is simply not permitted to put the address into the rfc2822 serialization.
The only option for the MUA would be to leave out any address that fails the stored requirements test, but not the query requirements test[1] from the rfc2822 serialization, but use the address in the RCPT TO command. Well, that's what spammers do...

Consider OTOH the case of query mail header field slots.

My MUA will detect that the mailbox meets the requirements of a query mailbox, and happily put the mailbox together with the others (if any) into both the rfc2822 serialization slots, as well as the SMTP slots. Sending succeeds.

Conclusion: If we want oldVersion MUAs to be able to send mails to newVersion mailboxes (ie. a mailbox whose address contains newly assigned code points), then rfc2822 slots need to be queries or - if we insist they are stored - MUAs be required to omit the newVersion mailbox from the message, but give it in smtp rcpt to.

Since the latter option isn't really an option (all mails from oldVersion MUA users would appear to be bcc'ed to me, with the BCC header stripped), the question boils down to:

Do we want oldVersion MUAs to be able to send messages to newVersion mailboxes?

My answer would be "Yes", since that's what I understand IDNA works hard to achieve for DNS (an oldVersion browser can still do a dns lookup of a newVersion IDN).

Marc

[1] If both fail, the address should be rejected.

-- 
Ich gegen meinen Bruder. Ich und mein Bruder gegen unseren Cousin. Ich, mein Bruder und unser Cousin gegen unsere Nachbarn. Wir alle gegen den Fremden.
-- Beduinen-Sprichwort --Boundary-02=_koBT+av/jSvKTA3 Content-Type: application/pgp-signature Content-Description: signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQA+TBok3oWD+L2/6DgRAmEAAJ9yEoVVfilLvFTbB5d7HsezDI0v5QCg4jSO aqozkASVdbPFFRP6zXmHD4k= =g5Dt -----END PGP SIGNATURE----- --Boundary-02=_koBT+av/jSvKTA3-- From owner-ietf-imaa Thu Feb 13 15:09:40 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DN9eG29479 for ietf-imaa-bks; Thu, 13 Feb 2003 15:09:40 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DN9dd29475 for ; Thu, 13 Feb 2003 15:09:39 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jSUA-0006bq-00 for ; Thu, 13 Feb 2003 15:09:42 -0800 Date: Thu, 13 Feb 2003 23:09:42 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Subject: Re: how should I address this? Message-ID: <20030213230942.GA25048@nicemice.net> Reply-To: IETF IMAA list References: <5.2.0.9.0.20030213182505.040e5570@pop.online.fr> <5.2.0.9.0.20030213182505.040e5570@pop.online.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5.2.0.9.0.20030213182505.040e5570@pop.online.fr> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: "J-F C. (Jefsey) Morfin" wrote: > I have been explained why case-sensitivity is necessary to a worldwide > business project... > > They are ready to disclose their application under NDA to who ever is > the final decision maker. Paul Hoffman / IMC replied: > There is no "final decision maker" for this document. The authors > will strive to take the reasonable technical concerns of others into > account when crafting the protocol. Furthermore, IMAA aims to be a public standard, and therefore it is being developed by a public forum (this mailing list). 
It would be improper for the design decisions to be based on secret discussions. The business project in question can choose to participate in this public forum, or they can choose not to participate. AMC From owner-ietf-imaa Thu Feb 13 15:22:48 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DNMmr29693 for ietf-imaa-bks; Thu, 13 Feb 2003 15:22:48 -0800 (PST) Received: from smtp5.andrew.cmu.edu (SMTP5.andrew.cmu.edu [128.2.10.85]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DNMld29689 for ; Thu, 13 Feb 2003 15:22:47 -0800 (PST) Received: from penguin.andrew.cmu.edu (PENGUIN.andrew.cmu.edu [128.2.121.100]) by smtp5.andrew.cmu.edu (8.12.3.Beta2/8.12.3.Beta2) with ESMTP id h1DNMmDX006235; Thu, 13 Feb 2003 18:22:48 -0500 Date: Thu, 13 Feb 2003 18:22:48 -0500 Message-Id: <200302132322.h1DNMmDX006235@smtp5.andrew.cmu.edu> From: Lawrence Greenfield X-Mailer: BatIMail version 3.3 To: ietf-imaa@imc.org In-reply-to: <20030213223320.53054.qmail@cr.yp.to> Subject: Re: facts about the real world, part 2 References: <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <20030213182213.10862.qmail@cr.yp.to> <2147483647.1045145510@nifty-jr.west.sun.com> <20030213223320.53054.qmail@cr.yp.to> User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.3 (=?ISO-8859-4?Q?Unebigory?= =?ISO-8859-4?Q?=F2mae?=) Emacs/21.2 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Date: 13 Feb 2003 22:33:20 -0000 From: "D. J. Bernstein" Chris Newman writes: > AMS at Carnegie Mellon University (where it was created) used > '+'. Perhaps the AMS deployment you saw changed it from the > default '+' to '='. My comment was, in fact, based on current = addresses at CMU. That's a different system from AMS, actually. That's run by the CS department. That's running MMDF. 
The large AMS installation at CMU has been replaced by Cyrus IMAP and Sendmail.

MMDF has some interesting other quirks besides supporting both + and = separator characters. It doesn't support ESMTP, and it actually chokes on 8-bit characters _anywhere_ in the message.

Larry

From owner-ietf-imaa Thu Feb 13 15:53:20 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1DNrKM00733 for ietf-imaa-bks; Thu, 13 Feb 2003 15:53:20 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1DNrJd00729 for ; Thu, 13 Feb 2003 15:53:19 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jTAN-0006iG-00; Thu, 13 Feb 2003 15:53:19 -0800 Date: Thu, 13 Feb 2003 23:53:19 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Cc: Roy Badami Subject: Re: A couple of comments on the open issues... Message-ID: <20030213235319.GB25048@nicemice.net> Reply-To: IETF IMAA list , Roy Badami References: <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

Roy Badami wrote:

> there may be sites that wish to allow the creation of
> internationalized mailboxes without being in a position to upgrade to
> IMAA-compliant software.
>
> In some cases, it may be adequate simply to create mailboxes within
> the legacy system corresponding to the ACE-encoded local part.

Indeed, one of the primary design goals of IMAA, like IDNA, is that it not be necessary to upgrade any infrastructure, but only end-user applications.

> I'm particularly keen that the use of a tag or suffix with a local
> username is not broken by IMAA.
Many MTAs provide the functionality
> that all mail addressed to will be delivered
> to , either by default, or as a configuration option.
>
> This effectively allows a user of such an MTA (that has been suitably
> configured) to have multiple e-mail addresses without requiring any
> action on the part of the mail administrator, and allows the user to
> run scripts that process their mail according to the suffix.

So if IMAA operates on the entire local part, this multiple-address feature will be unavailable to users with internationalized local parts. This is a good argument in favor of having IMAA operate independently on subparts of the local part. (Of course, there are arguments on the other side too, and I'm sure we'll hear them. I haven't yet chosen a side in this debate.)

> I would urge the list to ensure that as far as possible local parts
> generated by any IMAA specification adhere to a far more conservative
> character set than that mandated by RFC822/2822, namely the set of
> characters that has historically been commonplace in local parts. I
> would personally regard this as being alphanumerics, period, hyphen
> and underscore.
>
> It is unfortunately still the case that there are systems in
> widespread use which have difficulty in accommodating RFC822 addresses
> that contain valid but unusual characters.
>
> there may be MTAs in use which ascribe special meaning to unusual
> characters in a way which is not easily configurable (if it is
> configurable at all).

I think that's also a good argument. I have a strong gut feeling that we wouldn't want dots in the prefix, for any number of reasons.
AMC From owner-ietf-imaa Thu Feb 13 16:54:07 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E0s7e02122 for ietf-imaa-bks; Thu, 13 Feb 2003 16:54:07 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E0s5d02118 for ; Thu, 13 Feb 2003 16:54:05 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1E0sOKA018315 for ; Fri, 14 Feb 2003 00:54:24 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Re: Open Issue: Splitting of local-part into labels and where? From: Roy Badami Date: Fri, 14 Feb 2003 00:54:24 +0000 Message-ID: <1045184064.18314.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: I'd include at least the following: @ [1] + (used for subaddresses) The draft also mentions '!', Roy mentioned the underscore. Actually, I mention hyphen as a separator. Underscore I simply mention as a common character in e-mail addresses of the form r_badami, and hence part of my 'safe set' of characters that it is reasonable to assume all common software handles correctly. 
-roy

From owner-ietf-imaa Thu Feb 13 16:49:17 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E0nH302040 for ietf-imaa-bks; Thu, 13 Feb 2003 16:49:17 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E0nFd02035 for ; Thu, 13 Feb 2003 16:49:15 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1E0nXKA018277 for ; Fri, 14 Feb 2003 00:49:34 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Another issue: quoting From: Roy Badami Date: Fri, 14 Feb 2003 00:49:33 +0000 Message-ID: <1045183773.18276.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

Apologies if I've missed this (I've only had time to skim the document and discussion so far), but I'd like to raise the issue of quoting as a possible open issue (that isn't listed as such in the base document).

Quoting in local parts is a messy construct. As I understand it, the revised grammar in RFC2822 doesn't allow quoting (or linear whitespace) in a localpart, though of course they are still allowed for backward compatibility as part of the obsolete syntax.

Given that these constructs are deprecated by RFC2822, is there any reason to permit their use in internationalized local parts at all?

I can think of one very strong reason for forbidding them outright in internationalized localparts: complexity. The current specification is rarely implemented correctly on the current Internet. Fully supporting quoting in IMAA could significantly increase the complexity of IMAA, depending on what other design choices are made, and there's no reason to believe it would be supported any better than it is with ASCII.
One of the problems with the 822 specification is that localparts aren't just a sequence of characters, with quoting being just a transport encoding. Localparts are a sequence of tokens, namely words (atoms or quoted-strings) and dots. In fact, the definition of "dequoted local part" in the base document is problematic as it stands, because in 822 there isn't any such thing.

By way of example, it seems to me that the RFC822 parse of the following two localparts is distinct, and as a result they could validly refer to distinct mailboxes:

  roy.badami
  "roy.badami"

The first is a sequence of three tokens: the atom "roy", the token "." and the atom "badami". The second is a single quoted-string.

The following three localparts are all equivalent, however:

  roy.badami
  roy . badami
  "roy"."badami"

Most software wants to treat the localpart as simply a sequence of characters (at least after dequoting) but it isn't defined that way.

Continuing to allow quoting in localparts may not be without its benefits (eg for X.400 gatewaying), but since the authors of RFC2822 appear to have already made a decision on the future of quoting, I suggest that we consider the following option: define a grammar for internationalized local parts based on the 2822 grammar. As a result, any localpart that contains a non-ASCII character MUST NOT contain any of the following:

  quoting (ie backslash or double-quote)
  whitespace
  any ASCII character that is not permitted unquoted.

Opinions?
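The proposed restriction can be sketched as a simple check. This is only an illustration in Python, assuming that "permitted unquoted" means the atext set of RFC 2822 plus the dot; the function name is hypothetical, and the sketch does not check dot placement (leading, trailing, or doubled dots).

```python
import string

# atext from RFC 2822: letters, digits, and this fixed set of punctuation.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def acceptable_under_proposal(local_part: str) -> bool:
    """Return False only for local parts the proposal would forbid.

    ASCII-only local parts are not constrained by the proposal, so
    they always pass here, quoted or not.
    """
    if all(ord(ch) < 128 for ch in local_part):
        return True
    # Internationalized: every ASCII character must be atext or a dot,
    # which rules out quotes, backslashes, whitespace, and specials.
    return all(ord(ch) >= 128 or ch in ATEXT or ch == "."
               for ch in local_part)
```

Under these assumptions, "jos\u00e9.b" passes, while the same name written with double quotes and a space does not.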
-roy

From owner-ietf-imaa Thu Feb 13 17:12:04 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E1C4b02551 for ietf-imaa-bks; Thu, 13 Feb 2003 17:12:04 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E1C2d02547 for ; Thu, 13 Feb 2003 17:12:02 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1E1CKKA018392 for ; Fri, 14 Feb 2003 01:12:21 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Re: Open Issue: Splitting of local-part into labels and where? From: Roy Badami Date: Fri, 14 Feb 2003 01:12:20 +0000 Message-ID: <1045185140.18391.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

> My understanding is that this is very much a slippery slope.
> So we better stop as soon as possible. I see some argument
> for making '.' a delimiter, because that would make it easier
> to apply the same function to a whole email address as to
> a domain name only.

If we assume we are going to use punycode, then you pretty much *have* to split at dots, otherwise you'd have cases where punycode generates strings containing multiple consecutive dots. Yes, you could make this valid by generating suitable quoting, but you'd be creating local parts that (a) are deprecated by 2822 and (b) stand a significant chance of not working on the current Internet. (See my separate post on quoting.)

> But that on the other hand prohibits us to make '-' a delimiter.

But I don't think it's clear at this stage that the implementational convenience of using the same function justifies the cost to the utility of the specification. My gut feeling at this point is that it doesn't.
Actually, I'm not even convinced at this point that it's clear that we should use punycode for the ACE, eg if we choose to ACE-encode strings containing dots, and require the result not to contain consecutive dots. Adam, what comes after AMC-ACE-Z ? :)

-roy

From owner-ietf-imaa Thu Feb 13 17:48:38 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E1mcl03507 for ietf-imaa-bks; Thu, 13 Feb 2003 17:48:38 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E1mWd03503 for ; Thu, 13 Feb 2003 17:48:34 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1E1mkKA018527 for ; Fri, 14 Feb 2003 01:48:46 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Splitting and encoding of local-parts: some thoughts From: Roy Badami Date: Fri, 14 Feb 2003 01:48:46 +0000 Message-ID: <1045187326.18524.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

Some thoughts on where to split, and how to encode.

(1) We could encode the whole dequoted localpart in one go. But then we can't use punycode as the ACE, because it could produce multiple consecutive dots, and I really don't think we want to go there. (That's not to say we shouldn't consider this option with a different ACE, which could still be a bootstring profile.)

(2) We could simply split at dots, and encode using punycode as the ACE. Each token (between dots) would of course have to be marked in some way (eg with an ACE prefix) if it's ACE-encoded. This means that if dots are commonplace in internationalized localparts, the extra ACE prefixes would eat into our length limit.
Splitting just at dots however doesn't allow much of the manipulation that is commonly performed on localparts, so the base document proposes an alternative for consideration, namely (3) split at all non-alphanumeric ASCII characters.

So we need to pose the question, are non-alphanumeric ASCII characters (or characters that normalize to them) likely to be commonplace in internationalized localparts? If so, this will eat into our length limit further. If we want to go down the road of accepting some added complexity in the algorithms in order to maximize the length of string we can encode, then perhaps neither (2) nor (3) is ideal. Then again, maybe even (3) is good enough if such characters will in practice be uncommon in internationalized local parts. (I note that the IDN WG essentially chose to trade simplicity for coding efficiency in their selection of punycode against the previous WG favourites of RACE and DUDE.)

However, if we opt for (1) then we pretty much rule out punycode (though we could use a bootstring profile in which dot is not a basic code point). If we have to go for something different from the IDN encoding, there's no real extra cost in going for something more radically different.

While I'm there, though, I'd suggest that if we end up defining a new bootstring profile here, we should consider going for underscore rather than hyphen as the delimiter, and use underscores rather than hyphens in the ACE prefix. The reason is that hyphen is often used as a delimiter in structured local parts, whereas underscore is a safe character to use in a localpart that is almost never used in this way.

I'd also like to throw a slightly wacky idea out there: if we accept that any prefixes and suffixes applied to (or stripped off from) e-mail addresses are restricted to ASCII, then it is adequate to identify the substring of the localpart from the first non-ASCII character to the last non-ASCII character, and encode and mark that in some way.
It then becomes safe to append and strip arbitrary ASCII suffixes from IMAs. If we avoid common separators such as plus and minus in the output from the ACE, and in whatever markup we use to tag the string as ACE-encoded, it also becomes safe to split on common separators. My concern here is that maybe even this doesn't go far enough. Is it reasonable to require users of localpart suffixes (delimited by plus or minus) to restrict their suffixes to ASCII characters?

Just some more thoughts, and apologies for the fact that this has turned into a bit of a stream of consciousness... :)

-roy

From owner-ietf-imaa Thu Feb 13 17:59:29 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E1xTR03771 for ietf-imaa-bks; Thu, 13 Feb 2003 17:59:29 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E1xRd03767 for ; Thu, 13 Feb 2003 17:59:27 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1E1xgKA018585 for ; Fri, 14 Feb 2003 01:59:42 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Re: facts about the real world, part 2 From: Roy Badami Date: Fri, 14 Feb 2003 01:59:42 +0000 Message-ID: <1045187982.18584.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

AMS at Carnegie Mellon University (where it was created) used '+'.

That certainly tallies with my recollections. I remember encountering addresses of the form foo+bar@cmu.edu around 1989/1990, I think. In particular, there was some Andrew-based mailing list management software that used listname+request@cmu.edu, and there were a couple of high-profile lists hosted on this platform around that time.
-roy

From owner-ietf-imaa Thu Feb 13 18:08:36 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E28aI04001 for ietf-imaa-bks; Thu, 13 Feb 2003 18:08:36 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E28Yd03996 for ; Thu, 13 Feb 2003 18:08:35 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1E28qKA018632 for ; Fri, 14 Feb 2003 02:08:52 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Re: Open Issue: Splitting of local-part into labels and where? From: Roy Badami Date: Fri, 14 Feb 2003 02:08:52 +0000 Message-ID: <1045188532.18629.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

[1] Which reminds me that the draft specifies splitting local-part and domain at _the_ at-sign. It should probably read "at the _last_ at-sign", since local-parts may contain at-signs themselves.

Strictly speaking, if we're going to follow RFC822, then we split at _the_ unquoted at-sign. There can only be one, according to the syntax. There may of course be quoted at-signs in atoms on either side of the unquoted at-sign (though having them on the right would result in a domain name that doesn't follow hostname rules).

See my separate thread on quoting for why I think we probably shouldn't go down this road.
-roy From owner-ietf-imaa Thu Feb 13 18:19:33 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E2JXR04508 for ietf-imaa-bks; Thu, 13 Feb 2003 18:19:33 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1E2JRd04487 for ; Thu, 13 Feb 2003 18:19:28 -0800 (PST) Received: (qmail 19180 invoked by uid 66); 14 Feb 2003 02:19:20 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 14 Feb 2003 02:19:20 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-1613d); 14 Feb 2003 03:19:14 +0100 Date: 14 Feb 2003 02:44:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8fpGadzJcDD@3247.org> In-Reply-To: <20030213183024.14213.qmail@cr.yp.to> Subject: Re: The typing issue User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-1613d MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: D. J. Bernstein schrieb/wrote: > With ISO 14755, the second email address disappears. There's still > ASCII information on the business card---but it isn't a second > address; it's a universal explanation of how to type the first > address. If I wanted to type an email address from a business card I'd prefer an alternative address I can easily type over some ``strange numbers and letters'' I have to enter while holding down two other keys. In other words, my order of preference is (for characters I can't type): . ASCII transliteration/ASCII-only alias address . Punycode . 
ISO 14755

Claus -- http://www.faerber.muc.de/

From owner-ietf-imaa Thu Feb 13 18:19:33 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E2JXq04509 for ietf-imaa-bks; Thu, 13 Feb 2003 18:19:33 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1E2JRd04488 for ; Thu, 13 Feb 2003 18:19:28 -0800 (PST) Received: (qmail 19179 invoked by uid 66); 14 Feb 2003 02:19:20 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 14 Feb 2003 02:19:20 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-1613d); 14 Feb 2003 03:19:14 +0100 Date: 14 Feb 2003 02:41:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8fpG$x4JcDD@3247.org> In-Reply-To: <200302131948.20807@sendmail.mutz.com> Subject: Re: Open Issue: Splitting of local-part into labels and where? User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-1613d MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Organization: KDE Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

Marc Mutz schrieb/wrote:
> The question of whether or not to split the dequoted local-part into
> labels and if so, at which delimiter characters, remains unanswered in
> the -00 draft.

Well, in draft-faerber-i18n-email-netnews-names-00.txt I suggested this list:

SP / %x00-1F / "." / "@" / "+" / "%" / "=" / "/" / "," / ";" / ":" / "!" / "(" / ")" / "[" / "]" / "<" / ">"

[[RATIONALE: As many delimiters as possible are used, to increase the chance that individual parts of the identifier are encoded the same way when included in other identifiers:

"@" - used to separate local-part and domain name.
"+" - used by some mailers for subaddressing
"%" - used by some MTAs to embed domains within the local-part of email addresses ("percent-hack")
"=" - used within MIXER (RFC 2156)
"/" - used within MIXER (RFC 2156), used as a newsgroup component separator in some legacy non-RFC BBS networks.
",", ";" - used to separate identifiers in many positions
":" - used to separate (obsolete) source routes from the destination address
" " - used to separate source routes from each other.
"!" - used as a separator within the Path header in RFC 1036, used as an address separator within (obsolete) UUCP bang addresses
"(", ")" - used for comments, used within the replacement for some separators according to MIXER (e.g. "(a)" instead of "@")
"[", "]", "<", ">" - as precaution ]]

And all Unicode characters that have a compatibility mapping to one of those, of course.

We might want to add control characters - just in case they appear.

I'm not quite sure about the quoting characters ``"'' and ``\''. If we apply toASCII/toUnicode after dequoting the local part, there's no reason to include them. If we keep some quotes (canonicalised to the minimal quoted version), we should probably include them.
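A rough sketch of what splitting at the suggested delimiter list could look like. It covers only the ASCII delimiters from the list (SP, %x00-1F and the punctuation), not the Unicode characters with compatibility mappings to them, and the function name is hypothetical.

```python
import re
from typing import List

# ASCII delimiters from the suggested list: SP, %x00-1F and the punctuation.
DELIMITERS = " .@+%=/,;:!()[]<>" + "".join(chr(c) for c in range(0x20))

# The capturing group makes re.split keep each delimiter in the result.
_SPLIT_RE = re.compile("([" + re.escape(DELIMITERS) + "])")


def split_labels(identifier: str) -> List[str]:
    """Split an identifier into alternating labels and delimiters.

    Each label could then be encoded on its own, with the delimiters
    copied through unchanged.
    """
    return _SPLIT_RE.split(identifier)
```

Joining the returned pieces gives back the original string, so an encoder could transform the labels at even positions and leave the delimiters at odd positions untouched.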
Claus -- http://www.faerber.muc.de/ From owner-ietf-imaa Thu Feb 13 18:19:33 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E2JXJ04507 for ietf-imaa-bks; Thu, 13 Feb 2003 18:19:33 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1E2JRd04489 for ; Thu, 13 Feb 2003 18:19:28 -0800 (PST) Received: (qmail 19187 invoked by uid 66); 14 Feb 2003 02:19:21 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 14 Feb 2003 02:19:21 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-1613d); 14 Feb 2003 03:19:14 +0100 Date: 14 Feb 2003 03:19:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8fpGcCiZcDD@3247.org> In-Reply-To: <200302132143.49081@sendmail.mutz.com> Subject: Re: Open Issue: Splitting of local-part into labels and where? User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-1613d MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Organization: KDE Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Marc Mutz schrieb/wrote: > We will not be able to use the same function for the LHS as for the > RHS anyway. That's simply b/c there are characters that are allowed in > local-parts, but not in domains. So the function will produce invalid output for invalid input on the RHS. That's not a problem. We _can_ invent a function that will yield the same result as the IDNA function for all valid domain names. It can then be used for the RHS, too. Our function does not have to produce exactly identical results for something that is not a domain name. 
Claus -- http://www.faerber.muc.de/

From owner-ietf-imaa Thu Feb 13 18:19:33 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E2JXg04506 for ietf-imaa-bks; Thu, 13 Feb 2003 18:19:33 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1E2JRd04490 for ; Thu, 13 Feb 2003 18:19:28 -0800 (PST) Received: (qmail 19181 invoked by uid 66); 14 Feb 2003 02:19:21 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 14 Feb 2003 02:19:21 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-1613d); 14 Feb 2003 03:19:14 +0100 Date: 14 Feb 2003 03:14:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8fpGbcs3cDD@3247.org> In-Reply-To: <200302131950.55716@sendmail.mutz.com> Subject: Re: Open Issue: Stored strings vs. queries. User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-13-1613d MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Organization: KDE Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

Marc Mutz schrieb/wrote:
> An interesting issue is what is to be considered stored local-parts and
> what is a queried local-part.
> Obvious stored local-parts:
> - MTA config
> - address books (e.g. LDAP)

There are two types of address books:

. Authoritative address books, which are run by the same authority as the MTA config files. These are clearly ``stored''.
. Personal address books, which contain queries (because they are used to match against addresses _created_ elsewhere). Otherwise, you could not enter an address that was created by someone using a newer version of the profile.

> Obvious queries:
> - address lookups (e.g. LDAP queries)
> - SMTP commands
> Non-obvious:
> - Mail headers

Again, clearly a ``query''. It makes use of names created elsewhere.
Otherwise, you could not send mail to an address that was created by someone using a newer version of the profile. The distinction between ``stored'' and ``query'' is an application of the Robustness Principle: Don't generate names with unassigned code points (=> ``stored'') but allow them if someone insists that such an address exists (=> ``query''). The terms ``stored'' and ``query'' are confusing if used for anything else than DNS (where the servers ``store'' the authorative list of names and everyone else ``queries'' these servers). Claus -- http://www.faerber.muc.de/ From owner-ietf-imaa Thu Feb 13 18:30:12 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E2UCe04798 for ietf-imaa-bks; Thu, 13 Feb 2003 18:30:12 -0800 (PST) Received: from [63.202.92.157] (adsl-63-202-92-157.dsl.snfc21.pacbell.net [63.202.92.157]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E2U7d04793; Thu, 13 Feb 2003 18:30:07 -0800 (PST) Mime-Version: 1.0 X-Sender: phoffman@mail.imc.org Message-Id: In-Reply-To: <1045187326.18524.TMDA@moriarty.gnomon.org.uk> References: <1045187326.18524.TMDA@moriarty.gnomon.org.uk> X-Habeas-SWE-1: winter into spring X-Habeas-SWE-2: brightly anticipated X-Habeas-SWE-3: like Habeas SWE (tm) X-Habeas-SWE-4: Copyright 2002 Habeas (tm) X-Habeas-SWE-5: Sender Warranted Email (SWE) (tm). The sender of this X-Habeas-SWE-6: email in exchange for a license for this Habeas X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this X-Habeas-SWE-9: mark in spam to . 
Date: Thu, 13 Feb 2003 18:30:05 -0800 To: Roy Badami , ietf-imaa@imc.org From: Paul Hoffman / IMC Subject: Re: Splitting and encoding of local-parts: some thoughts Content-Type: text/plain; charset="us-ascii" ; format="flowed" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 1:48 AM +0000 2/14/03, Roy Badami wrote: >Some thoughts on where to split, and how to encode. > >(1) We could encode the whole dequoted localpart in one go. But then >we can't use punycode as the ACE, because it could produce multiple >consecutive dots, and I really don't think we want to go there. >(That's not to say we shouldn't consider this option with a different >ACE, which could still be a bootstring profile.) Um, punycode doesn't output dots. --Paul Hoffman, Director --Internet Mail Consortium From owner-ietf-imaa Thu Feb 13 18:39:38 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E2dcR05321 for ietf-imaa-bks; Thu, 13 Feb 2003 18:39:38 -0800 (PST) Received: from kathmandu.sun.com (kathmandu.sun.com [192.18.98.36]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E2dbd05317 for ; Thu, 13 Feb 2003 18:39:37 -0800 (PST) Received: from esunmail ([129.147.58.120]) by kathmandu.sun.com (8.9.3+Sun/8.9.3) with ESMTP id TAA28302 for ; Thu, 13 Feb 2003 19:39:40 -0700 (MST) Received: from xpa-fe1 (esunmail [129.147.58.120]) by edgemail1.Central.Sun.COM (iPlanet Messaging Server 5.2 HotFix 1.08 (built Dec 6 2002)) with ESMTP id <0HAA00FI5223LD@edgemail1.Central.Sun.COM> for ietf-imaa@imc.org; Thu, 13 Feb 2003 19:39:40 -0700 (MST) Received: from nifty-jr.west.sun.com ([129.153.12.95]) by mail.sun.net (iPlanet Messaging Server 5.2 HotFix 1.08 (built Dec 6 2002)) with ESMTPSA id <0HAA006I62215C@mail.sun.net> for ietf-imaa@imc.org; Thu, 13 Feb 2003 19:39:39 -0700 (MST) Date: Thu, 13 Feb 2003 18:39:21 -0800 From: Chris Newman Subject: Re: Another issue: quoting In-reply-to: 
<1045183773.18276.TMDA@moriarty.gnomon.org.uk> To: Roy Badami , ietf-imaa@imc.org Message-id: <2147483647.1045161561@nifty-jr.west.sun.com> MIME-version: 1.0 X-Mailer: Mulberry/3.0.0 (Mac OS X) Content-type: text/plain; charset=us-ascii; format=flowed Content-transfer-encoding: 7BIT Content-disposition: inline X-message-flag: Outlook: the best virus distribution system around References: <1045183773.18276.TMDA@moriarty.gnomon.org.uk> Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: begin quotation by Roy Badami on 2003/2/14 0:49 +0000: > I would therefore propose that we consider defining a grammar for > internationalized local parts based on the 2822 grammar. As a result, > any localpart that contains a non-ASCII character MUST NOT contain any > of the following: > > quoting (ie backlash or double-quote) > whitespace > any ascii character that is not permited unquoted. > > Opinions? While we should ban all use of obsolete 2822 grammer with non-ASCII chars, we should support the full richness of the legal to generate grammer for 2822 with non-ASCII chars for consistancy. The extra work is not that hard: 1. remove "@domain" from addr-spec 2. if LHS starts with double-quote, strip surrounding double-quotes and internal use of "\" as the quote for a quoted-pair. 3. apply IDN transformation. I would argue that step 2 is sufficiently simple that it's not worth banning potentially useful address syntaxes (like "Common Name"@domain). The reverse transformation for step 2 is a bit more complex: 2-reverse: if output of IDN decode contains any of: 2822-specials, US-ASCII whitespace, leading or trailing "." or embedded "..", then wrap with double quotes and apply quoted-chars where necessary. This is still simple compared to all the IDN gunk. 
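Step 2 and its reverse could be sketched roughly as follows. This assumes "2822-specials" is meant without the dot, since dots are handled separately by the leading, trailing and doubled-dot conditions. The function names are hypothetical, and folding whitespace and comments are ignored.

```python
import re

# RFC 2822 "specials", minus ".", which the reverse step treats separately.
SPECIALS_EXCEPT_DOT = set('()<>[]:;@\\,"')
ASCII_WHITESPACE = set(" \t\r\n")


def dequote(local_part: str) -> str:
    """Step 2: strip surrounding double quotes and undo quoted-pairs."""
    if len(local_part) >= 2 and local_part[0] == '"' and local_part[-1] == '"':
        inner = local_part[1:-1]
        # A quoted-pair is a backslash plus one character; keep that character.
        return re.sub(r"\\(.)", r"\1", inner, flags=re.DOTALL)
    return local_part


def requote(dequoted: str) -> str:
    """2-reverse: wrap in double quotes only when the plain form is not valid."""
    needs_quotes = (
        any(ch in SPECIALS_EXCEPT_DOT or ch in ASCII_WHITESPACE for ch in dequoted)
        or dequoted.startswith(".")
        or dequoted.endswith(".")
        or ".." in dequoted
    )
    if not needs_quotes:
        return dequoted
    # Escape backslashes first, then double quotes, as quoted-pairs.
    escaped = dequoted.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```

Note that under this sketch roy.badami and "roy.badami" dequote to the same string, which is exactly the equivalence question raised earlier in the thread.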
- Chris From owner-ietf-imaa Thu Feb 13 18:39:14 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E2dEa05309 for ietf-imaa-bks; Thu, 13 Feb 2003 18:39:14 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E2dBd05303 for ; Thu, 13 Feb 2003 18:39:13 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jVkw-00073d-00 for ; Thu, 13 Feb 2003 18:39:14 -0800 Date: Fri, 14 Feb 2003 02:39:14 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Subject: Re: Splitting and encoding of local-parts: some thoughts Message-ID: <20030214023914.GC25048@nicemice.net> Reply-To: IETF IMAA list References: <1045187326.18524.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Paul Hoffman / IMC wrote: > Um, punycode doesn't output dots. It doesn't introduce them, but it will copy them from the input to the output, if they exist in the input. The first thing the Punycode encoder does is copy all ASCII characters appearing in the input directly to the output. 
AMC From owner-ietf-imaa Thu Feb 13 18:48:43 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E2mhd05655 for ietf-imaa-bks; Thu, 13 Feb 2003 18:48:43 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E2mfd05651 for ; Thu, 13 Feb 2003 18:48:41 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1E2mvKA018838 for ; Fri, 14 Feb 2003 02:48:57 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Question: full-width at From: Roy Badami Date: Fri, 14 Feb 2003 02:48:56 +0000 Message-ID: <1045190936.18837.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Question: is full-width-at the only character that contains the at-sign in its compatibility decomposition? If not, should we perhaps consider converting the entire e-mail address to NFKC before splitting at at-sign? Even if there are no other such characters currently defined in Unicode, should we consider doing this anyway, in case a future compatibility character has such as compatibility decomposition, or can/should we avoid the issue by tying ourselves to a particular version of Unicode? And I won't even ask what happens if you try and follow the at-sign with a combining accent. I'm assuming there are no precomposed characters of that form, so NFKC vs NFCD won't make a difference here... Actually, I will ask: assuming we just split at the at sign, the domain name will begin with a combining accent. How will IDNA treat this? 
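One way to explore the question is to ask a Unicode library directly. The sketch below uses Python's unicodedata module, so the answer depends on the Unicode version bundled with the interpreter. The split at the last at-sign is only one of the options discussed in this thread, and both function names are hypothetical.

```python
import sys
import unicodedata
from typing import List, Tuple


def chars_normalizing_to_at() -> List[str]:
    """List non-ASCII code points whose NFKC form contains '@'."""
    found = []
    for cp in range(0x80, sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points
        ch = chr(cp)
        if "@" in unicodedata.normalize("NFKC", ch):
            found.append(ch)
    return found


def split_after_nfkc(address: str) -> Tuple[str, str]:
    """Normalize the whole address to NFKC, then split at the last '@'."""
    normalized = unicodedata.normalize("NFKC", address)
    local, _, domain = normalized.rpartition("@")
    return local, domain
```

Normalizing before splitting means any current or future compatibility variant of the at-sign is treated as the separator, at the cost of tying the result to whichever Unicode version the library implements.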
-roy From owner-ietf-imaa Thu Feb 13 18:53:24 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E2rO005825 for ietf-imaa-bks; Thu, 13 Feb 2003 18:53:24 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E2rMd05821 for ; Thu, 13 Feb 2003 18:53:23 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1E2rgKA018882 for ; Fri, 14 Feb 2003 02:53:42 GMT To: ietf-imaa@imc.org CC: roy@gnomon.org.uk In-reply-to: <20030214023914.GC25048@nicemice.net> (ietf-imaa.amc+0@nicemice.net.RemoveThisWord) Subject: Re: Splitting and encoding of local-parts: some thoughts References: <1045187326.18524.TMDA@moriarty.gnomon.org.uk> <20030214023914.GC25048@nicemice.net> From: Roy Badami Date: Fri, 14 Feb 2003 02:53:42 +0000 Message-ID: <1045191222.18879.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: It doesn't introduce them, but it will copy them from the input to the output, if they exist in the input. The first thing the Punycode encoder does is copy all ASCII characters appearing in the input directly to the output. Indeed, so: encodes to Note that the input didn't contain any consecutive dots, but the output did. 
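The effect can be reproduced with Python's built-in "punycode" codec, which implements the RFC 3492 algorithm without any ACE prefix. The sample string below is a made-up input chosen for illustration, not the one from the message, and the helper name is hypothetical.

```python
def punycode_whole(localpart: str) -> str:
    """Encode a whole string with Punycode in one go, dots included."""
    return localpart.encode("punycode").decode("ascii")


# Hypothetical sample: three non-ASCII characters separated by single dots.
sample = "\u00e9.\u00e9.\u00e9"
encoded = punycode_whole(sample)

# The encoder copies the ASCII characters (here the two dots) to the front
# of the output, so they end up adjacent even though the input had none.
print(".." in sample)   # False
print(".." in encoded)  # True
```

Splitting at dots before encoding avoids this, because each label handed to the encoder then contains no dots at all.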
-roy From owner-ietf-imaa Thu Feb 13 18:59:29 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E2xTR05967 for ietf-imaa-bks; Thu, 13 Feb 2003 18:59:29 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E2xRd05963 for ; Thu, 13 Feb 2003 18:59:27 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1E2xlKA018905 for ; Fri, 14 Feb 2003 02:59:47 GMT To: Chris.Newman@Sun.COM CC: ietf-imaa@imc.org, roy@gnomon.org.uk In-reply-to: <2147483647.1045161561@nifty-jr.west.sun.com> (message from Chris Newman on Thu, 13 Feb 2003 18:39:21 -0800) Subject: Re: Another issue: quoting References: <1045183773.18276.TMDA@moriarty.gnomon.org.uk> <2147483647.1045161561@nifty-jr.west.sun.com> Date: Fri, 14 Feb 2003 02:59:46 +0000 Message-ID: <1045191586.18900.TMDA@moriarty.gnomon.org.uk> From: Roy Badami X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: While we should ban all use of obsolete 2822 grammer with non-ASCII chars, we should support the full richness of the legal to generate grammer for 2822 with non-ASCII chars for consistancy. The extra work is not that hard: Oops, you're right, my bad. I had thought that local-part had to be a dot-atom, but that's not true. I'm going to have to read 2822 carefully, but can anyone confirm that roy.badami and "roy.badami" are still potentially distinct mailboxes in 2822, even neglecting obsolete deprecated stuff? If so, this will require great care on the part of IMAA... 
-roy

From owner-ietf-imaa Thu Feb 13 19:08:17 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E38HX06103 for ietf-imaa-bks; Thu, 13 Feb 2003 19:08:17 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E38Ed06099 for ; Thu, 13 Feb 2003 19:08:15 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1E38YKA018949 for ; Fri, 14 Feb 2003 03:08:35 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk In-reply-to: <1045185140.18391.TMDA@moriarty.gnomon.org.uk> (message from Roy Badami on Fri, 14 Feb 2003 01:12:20 +0000) Subject: Re: Open Issue: Splitting of local-part into labels and where? References: <1045185140.18391.TMDA@moriarty.gnomon.org.uk> From: Roy Badami Date: Fri, 14 Feb 2003 03:08:34 +0000 Message-ID: <1045192114.18946.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID:

If we assume we are going to use punycode, then you pretty much *have* to split at dots, otherwise you'd have cases where punycode generates strings containing multiple consecutive dots. Yes, you could make this valid by generating suitable quoting, but you'd be creating local parts that are (a) deprecated by 2822 and (b) stand a significant chance of not working on the current Internet. (See my separate post on quoting.)

Oops, as has just been pointed out to me, quoting is valid within the legal-to-generate grammar of 2822. I would still voice the concern that anything containing quoting is likely to be less than universally acceptable to currently deployed systems (and we therefore shouldn't generate it).
-roy From owner-ietf-imaa Thu Feb 13 19:24:16 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E3OGf06468 for ietf-imaa-bks; Thu, 13 Feb 2003 19:24:16 -0800 (PST) Received: from mercury.ccil.org (mail@[192.190.237.100]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E3OFd06464 for ; Thu, 13 Feb 2003 19:24:15 -0800 (PST) Received: from cowan by mercury.ccil.org with local (Exim 3.35 #1 (Debian)) id 18jWHP-0000Dn-00; Thu, 13 Feb 2003 22:12:47 -0500 Subject: Re: Question: full-width at In-Reply-To: <1045190936.18837.TMDA@moriarty.gnomon.org.uk> from Roy Badami at "Feb 14, 2003 02:48:56 am" To: Roy Badami Date: Thu, 13 Feb 2003 22:12:47 -0500 (EST) CC: ietf-imaa@imc.org X-Mailer: ELM [version 2.4ME+ PL66 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-Id: From: John Cowan Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami scripsit: > Question: is full-width-at the only character that contains the > at-sign in its compatibility decomposition? No, there is also U+FE6B, the SMALL COMMERCIAL AT. It's fullwidth but standard size, with lots of kerning around it. U+FF20 is fullwidth and large-sized. > And I won't even ask what happens if you try and follow the at-sign > with a combining accent. I'm assuming there are no precomposed > characters of that form, so NFKC vs NFCD won't make a difference > here... There are none, fortunately. -- John Cowan http://www.ccil.org/~cowan cowan@ccil.org To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. 
--_The Hobbit_ From owner-ietf-imaa Thu Feb 13 19:31:12 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E3VC106642 for ietf-imaa-bks; Thu, 13 Feb 2003 19:31:12 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1E3VBd06638 for ; Thu, 13 Feb 2003 19:31:11 -0800 (PST) Received: (qmail 65924 invoked by uid 1016); 14 Feb 2003 03:31:40 -0000 Date: 14 Feb 2003 03:31:40 -0000 Message-ID: <20030214033140.65923.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: ietf-imaa@imc.org Subject: Re: Another issue: quoting References: <1045183773.18276.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami writes: > One of the problems with the 822 specification is that localparts > aren't just a sequence of characters, with quoting being just a > transport encoding. False. RFC 822, section 3.4.4, specifies beyond a shadow of a doubt that quoting _is_ just a transport encoding. Read it! Disclaimers: 1. I agree that there are many serious problems in RFC 822. 2. I agree that what you falsely accuse RFC 822 of doing would have been another problem---if it were true. 3. None of this is meant as a comment on what RFC 2822 says. 4. None of this is meant as a comment on what software actually does. ---D. J. 
Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-ietf-imaa Thu Feb 13 19:49:24 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E3nOE07237 for ietf-imaa-bks; Thu, 13 Feb 2003 19:49:24 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E3nNd07233 for ; Thu, 13 Feb 2003 19:49:23 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jWqs-0007Hb-00 for ; Thu, 13 Feb 2003 19:49:26 -0800 Date: Fri, 14 Feb 2003 03:49:26 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Subject: Re: Compatibility with IDNA Message-ID: <20030214034926.GA27030@nicemice.net> Reply-To: IETF IMAA list References: <8f$3A5e3cDD@3247.org> <20030211023401.GE16359@nicemice.net> <20030213040028.GA9630@nicemice.net> <20030213080759.GA18181@nicemice.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030213080759.GA18181@nicemice.net> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Reminder of context: This is a continuation of the thread exploring the question "What would it take to use the same ToASCII/ToUnicode operations for both IMAA and IDNA?" I wrote that IMAA could use the IDNA ToASCII if it didn't allow quite as much flexibility in the case of the output: > If the input of ToASCII contains no uppercase characters, then the > output of ToASCII must contain no uppercase characters. I had the right idea, but I didn't express it quite correctly. The above wording fails to account for titlecase characters, or for uppercase characters that have no lowercase mates (like Cyrillic letter palochka and all the Georgian capital letters). What I meant was: Definition: String X is "canonical case" iff CaseFold(X) = X. 
Constraint: If the input of ToASCII is canonical case, then the output of ToASCII must also be canonical case. I then explained that IMAA would need to impose the constraint on ToUnicode too. But earlier I had said that IMAA and IDNA could use the same prefix only if they used the exact same ToUnicode. Oops! How bad is this? Not a disaster, I think. The danger of using the same prefix is that the wrong ToUnicode operation might get applied. I can think of two ways this could happen: 1) Implementation error. A program applies the IDNA ToUnicode operation to a local part, in blatant violation of the standards. A different prefix would protect against this blunder, but one could argue that we as spec designers are not obligated to provide such protection, if there are other advantages to be gained by not providing it. 2) Copying local parts into domain names, or vice-versa. For example, DNS SOA records. Now a program might innocently use the ToUnicode operation appropriate for the data type where the string now sits, not knowing that it was actually copied from a different data type. In fact, this is one of the main reasons for reusing ToUnicode, so that things might get displayed intelligibly even when copied from one side of the at-sign to the other. Regardless of how the wrong ToUnicode gets applied, what damage can result? If the IMAA ToUnicode is used on a domain label, there is no problem, because the IMAA ToUnicode is just a more constrained version of the IDNA ToUnicode. If the IDNA ToUnicode is used on a local part, the result could be wrong (sending mail to it would bounce), but only if the mail server is case-sensitive and the ToUnicode implementation gratuitously uppercases some of the output characters even though its input was all lowercase ASCII. 
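The "canonical case" definition given earlier in this message can be checked mechanically. The following is a minimal sketch, assuming Python's built-in str.casefold() as a stand-in for the CaseFold operation (the folding table of an actual Stringprep profile may differ in detail); the function name is_canonical_case is illustrative only and appears in no draft.

```python
def is_canonical_case(s: str) -> bool:
    """Return True iff case folding leaves the string unchanged."""
    # Assumption: str.casefold() approximates the CaseFold step named in
    # the definition; a real profile may fold some characters differently.
    return s.casefold() == s


# Lowercase ASCII is already canonical case.
print(is_canonical_case("foo"))
# An uppercase letter is changed by folding, so this is not canonical case.
print(is_canonical_case("Foo"))
# U+01C5 is a titlecase letter: it is not uppercase, yet folding changes it.
print(is_canonical_case("\u01c5"))
```

The third call shows why "contains no uppercase characters" is a weaker condition than "canonical case".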
I can imagine a ToUnicode implementation that uppercases some of its output characters when some of the input characters are uppercase ASCII (that's how case preservation via mixed-case annotations would work), but I cannot imagine what would possess a ToUnicode implementation to go to the extra effort of uppercasing some of its output characters given an all lowercase ASCII input. Not only is the risk small, but it's a risk that already exists today with ASCII local parts that are copied into domain names. Whereas applications "must" preserve the case of ASCII local parts, they merely "should" preserve the case of ASCII domain labels. If the local part of foo@example.org is case-sensitive, and I put foo.example.org in an SOA record, there is a risk that the domain name will somehow fall into the hands of a program that gratuitously capitalizes domain names, yielding FOO.EXAMPLE.ORG. When this is eventually converted back to a mail address, FOO@EXAMPLE.ORG, it won't work. In summary, I still think that using the same ToASCII & ToUnicode operations, including the same profile and prefix, for both domain labels and local parts, is a viable option worth considering. I haven't decided whether I think it's the best option. AMC From owner-ietf-imaa Thu Feb 13 20:40:56 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E4euQ08567 for ietf-imaa-bks; Thu, 13 Feb 2003 20:40:56 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E4etd08563 for ; Thu, 13 Feb 2003 20:40:55 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jXel-0007PR-00; Thu, 13 Feb 2003 20:40:59 -0800 Date: Fri, 14 Feb 2003 04:40:59 +0000 From: "Adam M.
Costello" To: ietf-imaa@imc.org Cc: Roy Badami Subject: Re: Another issue: quoting Message-ID: <20030214044058.GE25048@nicemice.net> Reply-To: IETF IMAA list , Roy Badami References: <1045183773.18276.TMDA@moriarty.gnomon.org.uk> <20030214033140.65923.qmail@cr.yp.to> <1045183773.18276.TMDA@moriarty.gnomon.org.uk> <2147483647.1045161561@nifty-jr.west.sun.com> <1045191586.18900.TMDA@moriarty.gnomon.org.uk> <1045183773.18276.TMDA@moriarty.gnomon.org.uk> <2147483647.1045161561@nifty-jr.west.sun.com> <1045183773.18276.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030214033140.65923.qmail@cr.yp.to> <1045191586.18900.TMDA@moriarty.gnomon.org.uk> <2147483647.1045161561@nifty-jr.west.sun.com> <1045183773.18276.TMDA@moriarty.gnomon.org.uk> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami wrote: > Quoting in local parts is a messy construct. Yes. Quoting in general is inherently messy, and the quoting in message headers is no exception. > One of the problems with the 822 specification is that localparts > aren't just a sequence of characters, with quoting being just a > transport encoding. Localparts are a sequence of tokens, namely atoms > and dots. > > By way of example, it seems to me that the RFC822 parse of the > following two localparts is distinct, and as a result they could > validly refer to distinct mailboxes: > > roy.badami > "roy.badami" I had these exact same concerns two months ago, and careful reading of RFC 822 clarified some things. D. J. Bernstein has already pointed to section 3.4.4 of RFC 822, which explains that the backslashes used to quote the next character, and the quotation marks around quoted-strings, are not part of the data, and should not be retained outside a message-header context. As for the atom/dot structure, section 6.2.4 explains that it is not significant. 
So in your example above, roy.badami and "roy.badami" do in fact refer to the same mailbox, even though they parse differently. > In fact, the definition of "dequoted local part" in the base document > is problematic as it stands, because in 822 there isn't any such > thing. I don't see what's wrong with defining a new term. Actually, the definition of "dequoted local part" is merely giving a name to a concept described (but not named) in the last paragraph of section 3.4.4 of RFC 822. AMC From owner-ietf-imaa Thu Feb 13 21:07:45 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E57jX09063 for ietf-imaa-bks; Thu, 13 Feb 2003 21:07:45 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E57id09059 for ; Thu, 13 Feb 2003 21:07:44 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jY4i-0007Tk-00; Thu, 13 Feb 2003 21:07:48 -0800 Date: Fri, 14 Feb 2003 05:07:48 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Cc: Roy Badami Subject: Re: Question: full-width at Message-ID: <20030214050748.GF25048@nicemice.net> Reply-To: IETF IMAA list , Roy Badami References: <1045190936.18837.TMDA@moriarty.gnomon.org.uk> <1045190936.18837.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1045190936.18837.TMDA@moriarty.gnomon.org.uk> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy> Question: is full-width-at the only character that contains the Roy> at-sign in its compatibility decomposition? John> No, there is also U+FE6B, the SMALL COMMERCIAL AT. Roy> If not, should we perhaps consider converting the entire e-mail Roy> address to NFKC before splitting at at-sign? Roy> And I won't even ask what happens if you try and follow the at-sign Roy> with a combining accent. 
Roy> Actually, I will ask: assuming we just split at the at sign, the Roy> domain name will begin with a combining accent. How will IDNA treat Roy> this? These issues were considered for IDNA. There are many characters that decompose to dot (like U+FE52 small full stop, U+2024 one dot leader) or decompose to strings containing dots (like U+2488 digit one full stop, U+33C2 square AM). Therefore, if a whole domain name is normalized before being scanned for dots, it might result in a different number of labels than if it had not been normalized. We discussed whether IDNA should require this pre-normalization. Ultimately we decided that that was getting too far into user interface issues. IDNA instead focuses almost entirely on individual labels, not whole domain names. The one exception is to require that three particular dot-like non-ASCII characters be recognized as dots, just the three that we had reason to believe would be likely to be input by users trying to type dots. The IMAA draft follows this example. It says as little as possible about entire mail addresses, and focuses on the local part. The one exception is to require one particular at-like character to be recognized as an at-sign, the one that we have reason to believe is likely to be input by users trying to type at-signs. As for initial combining characters, the same question arose with IDNA. If a label begins with a combining character, will it combine with the preceding dot? This was considered a user-interface issue that IDNA should not address.
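The effect of normalizing before scanning for delimiters, and the narrower rule of recognizing only U+0040 and U+FF20 as at-signs, can be illustrated with Python's standard unicodedata module. This is a rough sketch under stated assumptions: the helper names split_address and label_count are hypothetical, quoting is ignored, and splitting at the last recognized at-sign is a simplification rather than a full address parser.

```python
import unicodedata

# The two characters the draft requires to be recognized as at-signs.
AT_SIGNS = ("\u0040", "\uff20")  # commercial at, fullwidth commercial at


def split_address(addr: str):
    """Split at the last recognized at-sign; returns (local_part, domain)."""
    # Simplification: no quoting or comment handling is attempted here.
    idx = max(addr.rfind(c) for c in AT_SIGNS)
    if idx < 0:
        raise ValueError("no recognized at-sign found")
    return addr[:idx], addr[idx + 1:]


def label_count(name: str, normalize_first: bool) -> int:
    """Count dot-separated labels, optionally applying NFKC to the whole name."""
    if normalize_first:
        name = unicodedata.normalize("NFKC", name)
    return len(name.split("."))


# U+2488 (digit one full stop) decomposes to "1." under NFKC, so the
# number of labels depends on whether normalization happens first.
name = "a\u2488b"
print(label_count(name, normalize_first=False))
print(label_count(name, normalize_first=True))

# U+FE6B (small commercial at) becomes "@" under NFKC, but it is not in
# the required set, so split_address does not treat it as a separator.
print(unicodedata.normalize("NFKC", "\ufe6b") == "@")
print(split_address("user\uff20example.org"))
```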
AMC From owner-ietf-imaa Thu Feb 13 22:50:30 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1E6oU911516 for ietf-imaa-bks; Thu, 13 Feb 2003 22:50:30 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1E6oTd11512 for ; Thu, 13 Feb 2003 22:50:29 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jZgA-0007i2-00; Thu, 13 Feb 2003 22:50:34 -0800 Date: Fri, 14 Feb 2003 06:50:34 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Cc: Roy Badami Subject: Re: Open Issue: Splitting of local-part into labels and where? Message-ID: <20030214065034.GG25048@nicemice.net> Reply-To: IETF IMAA list , Roy Badami References: <1045185140.18391.TMDA@moriarty.gnomon.org.uk> <1045192114.18946.TMDA@moriarty.gnomon.org.uk> <1045185140.18391.TMDA@moriarty.gnomon.org.uk> <200302132143.49081@sendmail.mutz.com> <8fpGcCiZcDD@3247.org> <4.2.0.58.J.20030213142550.03349d90@localhost> <200302132143.49081@sendmail.mutz.com> <200302131948.20807@sendmail.mutz.com> <4.2.0.58.J.20030213142550.03349d90@localhost> <200302131948.20807@sendmail.mutz.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <8fpG$x4JcDD@3247.org> <1045188532.18629.TMDA@moriarty.gnomon.org.uk> <1045192114.18946.TMDA@moriarty.gnomon.org.uk> <1045185140.18391.TMDA@moriarty.gnomon.org.uk> <8fpGcCiZcDD@3247.org> <200302132143.49081@sendmail.mutz.com> <4.2.0.58.J.20030213142550.03349d90@localhost> <200302131948.20807@sendmail.mutz.com> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: This message responds to messages by Marc Mutz, Martin Duerst, Roy Badami, and Claus Färber. Marc Mutz wrote: > I think that splitting into labels is needed to keep old software > (esp. MTAs and filtering software) working. 
I can't add more to Roy's > arguments here, and I feel there will be not much discussion about > this particular issue. That's optimistic! :) > The more interesting point is _where_ to split. There are some very > obvious characters (mainly full stops and hyphens), It would be nice to split at hyphens, but it would also be nice to share the ToASCII/ToUnicode operations between IDNA and IMAA. We can't have both. > The draft also mentions the option of using all non-alnum US-ASCII > characters. If the number of equivalent Unicode code points is small, > then this is certainly the best option, although we should then > provide a mapping table for the to-usascii mapping. It might be preferable to simply require applying NFKC to the entire local part before splitting it. It's a more heavyweight operation, but it's already in the toolbox. > the draft specifies splitting local-part and domain at _the_ at-sign. I don't think the draft says that. It says: In an internationalized mail address, the following characters MUST be recognized as at-signs for separating the local part from the domain name: U+0040 (commercial at), U+FF20 (fullwidth commercial at). This is not intended to be a full specification of how to parse mail addresses; see other RFCs for that. This is merely intended to expand the set of characters that can be used to separate the local part from the domain name. There are other places where the phrase "the at-sign" is used. It is meant as shorthand for "the at-sign that separates the local part from the domain name". If that's not sufficiently clear from context, we could be more explicit. Martin Duerst wrote: > Making sure that a user enters a '@' in the right form can and should > be a quality of implementation issue under the responsibility of the > application. The same argument could be made about fullwidth full stop in IDNA. 
There were arguments on both sides, and ultimately we chose to require that all IDNA-aware applications recognize a few characters as dots. If IMAA is to be a natural follow-on to IDNA, it ought to handle the at-sign the same way. By the way, one of the arguments for the dot requirement was the following scenario: You and I both have IDNA-aware applications, and I can type a domain name into mine and it works, but when I type it into the body of a message and mail it to you, and you paste it into your application, it fails, because yours doesn't recognize the same dots as mine. We don't try to 100% solve that problem, but standardizing the few most common variants of the essential delimiters is an easy 99% solution. Marc Mutz wrote: > We will not be able to use the same function for the LHS as for the > RHS anyway. That's simply b/c there are characters that are allowed > in local-parts, but not in domains The IDNA ToASCII and ToUnicode operations are designed to handle all ASCII characters just fine, because domain names in general can contain all ASCII characters, although host names and mail domains are restricted to letters, digits, hyphens, and the dot separators. In another thread I have outlined a scheme for reusing the IDNA ToASCII and ToUnicode in IMAA. If you think it wouldn't work, please explain. > it's only logical to split at non-alphanum ASCII characters. I agree. Except we'd have to make sure that we can reverse the process. For example, if we encode each piece in such a way that the encoding can introduce hyphens, then we better not split on hyphens! If we don't reuse the IDNA operations, then it's possible to use an encoding that uses only alphanumerics, and we can split on all non-alphanumeric ASCII characters. > We should, of course, exclude pathological cases, such as control > characters Huh? If control characters are not considered delimiters, then that leaves them as part of the pieces that get encoded, which is even worse. 
If they're considered delimiters, then we don't mess with them at all. I think "*all* non-alphanumeric ASCII characters" is the right target. Roy Badami wrote: > If we assume we are going to use punycode, then you pretty much *have* > to split at dots, otherwise you'd have cases where punycode generates > strings containing multiple consecutive dots. Yes, you could make > this valid by generating suitable quoting, but you'd be creating local > parts that stand a significant chance of not working on the current > Internet. That's a very keen observation. I don't know how afraid of quoting we should really be, but it's good to be aware that non-ASCII forms that don't need quotes can map to ACE forms that do need quotes. I hadn't really noticed that. > I don't think it's clear at this stage that the implementational > convenience of using the same function justifies the cost to the > utility of the specification. Implementational convenience is not the only reason to use the same operations. Another reason is so that when domains are copied into local parts and vice-versa, they might be displayed intelligibly. How well that would or wouldn't work depends on whether local parts are subdivided. > Actually, I'm not even convinced at this point that it's clear that we > should use punycode for the ACE, eg if we choose to ACE-encode strings > containing dots, and require the result not to contain consecutive > dots. > > Adam, what comes after AMC-ACE-Z ? :) We can consider different prefixes, and different profiles, but I doubt that anyone wants to see another encoding. It just gets to be too much. People might be willing to accept a simple wrapper around Punycode. For example, in order to use hyphen as a delimiter we would need to eliminate hyphens from the Punycode encoding, which could be done by applying Punycode itself, then replacing the hyphen with the ACE infix (which would be purely alphanumeric) (or prepend the infix if there is no hyphen).
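The wrapper idea in the preceding paragraph can be sketched with Python's built-in "punycode" codec. This is only an illustration under stated assumptions: the infix value "zq" is hypothetical (no infix was proposed in the thread), the function name encode_piece is made up, and only the encoding direction is shown. A decoder would have to locate the infix unambiguously, which this sketch does not attempt.

```python
INFIX = "zq"  # hypothetical alphanumeric infix, chosen only for this example


def encode_piece(piece: str) -> str:
    """Punycode-encode one piece, then remove the hyphen delimiter."""
    # The piece is assumed to come from splitting at hyphens, so it must
    # not contain one itself; Punycode then emits at most one hyphen.
    if "-" in piece:
        raise ValueError("pieces must already be split at hyphens")
    puny = piece.encode("punycode").decode("ascii")
    if "-" in puny:
        # Replace the delimiter between basic and encoded parts.
        basic, _, rest = puny.rpartition("-")
        return basic + INFIX + rest
    # No basic code points were present: prepend the infix instead.
    return INFIX + puny


print(encode_piece("b\u00fccher"))  # the basic part "bcher" is kept in front
print(encode_piece("\u00fc"))       # no basic code points, so the infix leads
```

In both cases the result contains no hyphen, so hyphens remain available as delimiters between pieces.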
> There may of course be quoted at-signs in atoms on either side of the > unquoted at-sign Actually, atoms cannot contain at-signs. Quoted-strings can, but they're not permitted after the at-sign. Claus Färber wrote: [regarding the set of delimiters for subdividing local parts] > I'm not quite sure about the quoting characters ``"'' and ``\''. If > we apply toASCII/toUnicode after dequoting the local part, there's no > reason to include them. Sure there is. Consider: From: foo bar <"foo\"bar\\"@example.org> The dequoted local part is: foo"bar\ In any case, I don't see the point of trying to justify each delimiter individually. I think it's easier to justify a simple policy like "all non-alphanumeric ASCII characters" or "all ASCII characters that Punycode doesn't generate" (alphanumerics and hyphen). AMC From owner-ietf-imaa Fri Feb 14 02:01:25 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EA1Pf12223 for ietf-imaa-bks; Fri, 14 Feb 2003 02:01:25 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EA1Md12210 for ; Fri, 14 Feb 2003 02:01:22 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EA1dKA020809 for ; Fri, 14 Feb 2003 10:01:39 GMT To: ietf-imaa@imc.org, roy@gnomon.org.uk CC: ietf-imaa@imc.org In-reply-to: <20030214044058.GE25048@nicemice.net> (ietf-imaa.amc+0@nicemice.net.RemoveThisWord) Subject: Re: Another issue: quoting References: <1045183773.18276.TMDA@moriarty.gnomon.org.uk> <20030214033140.65923.qmail@cr.yp.to> <1045183773.18276.TMDA@moriarty.gnomon.org.uk> <2147483647.1045161561@nifty-jr.west.sun.com> <1045191586.18900.TMDA@moriarty.gnomon.org.uk> <1045183773.18276.TMDA@moriarty.gnomon.org.uk> <2147483647.1045161561@nifty-jr.west.sun.com> <20030214044058.GE25048@nicemice.net> Date: Fri, 14 Feb 2003 10:01:37 +0000
Message-ID: <1045216897.20793.TMDA@moriarty.gnomon.org.uk> From: Roy Badami X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: I had these exact same concerns two months ago, and careful reading of RFC 822 clarified some things. D. J. Bernstein has already pointed to section 3.4.4 of RFC 822, which explains that the backslashes used to quote the next character, and the quotation marks around quoted-strings, are not part of the data, and should not be retained outside a message-header context. As for the atom/dot structure, section 6.2.4 explains that it is not significant. So in your example above, roy.badami and "roy.badami" do in fact refer to the same mailbox, even though they parse differently. Thanks for the reference to 6.2.4. It appears that I'm just plain wrong here. I don't see what's wrong with defining a new term. Not worth pursuing this line of discussion; my comment was based on a misapprehension of what RFC822 said.
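The reading of RFC 822 accepted above (backslashes and surrounding quotation marks are not part of the data) can be sketched as a small dequoting routine. This is a simplified illustration, not a conforming parser: the function name dequote_local_part is hypothetical, and comments, folding whitespace, and syntax validation are all ignored.

```python
def dequote_local_part(local: str) -> str:
    """Strip quoting from a local part, leaving only the data characters."""
    out = []
    i = 0
    while i < len(local):
        ch = local[i]
        if ch == "\\" and i + 1 < len(local):
            # A backslash quotes the next character and is itself dropped.
            out.append(local[i + 1])
            i += 2
        elif ch == '"':
            # Quotation marks delimit a quoted-string and are not data.
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Both spellings from the thread yield the same dequoted local part.
print(dequote_local_part('roy.badami') == dequote_local_part('"roy.badami"'))
# A quoted local part containing a quoted double quote and a quoted backslash.
print(dequote_local_part('"foo\\"bar\\\\"'))
```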
-roy From owner-ietf-imaa Fri Feb 14 02:31:28 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EAVS416548 for ietf-imaa-bks; Fri, 14 Feb 2003 02:31:28 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EAVRd16542 for ; Fri, 14 Feb 2003 02:31:27 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jd7v-0008Av-00; Fri, 14 Feb 2003 02:31:27 -0800 Date: Fri, 14 Feb 2003 10:31:27 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Cc: Roy Badami Subject: Re: Splitting and encoding of local-parts: some thoughts Message-ID: <20030214103127.GC30815@nicemice.net> Reply-To: IETF IMAA list , Roy Badami References: <1045187326.18524.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1045187326.18524.TMDA@moriarty.gnomon.org.uk> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami wrote: > (1) We could encode the whole dequoted localpart in one go. But then > we can't use punycode as the ACE, because it could produce multiple > consecutive dots, and I really don't think we want to go there. I'm not sure we need to be quite so fearful of quoting, but we should certainly keep this in mind as we weigh the options. > (2) We could simply split at dots, and encode using punycode as the > ACE. ...if dots are commonplace in internationalized localparts, the > extra ACE prefixes would eat into our length limit.
> > (3) split at all non-alphanumeric ASCII characters. ...this will eat > into our length limit further. Yes, we need to weigh the benefit of being friendly to structured local part conventions (like user+tag@domain) and the benefit of IDNs copied into local parts being displayed intelligibly (like user%domain@domain or listname-return.user=domain@domain) versus the cost of extra ACE prefixes, and the cost of the extra subdividing step. > I'd also like to throw a slightly wacky idea out there: if we accept > that any prefixes and suffixes applied to (or stripped off from) > e-mail addresses are restricted to ASCII, then it is adequate to > identify the substring of the localpart from the first non-ASCII > character to the last non-ASCII character, and encode and mark that in > some way. > > Is it reasonable to restrict users of localpart suffixes (delimited by > plus or minus) to restrict their suffixes to ASCII characters? I would think that anyone who wants to use the user+tag convention, and wants to use non-ASCII characters in the user part, probably also wants to use non-ASCII characters in the tag part. 
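The trade-off weighed in this message (structure-friendly splitting versus the cost of extra ACE prefixes) can be made concrete with a small sketch. Everything specific here is an assumption for illustration: the prefix "xx--" is a placeholder rather than a proposed value, the default delimiter set ".+%=" is arbitrary (hyphen is left out because Punycode can emit it), and the function names are hypothetical. Pieces are encoded with Python's built-in "punycode" codec.

```python
import re

ACE_PREFIX = "xx--"  # hypothetical placeholder, not a proposed prefix


def split_keep_delims(local: str, delims: str):
    """Split at any character in delims, keeping the delimiters as pieces."""
    if not delims:
        return [local]  # no subdividing: the whole local part is one piece
    return re.split("([" + re.escape(delims) + "])", local)


def encode_local_part(local: str, delims: str = ".+%=") -> str:
    """Encode each non-ASCII piece separately; ASCII pieces pass through."""
    out = []
    for piece in split_keep_delims(local, delims):
        if piece.isascii():
            out.append(piece)
        else:
            out.append(ACE_PREFIX + piece.encode("punycode").decode("ascii"))
    return "".join(out)


def prefix_overhead(local: str, delims: str = ".+%=") -> int:
    """Number of characters spent on ACE prefixes alone."""
    pieces = split_keep_delims(local, delims)
    return len(ACE_PREFIX) * sum(1 for p in pieces if not p.isascii())


# An ASCII tag survives untouched next to an encoded user part.
print(encode_local_part("b\u00fccher+tag"))
# Two non-ASCII pieces cost two prefixes; encoding in one go costs one.
print(prefix_overhead("b\u00fccher.b\u00fccher"))
print(prefix_overhead("b\u00fccher.b\u00fccher", delims=""))
```

The overhead grows with the number of non-ASCII pieces, which is the length-limit cost that has to be weighed against keeping conventions like user+tag visible to existing software.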
AMC From owner-ietf-imaa Fri Feb 14 03:10:56 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EBAum21164 for ietf-imaa-bks; Fri, 14 Feb 2003 03:10:56 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EBAsd21156 for ; Fri, 14 Feb 2003 03:10:54 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EBBDKA021114 for ; Fri, 14 Feb 2003 11:11:13 GMT To: ietf-imaa@imc.org, roy@gnomon.org.uk CC: ietf-imaa@imc.org In-reply-to: <20030214103127.GC30815@nicemice.net> (ietf-imaa.amc+0@nicemice.net.RemoveThisWord) Subject: Re: Splitting and encoding of local-parts: some thoughts References: <1045187326.18524.TMDA@moriarty.gnomon.org.uk> <20030214103127.GC30815@nicemice.net> Date: Fri, 14 Feb 2003 11:11:12 +0000 Message-ID: <1045221072.21097.TMDA@moriarty.gnomon.org.uk> From: Roy Badami X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: I'm not sure we need to be quite so fearful of quoting, but we should certainly keep this in mind as we weigh the options. I think if we generate quoting there will be MUAs out there that can't cope with it. I certainly remember that when X.400 was popular in Britain, people using RFC1148 mapped addresses had problems communicating with some people. 
-roy From owner-ietf-imaa Fri Feb 14 03:37:26 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EBbQZ22297 for ietf-imaa-bks; Fri, 14 Feb 2003 03:37:26 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EBbOd22293 for ; Fri, 14 Feb 2003 03:37:24 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EBbgKA021248 for ; Fri, 14 Feb 2003 11:37:43 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Question: UseSTD3ASCIIRules on RHS of IMA From: Roy Badami Date: Fri, 14 Feb 2003 11:37:42 +0000 Message-ID: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Should IMAA make any statement as to whether UseSTD3ASCIIRules MUST or SHOULD be set when processing any IDN on the RHS of an IMA, or should this decision be left up to the implementor? I'd suggest that IMAA should at least make a recommendation on this, to ensure consistency in the handling of IMAs between implementations.
-roy From owner-ietf-imaa Fri Feb 14 04:06:51 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EC6pQ23251 for ietf-imaa-bks; Fri, 14 Feb 2003 04:06:51 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EC6nd23246 for ; Fri, 14 Feb 2003 04:06:49 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EC78KA021359 for ; Fri, 14 Feb 2003 12:07:08 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Question: Fullwidth double-quote and fullwidth backslash From: Roy Badami Date: Fri, 14 Feb 2003 12:07:08 +0000 Message-ID: <1045224428.21358.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: When dequoting/requoting localparts, should we consider recognizing fullwidth double quotes and fullwidth backslash (and any other double-quote-like and backslash-like characters)? It seems to me that the arguments for this are similar to those for fullwidth dot and fullwidth at, and once we decide to recognize metacharacters in fullwidth form, we should apply this consistently to *all* metacharacters.
-roy From owner-ietf-imaa Fri Feb 14 05:05:26 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1ED5Q625022 for ietf-imaa-bks; Fri, 14 Feb 2003 05:05:26 -0800 (PST) Received: from mail.uni-bielefeld.de (IDENT:72@mail2.uni-bielefeld.de [129.70.4.90]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1ED5Od25018 for ; Fri, 14 Feb 2003 05:05:24 -0800 (PST) Received: from dirichlet.Physik.Uni-Bielefeld.DE (dirichlet.Physik.Uni-Bielefeld.DE [129.70.125.234]) by mail.uni-bielefeld.de (Sun Internet Mail Server sims.4.0.2000.10.12.16.25.p8) with ESMTP id <0HAA00M4JV0WDB@mail.uni-bielefeld.de> for ietf-imaa@imc.org; Fri, 14 Feb 2003 14:05:20 +0100 (MET) Date: Fri, 14 Feb 2003 13:52:24 +0100 From: Marc Mutz Subject: Re: Open Issue: Splitting of local-part into labels and where? In-reply-to: <20030214065034.GG25048@nicemice.net> To: ietf-imaa@imc.org Message-id: <200302141352.49408@sendmail.mutz.com> Organization: KDE MIME-version: 1.0 Content-type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Boundary-02=_gaOT+4am4xdI/Kv"; charset="iso-8859-1" Content-transfer-encoding: 7bit User-Agent: KMail/1.5.9 X-PGP-Key: 0xBDBFE838 References: <1045185140.18391.TMDA@moriarty.gnomon.org.uk> <200302131948.20807@sendmail.mutz.com> <20030214065034.GG25048@nicemice.net> Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --Boundary-02=_gaOT+4am4xdI/Kv Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Description: signed data Content-Disposition: inline On Friday 14 February 2003 07:50, Adam M. Costello wrote: > It might be preferable to simply require applying NFKC to the entire > local part before splitting it. It's a more heavyweight operation, > but it's already in the toolbox. Completely agreed. 
Marc =2D-=20 The [Sonny Bono Copyright Term Extension Act] expands copyright not only for future, but also for existing works, even though their authors obviously don't need any additional incentive to create them. -- "The Progress Of Science And Useful Arts": Why Copyright Today Threatens Intellectual Freedom, Free Expression Policy Project --Boundary-02=_gaOT+4am4xdI/Kv Content-Type: application/pgp-signature Content-Description: signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQA+TOag3oWD+L2/6DgRAtHgAJ45taH/Kr+73Gt+lnjCalw+71meRACfZkIJ IL6M4+jd9GIDVDEWqSu4A9Q= =66sK -----END PGP SIGNATURE----- --Boundary-02=_gaOT+4am4xdI/Kv-- From owner-ietf-imaa Fri Feb 14 05:52:20 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EDqKK27647 for ietf-imaa-bks; Fri, 14 Feb 2003 05:52:20 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1EDqGd27632 for ; Fri, 14 Feb 2003 05:52:16 -0800 (PST) Received: (qmail 27000 invoked by uid 66); 14 Feb 2003 13:52:14 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 14 Feb 2003 13:52:14 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-14-1342d); 14 Feb 2003 14:52:04 +0100 Date: 14 Feb 2003 14:43:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8fpJ5tRZcDD@3247.org> In-Reply-To: <1045183773.18276.TMDA@moriarty.gnomon.org.uk> Subject: Re: Another issue: quoting User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-14-1342d MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami schrieb/wrote: > Most software wants to treat the localpart as simply a sequence of > characters (at least after dequoting) but it isn't defined that way. 
To be quirks-compatible with software that does make a distinction between different forms of the same local-part, I'd recommend the following: 1. Save the original form. 2. Fully dequote/remove comments/extra whitespace before toUnicode and/or toASCII. 3. If the output of toUnicode/toASCII is identical to the input, use the original form in step #1, otherwise use the output generated by the function. Claus -- http://www.faerber.muc.de/ From owner-ietf-imaa Fri Feb 14 05:52:20 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EDqK127646 for ietf-imaa-bks; Fri, 14 Feb 2003 05:52:20 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1EDqGd27633 for ; Fri, 14 Feb 2003 05:52:16 -0800 (PST) Received: (qmail 26999 invoked by uid 66); 14 Feb 2003 13:52:14 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 14 Feb 2003 13:52:14 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-14-1342d); 14 Feb 2003 14:52:04 +0100 Date: 14 Feb 2003 14:33:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8fpJ5$qocDD@3247.org> In-Reply-To: <1045224428.21358.TMDA@moriarty.gnomon.org.uk> Subject: Re: Question: Fullwidth double-quote and fullwidth backslash User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-14-1342d MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami wrote: > When dequoting/requoting localparts, should we consider recognizing > fullwidth double quotes and fullwidth backslash (and any other > double-quote-like and backslash-like characters)? Just do an NFKC normalisation at the very beginning and then additionally map U+3002 to U+002E. This will handle all of these special cases.
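A minimal sketch of this suggestion in Python, assuming the standard `unicodedata` module. The function name `prenormalize` is hypothetical and does not come from the IMAA draft; the sample address is made up for illustration.

```python
import unicodedata


def prenormalize(address: str) -> str:
    """Apply NFKC, then map U+3002 (ideographic full stop) to U+002E.

    NFKC folds the fullwidth forms of the quote, backslash, at-sign and
    dot to their ASCII counterparts. U+3002 has no compatibility
    decomposition, so it needs the explicit extra mapping.
    """
    normalized = unicodedata.normalize("NFKC", address)
    return normalized.replace("\u3002", ".")


# Fullwidth quote (U+FF02), backslash (U+FF3C), at (U+FF20), dot (U+FF0E)
sample = "\uff02a\uff3cb\uff02\uff20example\uff0eorg"
print(prenormalize(sample))  # "a\b"@example.org
```

The halfwidth ideographic full stop (U+FF61) is covered as well, because NFKC first turns it into U+3002, which the extra mapping then turns into an ASCII dot. Whether this belongs in the protocol or in the user interface is a separate question.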
Claus -- http://www.faerber.muc.de/ From owner-ietf-imaa Fri Feb 14 05:52:20 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EDqKf27645 for ietf-imaa-bks; Fri, 14 Feb 2003 05:52:20 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1EDqGd27634 for ; Fri, 14 Feb 2003 05:52:16 -0800 (PST) Received: (qmail 26998 invoked by uid 66); 14 Feb 2003 13:52:14 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 14 Feb 2003 13:52:14 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-14-1342d); 14 Feb 2003 14:52:04 +0100 Date: 14 Feb 2003 14:07:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8fpJ4yM3cDD@3247.org> In-Reply-To: <20030214065034.GG25048@nicemice.net> Subject: Re: Open Issue: Splitting of local-part into labels and where? User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-14-1342d MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Adam M. Costello schrieb/wrote: > Claus Färber wrote: > [regarding the set of delimiters for subdividing local parts] >> I'm not quite sure about the quoting characters ``"'' and ``\''. If >> we apply toASCII/toUnicode after dequoting the local part, there's no >> reason to include them. > Sure there is. Consider: > From: foo bar <"foo\"bar\\"@example.org> > The dequoted local part is: > foo"bar\ So what? They're just ordinary characters like ``a'', ``b'' and ``c'' then. The quote characters are rarely used as separators. The dequoted local part ``foö"bär\'' could be encoded as ``xn--fo"br\-eua5l'', which could be quoted/included in a From header as ``"xn--fo\"br\\-eua5l" (comment allowed by RFC 822) @example.org''. 
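The dequote-then-requote round trip described here can be sketched as follows. This is a simplified illustration, not the draft's algorithm: the helper names `dequote` and `requote` are hypothetical, comments and folding whitespace are ignored, and `requote` always emits a quoted-string even where a plain dot-atom would do.

```python
def dequote(local_part: str) -> str:
    """Strip surrounding double quotes and undo backslash escapes."""
    is_quoted = (
        len(local_part) >= 2
        and local_part[0] == '"'
        and local_part[-1] == '"'
    )
    if not is_quoted:
        return local_part  # not a quoted-string; leave untouched
    out = []
    chars = iter(local_part[1:-1])
    for ch in chars:
        if ch == "\\":
            # quoted-pair: drop the backslash, keep the escaped character
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)


def requote(text: str) -> str:
    """Escape backslash and double quote, then wrap in double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


wire_form = '"foo\\"bar\\\\"'   # as it appears in the header
plain = dequote(wire_form)      # the characters foo"bar\
print(plain)
print(requote(plain) == wire_form)
```

Under this view the quote and backslash are ordinary characters once the local part has been dequoted, and they only regain special meaning when the result is written back into a header.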
For decoding, you would first decode the local part, apply toUnicode and then re-apply a quoting for display. Claus -- http://www.faerber.muc.de/ From owner-ietf-imaa Fri Feb 14 05:49:37 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EDnbR27547 for ietf-imaa-bks; Fri, 14 Feb 2003 05:49:37 -0800 (PST) Received: from mail.uni-bielefeld.de (IDENT:72@mail2.uni-bielefeld.de [129.70.4.90]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EDnZd27543 for ; Fri, 14 Feb 2003 05:49:35 -0800 (PST) Received: from dirichlet.Physik.Uni-Bielefeld.DE (dirichlet.Physik.Uni-Bielefeld.DE [129.70.125.234]) by mail.uni-bielefeld.de (Sun Internet Mail Server sims.4.0.2000.10.12.16.25.p8) with ESMTP id <0HAA00161X2F0S@mail.uni-bielefeld.de> for ietf-imaa@imc.org; Fri, 14 Feb 2003 14:49:28 +0100 (MET) Date: Fri, 14 Feb 2003 14:36:44 +0100 From: Marc Mutz Subject: Re: Another issue: quoting In-reply-to: <1045191586.18900.TMDA@moriarty.gnomon.org.uk> To: ietf-imaa@imc.org Message-id: <200302141436.55925@sendmail.mutz.com> Organization: KDE MIME-version: 1.0 Content-type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Boundary-02=_3DPT+1EFGU+k7dQ"; charset="us-ascii" Content-transfer-encoding: 7bit User-Agent: KMail/1.5.9 X-PGP-Key: 0xBDBFE838 References: <1045183773.18276.TMDA@moriarty.gnomon.org.uk> <2147483647.1045161561@nifty-jr.west.sun.com> <1045191586.18900.TMDA@moriarty.gnomon.org.uk> Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --Boundary-02=_3DPT+1EFGU+k7dQ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Content-Description: signed data Content-Disposition: inline On Friday 14 February 2003 03:59, Roy Badami wrote: > I'm going to have to read 2822 carefully, but can anyone confirm that > roy.badami and "roy.badami" are still potentially distinct mailboxes > in 2822, even neglecting obsolete deprecated stuff? 
After lexing, they are equivalent. Marc =2D-=20 This is as small as I think is sensible. -- Don Sanders after committing a 1MB patch to KMail CVS --Boundary-02=_3DPT+1EFGU+k7dQ Content-Type: application/pgp-signature Content-Description: signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQA+TPD33oWD+L2/6DgRArYnAJ9kIRXFd0cDQADzZ3Fr6s+B4lwFgQCg+hlq DNwIW1XHJAcPuJ9hur2POx4= =KoqQ -----END PGP SIGNATURE----- --Boundary-02=_3DPT+1EFGU+k7dQ-- From owner-ietf-imaa Fri Feb 14 06:09:54 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EE9sl28039 for ietf-imaa-bks; Fri, 14 Feb 2003 06:09:54 -0800 (PST) Received: from yxa.extundo.com (178.230.13.217.in-addr.dgcsystems.net [217.13.230.178]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EE9qd28035 for ; Fri, 14 Feb 2003 06:09:52 -0800 (PST) Received: from latte.josefsson.org (yxa.extundo.com [217.13.230.178]) (authenticated bits=0) by yxa.extundo.com (8.12.7/8.12.7) with ESMTP id h1EE9gXf002461 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK); Fri, 14 Feb 2003 15:09:44 +0100 To: Roy Badami Cc: ietf-imaa@imc.org Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA X-Payment: hashcash 1.1 0:030214:roy@gnomon.org.uk:c5139691bad24d4e X-Hashcash: 0:030214:roy@gnomon.org.uk:c5139691bad24d4e X-Payment: hashcash 1.1 0:030214:ietf-imaa@imc.org:42ddbd6510bb9e54 X-Hashcash: 0:030214:ietf-imaa@imc.org:42ddbd6510bb9e54 From: Simon Josefsson Date: Fri, 14 Feb 2003 15:09:42 +0100 In-Reply-To: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> (Roy Badami's message of "Fri, 14 Feb 2003 11:37:42 +0000") Message-ID: User-Agent: Gnus/5.090016 (Oort Gnus v0.16) Emacs/21.3.50 References: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Spam-Status: No, hits=-2.8 required=5.0 tests=IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,SPAM_PHRASE_00_01, USER_AGENT,USER_AGENT_GNUS_UA version=2.44 Sender: 
owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami writes: > Should IMAA make any statement as to whether UseSTD3ASCIIRules MUST or > SHOULD be set when processessing any IDN on the RHS of an IMA, or > should this decision be left up to the implementor? Why should IMAA discuss it at all? IDNA defines how domain names are internationalized, and that takes care of RHS. From owner-ietf-imaa Fri Feb 14 06:17:31 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EEHVU28381 for ietf-imaa-bks; Fri, 14 Feb 2003 06:17:31 -0800 (PST) Received: from yxa.extundo.com (178.230.13.217.in-addr.dgcsystems.net [217.13.230.178]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EEHUd28377 for ; Fri, 14 Feb 2003 06:17:30 -0800 (PST) Received: from latte.josefsson.org (yxa.extundo.com [217.13.230.178]) (authenticated bits=0) by yxa.extundo.com (8.12.7/8.12.7) with ESMTP id h1EEHTXf002600 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK); Fri, 14 Feb 2003 15:17:29 +0100 To: list-ietf-i18n-imaa@faerber.muc.de (Claus =?iso-8859-1?q?F=E4rber?=) Cc: ietf-imaa@imc.org Subject: Re: Question: Fullwidth double-quote and fullwidth backslash X-Payment: hashcash 1.1 0:030214:list-ietf-i18n-imaa@faerber.muc.de:eaede714d93ebaca X-Hashcash: 0:030214:list-ietf-i18n-imaa@faerber.muc.de:eaede714d93ebaca X-Payment: hashcash 1.1 0:030214:ietf-imaa@imc.org:60732d59cfd3dbea X-Hashcash: 0:030214:ietf-imaa@imc.org:60732d59cfd3dbea From: Simon Josefsson Date: Fri, 14 Feb 2003 15:17:28 +0100 In-Reply-To: <8fpJ5$qocDD@3247.org> (list-ietf-i18n-imaa@faerber.muc.de's message of "14 Feb 2003 14:33:00 +0100") Message-ID: User-Agent: Gnus/5.090016 (Oort Gnus v0.16) Emacs/21.3.50 References: <8fpJ5$qocDD@3247.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Status: No, hits=-2.8 required=5.0 tests=IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,SPAM_PHRASE_00_01, 
USER_AGENT,USER_AGENT_GNUS_UA version=2.44 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: list-ietf-i18n-imaa@faerber.muc.de (Claus Färber) writes: > Roy Badami schrieb/wrote: >> When dequoting/requoting localparts, should we consider recognizing >> fullwidth double quotes and fullwidth backslash (and any other >> double-quote-like and backlash-like characters)? > > Just do a NFKC normalisation at the very beginning and then additinally > map U+3002 to U+002E. This will handle all of these special cases. Doing normalization before mapping goes against stringprep and results in different behaviour (see the "self reverting" test vectors on ). I'm not saying your idea is a bad one, I think it is another indication that IMAA cannot be a simple stringprep profile. From owner-ietf-imaa Fri Feb 14 06:22:25 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EEMPM28437 for ietf-imaa-bks; Fri, 14 Feb 2003 06:22:25 -0800 (PST) Received: from mail.uni-bielefeld.de (IDENT:72@mail2.uni-bielefeld.de [129.70.4.90]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EEMNd28433 for ; Fri, 14 Feb 2003 06:22:23 -0800 (PST) Received: from dirichlet.Physik.Uni-Bielefeld.DE (dirichlet.Physik.Uni-Bielefeld.DE [129.70.125.234]) by mail.uni-bielefeld.de (Sun Internet Mail Server sims.4.0.2000.10.12.16.25.p8) with ESMTP id <0HAA002QUYL83O@mail.uni-bielefeld.de> for ietf-imaa@imc.org; Fri, 14 Feb 2003 15:22:20 +0100 (MET) Date: Fri, 14 Feb 2003 15:09:34 +0100 From: Marc Mutz Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA In-reply-to: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> To: ietf-imaa@imc.org Message-id: <200302141509.45488@sendmail.mutz.com> Organization: KDE MIME-version: 1.0 Content-type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Boundary-02=_piPT+XLS9Ul2dz1"; charset="us-ascii" Content-transfer-encoding: 7bit User-Agent: KMail/1.5.9 X-PGP-Key: 
0xBDBFE838 References: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --Boundary-02=_piPT+XLS9Ul2dz1 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Content-Description: signed data Content-Disposition: inline On Friday 14 February 2003 12:37, Roy Badami wrote: > Should IMAA make any statement as to whether UseSTD3ASCIIRules MUST > or SHOULD be set when processing any IDN on the RHS of an IMA, or > should this decision be left up to the implementor? The RHS is an IDN-unaware domain name slot. As such, it is accounted for by IDNA, not IMAA. Marc -- It takes 5 minutes to create [a OpenPGP key]. Of course it takes a bit more time to get it signed... -- David Faure --Boundary-02=_piPT+XLS9Ul2dz1 Content-Type: application/pgp-signature Content-Description: signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQA+TPip3oWD+L2/6DgRAvYfAKCSAi65EtZZcRIeJmjHeNCu1XgDvwCgp7g3 sFF0tGowIEbsNWZPSYyAUu0= =Ojx1 -----END PGP SIGNATURE----- --Boundary-02=_piPT+XLS9Ul2dz1-- From owner-ietf-imaa Fri Feb 14 06:40:40 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EEeeR01160 for ietf-imaa-bks; Fri, 14 Feb 2003 06:40:40 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EEecd01150 for ; Fri, 14 Feb 2003 06:40:38 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EEeuKA021987 for ; Fri, 14 Feb 2003 14:40:56 GMT To: jas@extundo.com CC: ietf-imaa@imc.org, roy@gnomon.org.uk In-reply-to: (message from Simon Josefsson on Fri, 14 Feb 2003 15:09:42 +0100) Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA References: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> Date: Fri, 14 Feb 2003
14:40:55 +0000 Message-ID: <1045233655.21973.TMDA@moriarty.gnomon.org.uk> From: Roy Badami X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Why should IMAA discuss it at all? IDNA defines how domain names are internationalized, and that takes care of RHS. Because, as IDNA says, the rules as to where the STD3 hostname rules should be applied are imprecise and the subject of debate, and the implementor needs to interpret the RFCs to decide whether they apply in any given circumstance. If the RFCs are clear on the matter, then it's surely helpful to the implementor to document in IMAA the required value of the UseSTD3ASCIIRules flag for processing IMAs. (And I'd appreciate a reference to the RFC that makes it clear whether mail domains are subject to hostname rules, if there is one.) If there's any ambiguity or disagreement as to whether the hostname rules apply, then perhaps IMAA should take the opportunity of clarifying the situation. 
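For readers unfamiliar with the flag, here is a hedged sketch of what setting UseSTD3ASCIIRules amounts to for one ASCII label. It assumes the checks in IDNA's ToASCII operation are: only letters, digits and hyphen, with no leading or trailing hyphen. The function name `passes_std3_rules` is hypothetical, and whether mail domains should be checked this way is exactly the open question in this thread.

```python
import re

# Letters, digits and hyphen only ("LDH"); at least one character.
_LDH = re.compile(r"[A-Za-z0-9-]+")


def passes_std3_rules(label: str) -> bool:
    """Return True if an ASCII label satisfies the STD3 host name rules."""
    if not _LDH.fullmatch(label):
        return False  # contains a non-LDH character, or is empty
    # No leading or trailing hyphen-minus.
    return not (label.startswith("-") or label.endswith("-"))


for label in ["example", "my_host", "-abc", "xn--bcher-kva"]:
    print(label, passes_std3_rules(label))
```

An implementation that leaves the flag unset would skip this check and pass a label such as `my_host` through, so two IMAA implementations that choose differently can disagree on whether the same address is valid.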
-roy From owner-ietf-imaa Fri Feb 14 07:09:25 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EF9PH03763 for ietf-imaa-bks; Fri, 14 Feb 2003 07:09:25 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EF9Nd03759 for ; Fri, 14 Feb 2003 07:09:24 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EF9iKA022090 for ; Fri, 14 Feb 2003 15:09:44 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Re: Another issue: quoting From: Roy Badami Date: Fri, 14 Feb 2003 15:09:44 +0000 Message-ID: <1045235384.22089.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: To be quirks-compatible with software that does make a distinction between different forms of the same local-part, I'd recommend the following: That doesn't necessarily sound like a bad idea, but my original reason for raising this issue was based on a misunderstanding of RFC822. So this raises the question, do we believe that there are implementations out there that have quirks we need to work around. Do we have any idea what the quirks are? 
-roy From owner-ietf-imaa Fri Feb 14 07:36:56 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EFauc04331 for ietf-imaa-bks; Fri, 14 Feb 2003 07:36:56 -0800 (PST) Received: from yxa.extundo.com (178.230.13.217.in-addr.dgcsystems.net [217.13.230.178]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EFasd04327 for ; Fri, 14 Feb 2003 07:36:54 -0800 (PST) Received: from latte.josefsson.org (yxa.extundo.com [217.13.230.178]) (authenticated bits=0) by yxa.extundo.com (8.12.7/8.12.7) with ESMTP id h1EFaqXf005046 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK); Fri, 14 Feb 2003 16:36:53 +0100 To: Roy Badami Cc: ietf-imaa@imc.org Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA X-Payment: hashcash 1.1 0:030214:roy@gnomon.org.uk:e78805723debcece X-Hashcash: 0:030214:roy@gnomon.org.uk:e78805723debcece X-Payment: hashcash 1.1 0:030214:ietf-imaa@imc.org:32b2312a5b929d15 X-Hashcash: 0:030214:ietf-imaa@imc.org:32b2312a5b929d15 From: Simon Josefsson Date: Fri, 14 Feb 2003 16:36:52 +0100 In-Reply-To: <1045233655.21973.TMDA@moriarty.gnomon.org.uk> (Roy Badami's message of "Fri, 14 Feb 2003 14:40:55 +0000") Message-ID: User-Agent: Gnus/5.090016 (Oort Gnus v0.16) Emacs/21.3.50 References: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> <1045233655.21973.TMDA@moriarty.gnomon.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Spam-Status: No, hits=-2.8 required=5.0 tests=IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,SPAM_PHRASE_00_01, USER_AGENT,USER_AGENT_GNUS_UA version=2.44 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami writes: > Why should IMAA discuss it at all? IDNA defines how domain names are > internationalized, and that takes care of RHS. 
> > Because, as IDNA says, the rules as to where the STD3 hostname rules > should be applied are imprecise and the subject of debate, and the > implementor needs to interpret the RFCs to decide whether they apply > in any given circumstance. > > If the RFCs are clear on the matter, then it's surely helpful to the > implementor to document in IMAA the required value of the > UseSTD3ASCIIRules flag for processing IMAs. (And I'd appreciate a > reference to the RFC that makes it clear whether mail domains are > subject to hostname rules, if there is one.) > > If there's any ambiguity or disagreement as to whether the hostname > rules apply, then perhaps IMAA should take the opportunity of > clarifying the situation. Informative text doesn't hurt, I guess, but IMAA shouldn't make normative statements about this IMHO. All application must already know the answer to this problem, as the same problem existed before IDNA. If the RFCs are unclear, that should be fixed independently of IMAA. From owner-ietf-imaa Fri Feb 14 07:55:01 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EFt1O04839 for ietf-imaa-bks; Fri, 14 Feb 2003 07:55:01 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EFsxd04835 for ; Fri, 14 Feb 2003 07:55:00 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EFtKKA022313 for ; Fri, 14 Feb 2003 15:55:20 GMT To: jas@extundo.com CC: ietf-imaa@imc.org, roy@gnomon.org.uk In-reply-to: (message from Simon Josefsson on Fri, 14 Feb 2003 16:36:52 +0100) Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA References: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> Date: Fri, 14 Feb 2003 15:55:19 +0000 Message-ID: <1045238119.22308.TMDA@moriarty.gnomon.org.uk> From: Roy Badami X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: 
roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Informative text doesn't hurt, I guess, but IMAA shouldn't make normative statements about this IMHO. All application must already know the answer to this problem, as the same problem existed before IDNA. If the RFCs are unclear, that should be fixed independently of IMAA. This is what IDNA says, but I'm not sure it's true. I doubt most mail software knows the answer to this problem; I suspect that most MUAs simply pass the address to the MTA and see what happens, and that most MTAs simply pass the domain to the resolver and see what happens. In the absence of an RFC-mandated requirement to validate the domain, it could be argued that this is the correct thing to do in terms of the robustness principle, even if the domain name contains characters that are technically illegal. However, with IMAs, the MUA has to explicitly invoke IDNA, and it hence has to choose a value for the UseSTD3ASCIIRules flag. It isn't absolutely clear to me what the correct value of this flag is (though it may be clear to others), and it's highly desirable that IMAA implementations behave consistently. I don't see why normative text must be ruled out. -roy From owner-ietf-imaa Fri Feb 14 08:51:23 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EGpNt07460 for ietf-imaa-bks; Fri, 14 Feb 2003 08:51:23 -0800 (PST) Received: from server1.matic.com (server.iicinternet.com [66.159.16.71] (may be forged)) by above.proper.com (8.11.6/8.11.3) with SMTP id h1EGpMd07453 for ; Fri, 14 Feb 2003 08:51:22 -0800 (PST) Received: (qmail 1358 invoked from network); 14 Feb 2003 16:50:53 -0000 Received: from adsl-65-43-34-157.dsl.lgtpmi.ameritech.net (HELO ?192.168.0.100?) 
(65.43.34.157) by server.iicinternet.com with SMTP; 14 Feb 2003 16:50:53 -0000 Mime-Version: 1.0 X-Sender: tedd@sperling.com (Unverified) Message-Id: In-Reply-To: <3E4B5CAD.9090304@bic.nus.edu.sg> References: <138AA78F80DCE84B8EE424399FFBF9C904FAA1@exchange.ad.skymv.com> <3E4B5CAD.9090304@bic.nus.edu.sg> Date: Fri, 14 Feb 2003 11:50:39 -0500 To: ietf-imaa@imc.org From: tedd Subject: Re: The typing issue Content-Type: text/plain; charset="us-ascii" ; format="flowed" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: >Sorry to take so long to put across a small point, >but I am not a true native English user. > >-- >tin wee tin wee: It's not a small point -- it's THE point. The entire issue has been to globalize Internet use, not limit it. tedd -- http://sperling.com/ From owner-ietf-imaa Fri Feb 14 09:00:11 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EH0BS07836 for ietf-imaa-bks; Fri, 14 Feb 2003 09:00:11 -0800 (PST) Received: from yxa.extundo.com (178.230.13.217.in-addr.dgcsystems.net [217.13.230.178]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EH09d07832 for ; Fri, 14 Feb 2003 09:00:09 -0800 (PST) Received: from latte.josefsson.org (yxa.extundo.com [217.13.230.178]) (authenticated bits=0) by yxa.extundo.com (8.12.7/8.12.7) with ESMTP id h1EH07Xf006936 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=OK); Fri, 14 Feb 2003 18:00:08 +0100 To: Roy Badami Cc: ietf-imaa@imc.org Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA X-Payment: hashcash 1.1 0:030214:roy@gnomon.org.uk:69c15676ad6cb0f7 X-Hashcash: 0:030214:roy@gnomon.org.uk:69c15676ad6cb0f7 X-Payment: hashcash 1.1 0:030214:ietf-imaa@imc.org:d228d3dbac191cd6 X-Hashcash: 0:030214:ietf-imaa@imc.org:d228d3dbac191cd6 From: Simon Josefsson Date: Fri, 14 Feb 2003 18:00:07 +0100 In-Reply-To: <1045238119.22308.TMDA@moriarty.gnomon.org.uk> (Roy Badami's message of "Fri, 14 Feb 2003 15:55:19 +0000") Message-ID: 
User-Agent: Gnus/5.090016 (Oort Gnus v0.16) Emacs/21.3.50 References: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> <1045238119.22308.TMDA@moriarty.gnomon.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Spam-Status: No, hits=-2.8 required=5.0 tests=IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,SPAM_PHRASE_00_01, USER_AGENT,USER_AGENT_GNUS_UA version=2.44 Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami writes: > Informative text doesn't hurt, I guess, but IMAA shouldn't make > normative statements about this IMHO. > > All applications must already know the answer to this problem, as the > same problem existed before IDNA. If the RFCs are unclear, that > should be fixed independently of IMAA. > > This is what IDNA says, but I'm not sure it's true. I doubt most mail > software knows the answer to this problem; I suspect that most MUAs > simply pass the address to the MTA and see what happens, and that most > MTAs simply pass the domain to the resolver and see what happens. > > In the absence of an RFC-mandated requirement to validate the domain, > it could be argued that this is the correct thing to do in terms of the > robustness principle, even if the domain name contains characters that > are technically illegal. > > However, with IMAs, the MUA has to explicitly invoke IDNA, and it > hence has to choose a value for the UseSTD3ASCIIRules flag. It isn't > absolutely clear to me what the correct value of this flag is (though > it may be clear to others), and it's highly desirable that IMAA > implementations behave consistently. I don't see why normative text > must be ruled out. IDNA suggests that applications that simply passed things on before should not set the flag, and applications that enforce hostname restrictions today should set the flag. Applications have made a decision, conscious or not.
I'm not sure I see any need to enforce anything with regard to this more than already is, and in particular not how IMAA would be the right place for it. ,---- | 3) For each label, decide whether or not to enforce the restrictions on | ASCII characters in host names [STD3]. (Applications already faced this | choice before the introduction of IDNA, and can continue to make the | decision the same way they always have; IDNA makes no new | recommendations regarding this choice.) If the restrictions are to be | enforced, set the flag called "UseSTD3ASCIIRules" for that label. `---- From owner-ietf-imaa Fri Feb 14 09:58:57 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EHwvJ13057 for ietf-imaa-bks; Fri, 14 Feb 2003 09:58:57 -0800 (PST) Received: from slarti.muc.de (slarti.muc.de [193.149.48.10]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1EHwud13053 for ; Fri, 14 Feb 2003 09:58:56 -0800 (PST) Received: (qmail 22443 invoked by uid 66); 14 Feb 2003 17:58:54 -0000 Received: from faerber.muc.de by slarti.muc.de with BSMTP (rsmtp-qm-ot 0.4) for ietf-imaa@imc.org; 14 Feb 2003 17:58:54 -0000 Received: by faerber.muc.de (OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-14-1342d); 14 Feb 2003 18:58:49 +0100 Date: 14 Feb 2003 18:43:00 +0100 From: list-ietf-i18n-imaa@faerber.muc.de (=?ISO-8859-1?Q?Claus_F=E4rber?=) To: ietf-imaa@imc.org Message-ID: <8fpKYQTZcDD@3247.org> In-Reply-To: Subject: Re: Question: Fullwidth double-quote and fullwidth backslash User-Agent: OpenXP/32 v3.9.4 (Win32) alpha @ 2003-02-14-1342d MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Simon Josefsson schrieb/wrote: > list-ietf-i18n-imaa@faerber.muc.de (Claus Färber) writes: >> Just do a NFKC normalisation at the very beginning and then additinally >> map U+3002 to U+002E. This will handle all of these special cases. 
> Doing normalization before mapping goes against stringprep and results > in different behaviour (see the "self reverting" test vectors on > ). It does not if you do the normalisation twice (at the very beginning and after mapping). For IMAA, it suffices to specify that implementations MUST accept all characters as delimiters that decompose to one of our delimiters during NFKC-with-U+3002-to-U+002E normalisation and that the delimiters MUST be normalised. The easiest way to implement this is an additional normalisation at the very beginning. IDNA can get away wihtout such a normalisation because they have a single delimiter (U+002E) in their output. The IDNA processing maps all dot variants (including U+3002 and the width variants) to whatever delimiter is used (usually a dot U+002E or, in DNS packets, no delimiter at all). Claus -- http://www.faerber.muc.de/ From owner-ietf-imaa Fri Feb 14 11:13:10 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EJDAf16564 for ietf-imaa-bks; Fri, 14 Feb 2003 11:13:10 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EJD9d16559 for ; Fri, 14 Feb 2003 11:13:09 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id OAA20781; Fri, 14 Feb 2003 14:13:08 -0500 Message-Id: <4.2.0.58.J.20030214105633.05c158e0@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Fri, 14 Feb 2003 10:59:27 -0500 To: Roy Badami , ietf-imaa@imc.org From: Martin Duerst Subject: Re: Question: Fullwidth double-quote and fullwidth backslash In-Reply-To: <1045224428.21358.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: My (limited) understanding is that quotes and backslashes are not printed on 
business cards, and not entered by the user. It therefore seems completely unnecessary to consider full-width variants. While the average user might not get the '@' right, we should be able to rely on programmers getting the quotes and backslashes right. Regards, Martin. At 12:07 03/02/14 +0000, Roy Badami wrote: >When dequoting/requoting localparts, should we consider recognizing >fullwidth double quotes and fullwidth backslash (and any other >double-quote-like and backslash-like characters)? > >It seems to me that the arguments for this are similar to those for >fullwidth dot and fullwidth at, and once we decide to recognize >metacharacters in fullwidth form, we should apply this consistently to >*all* metacharacters. > > -roy From owner-ietf-imaa Fri Feb 14 11:13:08 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EJD8b16552 for ietf-imaa-bks; Fri, 14 Feb 2003 11:13:08 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EJD7d16546 for ; Fri, 14 Feb 2003 11:13:07 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id OAA20778; Fri, 14 Feb 2003 14:13:08 -0500 Message-Id: <4.2.0.58.J.20030214105432.05980360@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Fri, 14 Feb 2003 10:55:37 -0500 To: IETF IMAA list , ietf-imaa@imc.org From: Martin Duerst Subject: Re: Question: full-width at Cc: Roy Badami In-Reply-To: <20030214050748.GF25048@nicemice.net> References: <1045190936.18837.TMDA@moriarty.gnomon.org.uk> <1045190936.18837.TMDA@moriarty.gnomon.org.uk> <1045190936.18837.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Hello Adam, I agree very much with the general direction of staying away from user
interface issues. The less of these, the better. Regards, Martin. At 05:07 03/02/14 +0000, Adam M. Costello wrote: >These issues were considered for IDNA. There are many characters that >decompose to dot (like U+FE52 small full stop, U+2024 one dot leader) or >decompose to strings containing dots (like U+2488 digit one full stop, >U+33C2 square AM). Therefore, if a whole domain name is normalized >before being scanned for dots, it might result in a different number of >labels than if it had not been normalized. We discussed whether IDNA >should require this pre-normalization. Ultimately we decided that that >was getting too far into user interface issues. IDNA instead focuses >almost entirely on individual labels, not whole domain names. The >one exception is to require that three particular dot-like non-ASCII >characters be recognized as dots, just the three that we had reason to >believe would be likely to be input by users trying to type dots. > >The IMAA draft follows this example. It says as little as possible >about entire mail addresses, and focuses on the local part. The >one exception is to require one particular at-like character to be >recognized as an at-sign, the one that we have reason to believe is >likely to be input by users trying to type at-signs. > >As for initial combining characters, the same question arose with IDNA. >If a label begins with a combining character, will it combine with the >preceding dot? This was considered a user-interface issue that IDNA >should not address.
> >AMC From owner-ietf-imaa Fri Feb 14 11:13:09 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EJD9P16560 for ietf-imaa-bks; Fri, 14 Feb 2003 11:13:09 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EJD8d16551 for ; Fri, 14 Feb 2003 11:13:08 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id OAA20788; Fri, 14 Feb 2003 14:13:09 -0500 Message-Id: <4.2.0.58.J.20030214110907.05bf4398@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Fri, 14 Feb 2003 11:36:49 -0500 To: IETF IMAA list , Roy Badami From: Martin Duerst Subject: Re: A couple of comments on the open issues... In-Reply-To: <20030213235319.GB25048@nicemice.net> References: <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: At 23:53 03/02/13 +0000, Adam M. Costello wrote: >Roy Badami wrote: > > I'm particularly keen that the use of a tag or suffix with a local > > username is not broken by IMAA. Many MTAs provide the functionality > > that all mail addressed to will be delivered > > to , either by default, or as a configuration option. > > > > This effectively allows a user of such an MTA (that has been suitably > > configured) to have multiple e-mail addresses without requiring any > > action on the part of the mail administrator, and allows the user to > > run scripts that process their mail according to the suffix. > >So if IMAA operates on the entire local part, this multiple-address >feature will be unavailable to users with internationalized local parts.
>This is a good argument in favor of having IMAA operate independently on >subparts of the local part. First a question: Is subaddressing just something that is done by a few email systems locally, or is it something specified in some of the email standards? The use of different separators in different systems seems to suggest the former. If that's the case, to what extent do we really have to consider it here? Also, using subaddressing seems to be quite popular for high-end users. But is it actually very much used by the bulk of users (the proverbial hotmail/yahoo/... crowd)? My guess would be that it's not. If that's the case, then this may give us some slack, because as far as I understand, the main push for internationalized addresses is from average users, not necessarily high-end users. It's also easy to imagine that users have a single internationalized address (for personal mail) and a bunch of structured addresses (for list subscriptions,...). A third thing I have just thought about is that the design of punycode actually has some very nice properties that allow separation of the subnet address in most cases even if the whole LHS is encoded in one go. The first thing here is that the separators, as long as they are ASCII characters, are still visible in the encoded version. Secondly, a very simple pattern search can find all the addresses belonging to the same primary address, with one exception: It's difficult to check whether there are accidental non-ASCII characters smuggled in before the separator. As an example of the last case, assume '+' is the separator, 'abc' is the primary address, 'AABBCC' is its punycode encoding, and we find an address looking something like xn--+fghAABBCCDDEE, then we don't know whether the original address was abc+defgh, or whether it was abcd+efgh or abcde+fgh, i.e. we don't know whether what's encoded in DDEE comes before or after the +.
But we know that 'abc' comes before the +, because otherwise it would be encoded differently. So a search for something like m/^xn--+[\x21-\x7e]*$user.*/ (where $user is the punycode encoding of the username in initial position) will pretty much identify all the mail for that user. Regards, Martin. From owner-ietf-imaa Fri Feb 14 12:24:23 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EKONH19778 for ietf-imaa-bks; Fri, 14 Feb 2003 12:24:23 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EKOLd19774 for ; Fri, 14 Feb 2003 12:24:21 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EKOZKA023410 for ; Fri, 14 Feb 2003 20:24:35 GMT To: duerst@w3.org CC: ietf-imaa@imc.org, roy@gnomon.org.uk In-reply-to: <4.2.0.58.J.20030214105633.05c158e0@localhost> (message from Martin Duerst on Fri, 14 Feb 2003 10:59:27 -0500) Subject: Re: Question: Fullwidth double-quote and fullwidth backslash References: <4.2.0.58.J.20030214105633.05c158e0@localhost> Date: Fri, 14 Feb 2003 20:24:34 +0000 Message-ID: <1045254274.23405.TMDA@moriarty.gnomon.org.uk> From: Roy Badami X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: On the contrary, quotes can appear on business cards. Consider the following (invented) address, obtained by mapping an X.400 address using RFC1148 or successors: "/PN=Roy.Badami/OU=Systems/O=Microsoft Inc/C=US/ADMD=ATT/"@x-400-relay.att.com I really have seen addresses like this (though not recently, I'll admit). If the LHS contains unusual characters, quoting had better appear on the business card.
-roy From owner-ietf-imaa Fri Feb 14 12:31:06 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EKV6W20348 for ietf-imaa-bks; Fri, 14 Feb 2003 12:31:06 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EKV4d20340 for ; Fri, 14 Feb 2003 12:31:04 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EKVRKA023447 for ; Fri, 14 Feb 2003 20:31:28 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Re: Question: Fullwidth double-quote and fullwidth backslash From: Roy Badami Date: Fri, 14 Feb 2003 20:31:27 +0000 Message-ID: <1045254687.23446.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: It does not if you do the normalisation twice (at the very beginning and after mapping). For IMAA, it suffices to specify that implementations MUST accept all characters as delimiters that decompose to one of our delimiters during NFKC-with-U+3002-to-U+002E normalisation and that the delimiters MUST be normalised. The easiest way to implement this is an additional normalisation at the very beginning. Are you saying we can do a normalization of the entire e-mail address without violating IDNA (which specifies that the domain be split on dot-like characters before normalization)? I ask because we have to parse the quoting in order to identify the local-part (the LHS may contain quoted at-signs).
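A minimal sketch of the concern, assuming Python's standard unicodedata module as a stand-in for the normalisation step. The helper name count_labels and the sample domain are invented for illustration and come from neither draft. It shows that normalising a whole domain before scanning for dots can change the number of labels, and that NFKC by itself leaves U+3002 alone, which is why the extra U+3002-to-U+002E mapping is mentioned separately.

```python
import unicodedata


def count_labels(domain: str) -> int:
    """Count labels by splitting on the ASCII dot only (illustrative helper)."""
    return len(domain.split("."))


# Hypothetical input containing U+2488 DIGIT ONE FULL STOP, which has a
# compatibility decomposition to the two characters "1" and ".".
raw = "a\u2488b.example"
normalized = unicodedata.normalize("NFKC", raw)

# Splitting before and after normalisation gives different label counts.
print(count_labels(raw), count_labels(normalized))

# U+3002 IDEOGRAPHIC FULL STOP has no decomposition, so NFKC leaves it as is.
ideographic = unicodedata.normalize("NFKC", "\u3002")

# U+FF20 FULLWIDTH COMMERCIAL AT folds to the ASCII at-sign under NFKC.
fullwidth_at = unicodedata.normalize("NFKC", "\uff20")
```

The same effect applies to a fullwidth at-sign inside a quoted local part: normalising first would turn it into a plain at-sign before the quoting has been parsed.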
-roy From owner-ietf-imaa Fri Feb 14 12:34:20 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EKYKx20880 for ietf-imaa-bks; Fri, 14 Feb 2003 12:34:20 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EKYId20872 for ; Fri, 14 Feb 2003 12:34:18 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EKYfKA023468 for ; Fri, 14 Feb 2003 20:34:41 GMT To: duerst@w3.org CC: ietf-imaa@imc.org In-reply-to: <4.2.0.58.J.20030214110907.05bf4398@localhost> (message from Martin Duerst on Fri, 14 Feb 2003 11:36:49 -0500) Subject: Re: A couple of comments on the open issues... References: <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <4.2.0.58.J.20030214110907.05bf4398@localhost> Date: Fri, 14 Feb 2003 20:34:40 +0000 Message-ID: <1045254880.23456.TMDA@moriarty.gnomon.org.uk> From: Roy Badami X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: First a question: Is subaddressing just something that is done by a few email systems locally, or is it something specified in some of the email standards? It's not specified in any standard that I'm aware of, but it is functionality that is provided in many Unix MTAs. Also, using subaddressing seems to be quite popular for high-end users. But is it actually very much used by the bulk of users (the proverbial hotmail/yahoo/... crowd)? My guess would be that it's not. Probably true. If that's the case, then this may give us some slack, because as far as I understand, the main push for internationalized addresses is from average users, not necessarily high-end users.
It's also easy to imagine that users have a single internationalized address (for personal mail) and a bunch of structured addresses (for list subscriptions,...). I guess. But it would be nice if high-end users could use IMAs with subaddresses if they wish. A third thing I have just thought about is that the design of punycode actually has some very nice properties that allow separation of the subnet address in most cases even if the whole LHS is encoded in one go. The problem here is that you don't want to change the MTA. The MTA uses a particular algorithm for splitting the address into mailbox and subaddress, and it would be nice if IMAA didn't break it. -roy From owner-ietf-imaa Fri Feb 14 12:51:42 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EKpgN21827 for ietf-imaa-bks; Fri, 14 Feb 2003 12:51:42 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EKped21823 for ; Fri, 14 Feb 2003 12:51:40 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EKq3KA023558 for ; Fri, 14 Feb 2003 20:52:03 GMT To: ietf-imaa@imc.org cc: roy@gnomon.org.uk Subject: Re: Case sensitivity on the LHS From: Roy Badami Date: Fri, 14 Feb 2003 20:52:03 +0000 Message-ID: <1045255923.23555.TMDA@moriarty.gnomon.org.uk> X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: The IDNA model appears to be a better tradeoff: mandate that non-ASCII local parts are always case-insensitive, and leave mixed-case annotations as an *optional* technique for preserving case. So does IDNA permit mixed-case annotation?
I was under the impression that at one point the draft forbade the use of mixed-case annotation with IDNA, but I can't find that prohibition in the current documents. How exactly does mixed-case annotation work in IDNA? You have to somehow pass case information through the nameprep process; is the precise algorithm actually spelled out anywhere? -roy From owner-ietf-imaa Fri Feb 14 13:06:51 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EL6pB22060 for ietf-imaa-bks; Fri, 14 Feb 2003 13:06:51 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EL6md22056 for ; Fri, 14 Feb 2003 13:06:48 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1EL7AKA023630 for ; Fri, 14 Feb 2003 21:07:10 GMT To: jas@extundo.com CC: ietf-imaa@imc.org, roy@gnomon.org.uk In-reply-to: (message from Simon Josefsson on Fri, 14 Feb 2003 18:00:07 +0100) Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA References: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> Date: Fri, 14 Feb 2003 21:07:09 +0000 Message-ID: <1045256829.23623.TMDA@moriarty.gnomon.org.uk> From: Roy Badami X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: IDNA suggests that applications that simply passed things on before should not set the flag, and applications that enforce hostname restrictions today should set the flag. Applications have made a decision, conscious or not. I'm not sure I see any need to enforce anything with regard to this more than already is, and in particular not how IMAA would be the right place for it. Ok, maybe you're right.
However an informational note would not go amiss, even if it's only on the lines of the note in IDNA that basically just says 'this point is contentious'. Further minor wrinkle: if the application chooses not to enforce hostname rules (ie not to set UseSTD3ASCIIRules), it still needs to enforce those restrictions on domain names that occur as a result of the 822/2822 syntax (ie a label can't contain specials). Does IMAA need to say anything about the application of the 822/2822 rules (in the same way that IDNA talks about the application of the STD3 rules)? -roy From owner-ietf-imaa Fri Feb 14 13:59:09 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1ELx9R23331 for ietf-imaa-bks; Fri, 14 Feb 2003 13:59:09 -0800 (PST) Received: from tux.w3.org (IDENT:root@tux.w3.org [18.29.0.27]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1ELx8d23327 for ; Fri, 14 Feb 2003 13:59:08 -0800 (PST) Received: from enoshima (IDENT:root@tux.w3.org [18.29.0.27]) by tux.w3.org (8.9.3/8.9.3) with ESMTP id QAA03780; Fri, 14 Feb 2003 16:59:04 -0500 Message-Id: <4.2.0.58.J.20030214153013.059a8818@localhost> X-Sender: duerst@localhost X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58.J Date: Fri, 14 Feb 2003 15:38:18 -0500 To: Roy Badami From: Martin Duerst Subject: Re: Question: Fullwidth double-quote and fullwidth backslash Cc: ietf-imaa@imc.org In-Reply-To: <1045254274.23405.TMDA@moriarty.gnomon.org.uk> References: <4.2.0.58.J.20030214105633.05c158e0@localhost> <4.2.0.58.J.20030214105633.05c158e0@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Hello Roy, At 20:24 03/02/14 +0000, Roy Badami wrote: >On the contrary, quotes can appear on business cards. Ok, thanks. So they actually can, and do, in odd cases. Paper is patient.
(German saying) But are we really required, or do we see it as our goal, to help people avoid some potential typing mistakes in addresses that are, by their length and complexity, not at all user-friendly in the first place? My position is that we don't have any reason to go there. Regards, Martin. >Consider the following (invented) address, obtained by mapping an >X.400 address using RFC1148 or successors: > >"/PN=Roy.Badami/OU=Systems/O=Microsoft Inc/C=US/ADMD=ATT/"@x-400-relay.att.com > >I really have seen addresses like this (though not recently, I'll admit). > >If the LHS contains unusual characters, quoting had better appear on >the business card. > > -roy From owner-ietf-imaa Fri Feb 14 15:06:19 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1EN6Jt25355 for ietf-imaa-bks; Fri, 14 Feb 2003 15:06:19 -0800 (PST) Received: from relay-3m.club-internet.fr (relay-3m.club-internet.fr [194.158.104.42]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1EN6Hd25351 for ; Fri, 14 Feb 2003 15:06:17 -0800 (PST) Received: from mine.club-internet.fr (f16m-10-114.d1.club-internet.fr [212.195.121.114]) by relay-3m.club-internet.fr (Postfix) with ESMTP id AB982E113; Sat, 15 Feb 2003 00:07:01 +0100 (CET) Message-Id: <5.2.0.9.0.20030214233425.02baf6e0@mail.club-internet.fr> X-Sender: jefsey@mail.club-internet.fr X-Mailer: QUALCOMM Windows Eudora Version 5.2.0.9 Date: Sat, 15 Feb 2003 00:02:13 +0100 To: Martin Duerst , Roy Badami From: "J-F C. 
(Jefsey) Morfin" Subject: Re: Question: Fullwidth double-quote and fullwidth backslash Cc: ietf-imaa@imc.org In-Reply-To: <4.2.0.58.J.20030214153013.059a8818@localhost> References: <1045254274.23405.TMDA@moriarty.gnomon.org.uk> <4.2.0.58.J.20030214105633.05c158e0@localhost> <4.2.0.58.J.20030214105633.05c158e0@localhost> Mime-Version: 1.0 Content-Type: multipart/mixed; x-avg-checked=avg-ok-2353733D; boundary="=======3B9A1B11=======" Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --=======3B9A1B11======= Content-Type: text/plain; x-avg-checked=avg-ok-2353733D; charset=us-ascii; format=flowed Content-Transfer-Encoding: 8bit At 21:38 14/02/03, Martin Duerst wrote: >My position is that we don't have any reason to go there. What 95% of the users could accept today will not tell you much about what 10% will demand once IMAA has changed the conception of 80% of the worldwide users, and service providers and designers, about the mail address, i.e. a key element of a service representing 80% of the internet traffic. IMHO the question is not "what should we do?", but "what cannot we really do?". I have used the WSIS lists and asked around. People cannot commit on something they never saw. But the interest, and the subsequent demands are here. I suggest you carry out the same test. Also, remember that people mostly use Windows, and that Windows uses file names with spaces and writes file names in upper case on the displays, etc., and some other funny things people see every day and they understand as an improvement over the current proposition (or a liberation from limitations they do not understand: "why would it be so complex? it is all over my IE screen today"). --=======3B9A1B11======= Content-Type: text/plain; charset=us-ascii; x-avg=cert; x-avg-checked=avg-ok-2353733D Content-Disposition: inline --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.454 / Virus Database: 253 - Release Date: 10/02/03 --=======3B9A1B11=======-- From owner-ietf-imaa Fri Feb 14 15:51:37 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1ENpbr26241 for ietf-imaa-bks; Fri, 14 Feb 2003 15:51:37 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1ENpad26231 for ; Fri, 14 Feb 2003 15:51:36 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jpcJ-0001Xp-00; Fri, 14 Feb 2003 15:51:39 -0800 Date: Fri, 14 Feb 2003 23:51:39 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Cc: Roy Badami Subject: Re: Case sensitivity on the LHS Message-ID: <20030214235139.GC4500@nicemice.net> Reply-To: IETF IMAA list , Roy Badami References: <1045255923.23555.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1045255923.23555.TMDA@moriarty.gnomon.org.uk> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami wrote: > So does IDNA permit mixed-case annotation? I was under the impression > that at one point the draft forbade the use of mixed-case annotation > with IDNA, but I can't find that prohibition in the current documents. IDNA makes no mention of mixed-case annotation or case preservation. > How exactly does mixed-case annotation work in IDNA? Officially, it doesn't. It was too contentious to be mentioned. The prevailing opinion was that IDNA is complex enough already, and shouldn't saddle implementors with nonessential complications. However, IDNA is architected so that mixed-case annotations could be added in the future in a backward-compatible way. The ToASCII operation already has almost enough flexibility for this, because the case of the letters used to encode the non-ASCII characters is unconstrained. 
The ASCII letters, however, are always lowercased by the Nameprep step, and Punycode simply copies them, so a strict implementation of ToASCII cannot preserve the case of ASCII letters for inputs containing some non-ASCII characters. And a strict implementation of ToUnicode has no flexibility at all; the exact Unicode output is completely determined regardless of any mixed-case in the ACE input. But ToASCII and ToUnicode don't really need to be that strict to ensure interoperability. As long as they output something equivalent to what the official versions output, that's good enough. A future update of IDNA could therefore relax the specification enough to allow for mixed-case annotations. Even if such an update never comes, gutsy implementors could go ahead and do it, and still interoperate with the strictly conformant implementations and with each other. That last sentence is probably heresy. :) > You have to somehow pass case information through the nameprep > process; is the precise algorithm actually spelled out anywhere? Nope. Since it's not officially supported, working out all the details wasn't a priority, and hasn't yet been done. AMC P.S. Roy, have you subscribed to the list yet? You appear to be participating as actively as anyone. From owner-ietf-imaa Fri Feb 14 16:15:37 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1F0FbU26803 for ietf-imaa-bks; Fri, 14 Feb 2003 16:15:37 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F0FZd26799 for ; Fri, 14 Feb 2003 16:15:35 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jpzX-0001dF-00 for ; Fri, 14 Feb 2003 16:15:39 -0800 Date: Sat, 15 Feb 2003 00:15:39 +0000 From: "Adam M. Costello" To: IETF IMAA list Subject: Re: A couple of comments on the open issues... 
Message-ID: <20030215001538.GE4500@nicemice.net> Reply-To: IETF IMAA list References: <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <4.2.0.58.J.20030214110907.05bf4398@localhost> <1045254880.23456.TMDA@moriarty.gnomon.org.uk> <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <4.2.0.58.J.20030214110907.05bf4398@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1045254880.23456.TMDA@moriarty.gnomon.org.uk> <4.2.0.58.J.20030214110907.05bf4398@localhost> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Martin Duerst wrote: > Is subaddressing just something that is done by a few email systems > locally, or is it something specified in some of the email standards? As far as I know, there are no standards for any structured local parts. It's all unofficial conventions. > to what extent do we really have to consider it here? I'd say we're under no obligation to play nicely with these unofficial practices, but our goal is to create utility, not frustration, so we shouldn't immediately dismiss the concern. > Also, using subaddressing seems to be quite popular for high-end > users. But is it actually very much used by the bulk of users (the > proverbial hotmail/yahoo/... crowd)? Currently, no. As spam becomes ever more of a problem, more users might start using subaddressing as a means of combating it (like I do). > It's also easy to imagine that users have a single internationalized > address (for personal mail) and a bunch of structured addresses (for > list subscriptions,...). That's a good point. But then there are freaks like me who use subaddressing for all mail, personal or not, as a means of eliminating spam.
Every address is expendable and can be disabled the first time it receives spam; there is no single stable mail address for me (but there is a stable template, and a stable URL). > A third thing I have just thought about is that the design of punycode > actually has some very nice properties that allow separation of the > subnet address in most cases even if the whole LHS is encoded in one > go. That's kind of cool, but as Roy says, it still doesn't let people use IMAs containing delimiters in the local part. If the mail server for the domain is configured to support user+tag syntax, but the ACE form of USER+tag looks like xn--+tag-blahblah, then it won't work at all, because the mail server will try to deliver to xn--. So users would not get the full IMAA functionality until the infrastructure (the MTA) has been upgraded, which is an undesirable dependence (one that IDNA does not suffer). The counter-argument is that most users won't care about the missing functionality. AMC From owner-ietf-imaa Fri Feb 14 16:33:45 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1F0XjC27112 for ietf-imaa-bks; Fri, 14 Feb 2003 16:33:45 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F0Xid27108 for ; Fri, 14 Feb 2003 16:33:44 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jqH5-0001gJ-00 for ; Fri, 14 Feb 2003 16:33:47 -0800 Date: Sat, 15 Feb 2003 00:33:47 +0000 From: "Adam M. 
Costello" To: ietf-imaa@imc.org Subject: Re: Another issue: quoting Message-ID: <20030215003347.GF4500@nicemice.net> Reply-To: IETF IMAA list References: <1045183773.18276.TMDA@moriarty.gnomon.org.uk> <8fpJ5tRZcDD@3247.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <8fpJ5tRZcDD@3247.org> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Claus Färber wrote: > To be quirks-compatible with software that does make a distinction > between different forms of the same local-part, I'd recommend the > following: > > . Save the original form. > . Fully dequote/remove comments/extra whitespace before toUnicode > and/or toASCII. > . If the output of toUnicode/toASCII is identical to the input, use the > original form in step #1, otherwise use the output generated by the > function. An interesting idea, but you seem to be assuming that the destination of the local part uses the same quoting mechanism as the origin of the local part. When these transformations are applied, it's typically because a local part obtained from a user interface is being transferred into a message header or SMTP command, or a local part obtained from a message header is being transferred onto a display. The quotation mechanisms for local parts in user interfaces (or app-specific config files, etc) are not standardized, and may or may not match the quotation mechanisms used in message headers and SMTP commands.
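For concreteness, the three quoted steps might be sketched roughly as follows. Everything here is an assumption made for illustration: dequote handles only a simple double-quoted form with backslash escapes (no comments or folding whitespace), and to_ascii_placeholder is a stand-in built on Python's punycode codec, not the ToASCII operation defined by the IMAA draft.

```python
def dequote(local_part: str) -> str:
    """Strip one level of double quotes and backslash escapes (simplified)."""
    if len(local_part) >= 2 and local_part[0] == '"' and local_part[-1] == '"':
        inner = local_part[1:-1]
        out = []
        i = 0
        while i < len(inner):
            # A backslash quotes the character that follows it.
            if inner[i] == "\\" and i + 1 < len(inner):
                i += 1
            out.append(inner[i])
            i += 1
        return "".join(out)
    return local_part


def to_ascii_placeholder(text: str) -> str:
    """Hypothetical stand-in for ToASCII: leave pure ASCII alone, else encode."""
    if all(ord(ch) < 128 for ch in text):
        return text
    return "xn--" + text.encode("punycode").decode("ascii")


def convert_local_part(original: str) -> str:
    saved = original                              # step 1: save the original form
    canonical = dequote(saved)                    # step 2: dequote before converting
    converted = to_ascii_placeholder(canonical)
    # Step 3: if conversion changed nothing, reuse the saved original form.
    return saved if converted == canonical else converted
```

Note that the sketch returns the converted form without re-quoting it; which quoting rules apply at the destination is exactly the open point raised above.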
AMC From owner-ietf-imaa Fri Feb 14 16:58:34 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1F0wYS27982 for ietf-imaa-bks; Fri, 14 Feb 2003 16:58:34 -0800 (PST) Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F0wXd27978 for ; Fri, 14 Feb 2003 16:58:33 -0800 (PST) Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian)) id 18jqf6-0001kD-00; Fri, 14 Feb 2003 16:58:36 -0800 Date: Sat, 15 Feb 2003 00:58:36 +0000 From: "Adam M. Costello" To: ietf-imaa@imc.org Cc: Roy Badami Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA Message-ID: <20030215005836.GG4500@nicemice.net> Reply-To: IETF IMAA list , Roy Badami References: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> <1045238119.22308.TMDA@moriarty.gnomon.org.uk> <1045222662.21247.TMDA@moriarty.gnomon.org.uk> <1045233655.21973.TMDA@moriarty.gnomon.org.uk> <1045222662.21247.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1045256829.23623.TMDA@moriarty.gnomon.org.uk> <1045222662.21247.TMDA@moriarty.gnomon.org.uk> User-Agent: Mutt/1.4i Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami wrote: > Should IMAA make any statement as to whether UseSTD3ASCIIRules MUST > or SHOULD be set when processing any IDN on the RHS of an IMA, or > should this decision be left up to the implementor? Simon Josefsson wrote: > All applications must already know the answer to this problem, as the > same problem existed before IDNA. If the RFCs are unclear, that > should be fixed independently of IMAA. > > I'm not sure I see any need to enforce anything with regard to this > more than already is, and in particular not how IMAA would be the > right place for it. I agree with Simon.
IMAA explicitly points to IDNA for handling the domain part of the mail address, and IDNA already says as much as we dare on the subject. Here's a taste of why we don't want to stick our necks out: RFC-821 forbids 3foo.org. RFC-2821 allows 3foo.org, but forbids foo+.org. RFC-822 and RFC-2822 allow foo+.org. It's a mess, and we don't want to make it our job to try to make sense of it. We know from experience with IDNA that any attempt at clarification will be too controversial to reach consensus. Roy Badami wrote: > Further minor wrinkle: if the application chooses not to enforce > hostname rules (ie not to set UseSTD3ASCIIRules), it still needs to > enforce those restrictions on domain names that occur as a result of > the 822/2822 syntax (ie a label can't contain specials). If the mail address is being put into a message header, then it ought to enforce those restrictions, yes. If the mail address is being put into an SMTP command, then it's the 821/2821 syntax that ought to be enforced. The destination might be neither of those contexts, but something else with its own formal syntax rules. > Does IMAA need to say anything about the application of the 822/2822 > rules (in the same way that IDNA talks about the application of the > STD3 rules)? It might be good to generalize "quote the local part if necessary" to something like "handle any context-dependent syntax rules", and give the same example about quotation sometimes being needed in message headers and SMTP commands, and maybe give another example about character restrictions in those contexts. 
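[Illustration] As one concrete instance of "quote the local part if necessary", here is a minimal sketch for the message-header context only, based on the RFC 2822 dot-atom and quoted-string forms. The function name is hypothetical, the input is assumed to be printable ASCII (control characters and non-ASCII are not handled), and other destinations such as SMTP commands or user interfaces would need their own rules.

```python
import re

# RFC 2822 "atext": letters, digits, and these printable specials.
ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"
# dot-atom-text: runs of atext separated by single dots.
DOT_ATOM = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")


def quote_local_part_for_header(local_part: str) -> str:
    """Return `local_part` in a form usable in an RFC 2822 header.

    A local part that is already a valid dot-atom is returned unchanged;
    anything else is wrapped in double quotes, with backslash and
    double-quote characters escaped. Assumes printable ASCII input.
    """
    if DOT_ATOM.fullmatch(local_part):
        return local_part
    escaped = local_part.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

Quoting only when the dot-atom form is not available keeps a single encoding for each local part, which matters for software that treats "A.B" (with quotes) and A.B as different mailboxes.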
AMC From owner-ietf-imaa Fri Feb 14 17:00:05 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1F105O28041 for ietf-imaa-bks; Fri, 14 Feb 2003 17:00:05 -0800 (PST) Received: from bs.jck.com (ns.jck.com [209.187.148.211]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F103d28035 for ; Fri, 14 Feb 2003 17:00:03 -0800 (PST) Received: from [209.187.148.215] (helo=p3.JCK.COM) by bs.jck.com with esmtp (Exim 4.10) id 18jqgT-000FpW-00 for ietf-imaa@imc.org; Fri, 14 Feb 2003 20:00:01 -0500 Date: Fri, 14 Feb 2003 20:00:00 -0500 From: John C Klensin To: IETF IMAA list Subject: Can we back up a bit and ask some basic questions? An alternate model Message-ID: <18245836.1045252800@p3.JCK.COM> X-Mailer: Mulberry/3.0.0 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: ( Very long -- contains both a high-level critique and the basis for an alternate proposal that preserves local-part opacity and skips ACE forms and goes directly to UTF-8 or another common Unicode encoding. ) Hi. I've been trying to follow the traffic about this proposal, and am frankly overwhelmed. I hope I'm not addressing anything that has already been discussed..., but I haven't seen such traffic. (1) A context for email internationalization and a critique of IMAA's starting assumptions... IMAA starts from the assumption that the right way to handle domain names is, with slight modifications, the right way to handle email local-parts. The debate, with few exceptions, has been about details and options given that choice. I'd like to move up a half-level and suggest that choice may be the wrong one, and that there are other options that will serve us, and Internet email generally, better. 
A different way to put my concern is that I wonder whether, with the IDNA-hammer in hand, email is just the nearest handy thing that can be construed as a nail. Several of the things that imposed constraints on the DNS solution are different for the local-part (LHS) of email. That may point to a different solution. The two important examples that occur to me include: (i) Single-turnaround UDP transactions versus multistep, TCP-based, SMTP. Because of the use of UDP, and some other things, there is no possibility for the DNS client and the server to interact about capabilities, form of names, etc. While multi-hop (relay) transmissions of mail complicates things, we do precisely those types of negotiations in SMTP all the time. In some respects, almost every mail command is such a negotiation. But, more important, we've got well-established (and, by now, widely deployed) precedents that permit clients to send extended materials only to servers that are prepared to deal with them. That is a feature, not a bug. It is not something we can do with the DNS, which was one of the major reasons why IDNA was needed. (ii) An established DNS syntax versus an opaque local-part For the DNS, the syntax of names used by client and server is fairly well specified. While, in theory, a client could somehow use, e.g., slashes or equal signs to separate labels in presentation, but parse things appropriately for DNS queries and responses, in practice it just doesn't happen. And someone might be able to read the specifications to prohibit such variations in a lot of cases. With email, the idea of a completely opaque local-part -- one that can be internally parsed, decomposed, and interpreted only by the delivery MTA -- is required by existing standards. More important, we have taken huge advantage of it over the years to do all sorts of interesting things. 
Many of the ones that were very important some years ago have fallen into disuse, but it would be, IMO, very unfortunate, and very dangerous, to discard the capability. Of course, if doing so were the only way to get internationalization, that would be worth considering anyway. But it isn't, as a sketch of a counterproposal below will, I hope, demonstrate. There has been a good deal of discussion of subaddresses on this list, but they are certainly not the only case, or even the most important one. We have used trickily-written local-parts to embed routing information (with the %-hack and bang-strings), to express X.400 addresses (MIXER and the UFN work), to implement Dan Bernstein's idea that binds return-paths to target addresses (the idea is interesting and useful, regardless of how the spammers use it), to implement portions of a variety of services whose commands are transported through email, and so on, for a long list of brilliant ideas, cheap hacks, and most points in between. The variety of those techniques is such that, unlike the DNS, we can't reasonably say "no sane person would construct a name such as XN--abcDeF" and then look at enough zone files to increase confidence that no one has. With email, we've had people working for years to find strange combinations and sequences of characters that could safely be used to delimit important (to them) information in different contexts. It would be impossible to claim that every combination has been used, but a plan that requires uniform interpretation of the local-part is in a lot more danger of wrecking something that some group or enterprise cares deeply about than with IDNA (or, more generally, prefixed-ACE) strings in the DNS. Even were "no human would construct such a name" true, it wouldn't help. Some of these systems use a hash of the origin address, or a variant on the message ID, or part or all of the message itself. 
Incorporating hashed or encoded identifying information in addresses goes back at least to some of the proto-groupware projects of the late 70s and early 80s. And the most aggressive of the "user name doesn't go in the email address because it discloses information" crew have been using random or encrypted strings instead. For years. Remember, too, that case-insensitivity for local names isn't something the protocol calls for. It is a suggestion for those mail servers who support email addresses that contain people's names or other obvious "name" or "word"-type strings. But the MTAs that support non-case-sensitive names have, historically, done it in a number of ways. I've seen systems in which local-parts such as John.C.Klensin John.Klensin john.klensin klensin J.Klensin J.C.Klensin and so forth, all match --and do so by algorithm, not alias files-- but where johN.klenSin is going to bounce. Maybe that is reasonable, maybe it is perverse, but it is certainly conforming. There are at least a couple of systems and configurations in which a leading underscore on the name implies something like "use it, but don't pass it through aliasing or translation". And a leading tilde means something else -- Ned would remember, but I've blocked it out at the moment. There are many more of these examples. There are enough of them that one just can't give up opacity -- or shouldn't want to-- unless there is no alternative that still gives us internationalized names. The question is not "how character X should be handled", but "can we get away with interpreting the local part in the sending MUA or MTA". And the answer, I suspect, is "no". To put this differently, a prime condition for IDNA was that it not require changes in the protocols or assumptions of the DNS. A stated condition for IMAA is that it not require changes to the mail infrastructure. 
But weakening or crippling the "opaque local part" rule does as much, or more, violence to the email infrastructure than, e.g., just putting UTF-8 in the addresses. I'll come back to that. Again, if this were the only way to do internationalization, it would be worth considering the risk of breaking things. But it isn't. Let me outline an idea, just to prove that one exists (if it gets any traction, I'd be happy to try to write, or collaborate on, an I-D). (2) A different set of assumptions What we have usually said about email, especially at the mail transport (SMTP) level, is that, if the sender wants to do something that is (historically) unusual, it must get the permission of the receiver first. That permission originally came out of band -- as a private agreement among consenting adults-- and then we introduced ESMTP to provide an in-band permission mechanism that could actually scale. If, as the administrator of an MTA, I want to accept "XN--AbCdEf" as a local-part for one of my mailboxes, I can do that today -- all I need to do is to put such an address in my alias files. And, if my MUA is configured to see it and turn it into a displayable Unicode string, that is my problem or feature and not anyone else's. The problem arises if the sender wants to type in a Unicode (or local CCS) string in the hope of getting the right one to me. And that is a mess, with a large selection of loose ends and tradeoffs, as 250 or so messages in less than five days attests. But we don't need to do it. With one change --and a different way of looking at the problem-- International local parts do not need to be handled differently from ASCII ones. That is how, IMO, it should be. If it is not, we either break existing mail conventions, or put non-ASCII users at a permanent disadvantage relative to ASCII ones. Those two options, to use a technical term, stink. So, instead, let's assume * We want to simply move to Unicode strings in local-parts. 
* There isn't really a lot of point in sending off strangely-encoded addresses to systems that can't handle them. Gibberish is gibberish, and users don't much like gibberish. If you are going to send me a message in English, and assume I can read English and that you might want a reply, you are better off using ASCII addressing. If you are going to send me a message in Klingon... well, I can't render it or read it, and whether you expect me to have an address in Klingon, or want to put Klingon characters in your return address (reverse-path or From: field), is going to be the least of our problems. Indeed, I would _prefer_ that it get dumped by some MTA because, if it doesn't, I want my little antispambot to toss it in the same bin with From: "=?x-unknown?Q?=C1=F6=BE=D0=BB=F7=B4=DE?=" Subject: =?x-unknown?Q?[=B1=A4=B0=ED]=C7=D1=B9=E6 [...] (chosen at random from the most recent bin-addition) * We don't want to destroy the "opaque local part" principle -- even with Unicode local-parts, the sender needs to either know the exact form of the mailbox name or the alias and conversion rules adopted by the recipient MTA. Guessing what those rules are will work sometimes, just as they do in today's ASCII environment. And sometimes it won't... ditto. * Unlike the MIME situation, where we were trying to be sure that users could get multilingual and multimedia messages even if their system mail administrators were slow about upgrading, no one is going to get a mailbox that can be reached with a non-ASCII string unless some system administrative process happens. So the upgrade question is rather different, unless people are really worried about ASCII mailbox names (forward paths) but i18n addresses in [2]822-level "From:" and "To:" fields. I don't think that is a major issue, but, if it is, we should probably be looking at RFC2047-like updates. (3) Strawman semi-proposal This is not a proposal, because there are still loose ends (some identified below). 
But it should be sufficient to act as an existence proof that there is a plausible alternative to IMAA, one that meets the assumptions/ conditions above. (a) We define a new SMTP extension. For purposes of discussion, let's call it UTF8ADDRESSES. Loose end a.1: This probably won't work unless 8BITMIME also works. That may need to be specified. Conversely, perhaps we could extend 8BITMIME so that this became just a parameter somewhere. I haven't thought that through and it doesn't make a lot of difference right now. Loose end a.2: I'm assuming that UTF-8 is the right choice here. It may not be, although we should really pick one, and only one, encoding. But there is no reason that I can see to force an ACE; 8-bit characters should be fine, modulo the limitation in (b.1). If UTF-8 isn't the right answer, make appropriate substitutions elsewhere in this note. Loose end a.3: Because of the opacity requirement, I believe that even IMAA would ultimately require an ESMTP extension and negotiation to work properly. So, in some ways this proposal and IMAA are compatible and complementary if an IMAA ACE string is used rather than UTF-8. I think that would be overkill unless we really like that hammer. (b) If a server advertises UTF8ADDRESSES, the local-part definition is changed so that the characters in local parts are construed as being Unicode in UTF-8 (or whatever is chosen), but are otherwise left essentially unchanged. E.g., the special rules about @ (ASCII 0x40), " (ASCII 0x22), \ (ASCII 0x5C), etc., simply get promoted to U+0040, U+0022, and so on. Loose end b.1: I think UTF-8 helps here, because any occurrence of ASCII characters gets represented as the relevant single octets. So the string is fairly easy to parse. Other codings would need to ensure that there was no confusion with anything 2821 thinks is a delimiter. Loose end b.2: some tidying will have to be done to the 2822[bis] text to make this work, but I don't think it is rocket science. 
The 2821[bis] tuning is trivial, since the work would be done in the extension document. (c) The parsing rules don't change. Everything to the left of [unquoted] "@" is the local-part, everything to its right is a domain name. Local-parts are opaque and interpreted only by the delivery MTA, modulo the quoting rules. And they normally don't get interpreted by anything else either. Loose end c.1: Life would be a good deal easier if any sender taking advantage of this feature were flatly prohibited from using source routes. It wouldn't harm anything that I can imagine, and it would make it a lot easier to safely decompose the string. Loose end c.2: I can't see any particular reason why the domain name in this arrangement would be forced to be transmitted in punycode (as an ACE). UTF-8 would probably work as well. Or one might be able to write the spec to permit either punycode or UTF-8 (or whatever) on the RHS. Of course, this doesn't change the requirement for nameprep. Loose end c.3: The opacity principle probably prevents rules about folding, special provisions for full width characters, etc. If those are used, and the delivery MTA doesn't match them up as intended (whatever that means), the mail would be undeliverable (just like today with ASCII). Some mappings/foldings would be wise for receiving MTAs to support, just as case-insensitive ASCII handling is wise, and others would be stupid. We should give advice, but opacity prevents requirements. Nor, IMO, are requirements needed. (d) An originating or relay MTA that received a forward-path address containing non-ASCII characters, but that discovered the next MTA in sequence didn't advertise UTF8ADDRESSES would more or less follow the rules for 8BITMIME relaying. I.e., it would either have to find a valid address (presumably ASCII) it can forward for delivery, or find a routing path that would permit sending the UTF-8 addresses, or bounce the mail because it has no clue how to process it. 
Loose end d.1: The way such an MTA gets the information needed for those first two options is outside the scope of the standard. That isn't much different from the situation today when a relay MTA gets an address it can't figure out how to parse because some rule or other is violated. And it is better for something that can explain what is happening to bounce the mail than to deliver it to something that might blow up on the 8-bit characters and not return any non-delivery information. Accepting the message and then blowing up of course violates the standard, but that fact and a dollar will get you... (e) The issue of what the delivery MTA actually puts in the mail store, and how the receiving MUA(s) handle that, has never been the subject of Internet protocols and that should probably not change. But, again, we can give advice. It seems to me that a great deal of this week's discussion could usefully be turned into that advice. E.g., "Dear sysadmin, if you configure your MTA so that you have mailboxes whose names contain characters that look like quotes or the at-sign, you are inviting big trouble". It also seems to me that IMAA might well turn out to be (or be easily transformed into) a good delivery MTA -> mailstore or delivery MTA-> final MUA protocol for MUAs that have not been updated. But, again, these are basically local-machine and user interface issues, which are traditionally outside IETF scope. (f) There are some other issues here, but that is the general picture. For example, we would have to _very_ carefully work out what went into reply messages, given that those might hit a host that wasn't prepared to have i18n characters in them. But that is a problem to be worked out, not a showstopper. Comments? 
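[Illustration] The relay behaviour described in (d) above can be sketched as a small decision function. This is a sketch only, under several assumptions: UTF8ADDRESSES is the placeholder extension name from (a) and not a registered keyword, the function and type names are hypothetical, how an ASCII alternative address is obtained is out of scope (loose end d.1), and the "find another routing path" option and the possible 8BITMIME dependency (loose end a.1) are left out.

```python
from enum import Enum
from typing import Iterable, Optional


class Action(Enum):
    SEND_AS_IS = "send"
    SEND_ASCII_ALTERNATIVE = "substitute"
    BOUNCE = "bounce"


def relay_action(
    forward_path: str,
    ehlo_keywords: Iterable[str],
    ascii_alternative: Optional[str] = None,
) -> Action:
    """Decide what a relay does with one forward-path for one next hop.

    `ehlo_keywords` are the extension keywords the next MTA advertised in
    its EHLO response; `ascii_alternative` is an ASCII address the relay
    happens to know for the same mailbox, if any.
    """
    if forward_path.isascii():
        return Action.SEND_AS_IS  # nothing new is needed for ASCII addresses
    advertised = {keyword.upper() for keyword in ehlo_keywords}
    if "UTF8ADDRESSES" in advertised:  # placeholder name from (a)
        return Action.SEND_AS_IS
    if ascii_alternative is not None and ascii_alternative.isascii():
        return Action.SEND_ASCII_ALTERNATIVE
    return Action.BOUNCE  # better an explained bounce than a silent failure
```

EHLO keywords are compared case-insensitively here, and an address is sent unchanged only when it is pure ASCII or the next hop has explicitly opted in, which mirrors the "get the receiver's permission first" principle from section (2).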
john From owner-ietf-imaa Fri Feb 14 17:18:26 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1F1IQr28408 for ietf-imaa-bks; Fri, 14 Feb 2003 17:18:26 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F1IOd28404 for ; Fri, 14 Feb 2003 17:18:24 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1F1InKA024695 for ; Sat, 15 Feb 2003 01:18:49 GMT To: duerst@w3.org CC: ietf-imaa@imc.org In-reply-to: <4.2.0.58.J.20030214153013.059a8818@localhost> (message from Martin Duerst on Fri, 14 Feb 2003 15:38:18 -0500) Subject: Re: Question: Fullwidth double-quote and fullwidth backslash References: <4.2.0.58.J.20030214105633.05c158e0@localhost> <4.2.0.58.J.20030214153013.059a8818@localhost> Date: Sat, 15 Feb 2003 01:18:48 +0000 Message-ID: <1045271928.24683.TMDA@moriarty.gnomon.org.uk> From: Roy Badami X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: But are we really required, or do we see it as our goal, to help people avoid some potential typing mistakes in addresses that are, by their length and complexity, not at all user-friendly in the first place? If we're going to support quoting in IMAs then I think my original question is still a valid one. We know that Japanese users (at least those who are not intimately familiar with character set issues) often consider full width and half width characters as equivalent and interchangeable. For this reason, the IDN group chose to accept full width dot as equivalent to half width dot. The IMAA base document suggests doing the same for at-sign, presumably for the same reason. 
*If* we are going to allow quoting in IMAs (that aren't plain 822 addresses) then it is a reasonable question to pose to the group as to whether the same approach should be taken with the relevant metacharacters. If we're going to construct a syntax for IMAs that involves double quotes and backslash, then I think making an effort to ensure that these are interpreted correctly by the software is sensible. So I'm not sure I really understand your objection... -roy From owner-ietf-imaa Fri Feb 14 17:12:24 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1F1CO028327 for ietf-imaa-bks; Fri, 14 Feb 2003 17:12:24 -0800 (PST) Received: from bs.jck.com (ns.jck.com [209.187.148.211]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F1CNd28323 for ; Fri, 14 Feb 2003 17:12:23 -0800 (PST) Received: from [209.187.148.215] (helo=p3.JCK.COM) by bs.jck.com with esmtp (Exim 4.10) id 18jqsU-000Frq-00 for ietf-imaa@imc.org; Fri, 14 Feb 2003 20:12:26 -0500 Date: Fri, 14 Feb 2003 20:12:26 -0500 From: John C Klensin To: IETF IMAA list Subject: Re: A couple of comments on the open issues... 
Message-ID: <18991408.1045253546@p3.JCK.COM> In-Reply-To: <20030215001538.GE4500@nicemice.net> References: <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <4.2.0.58.J.20030214110907.05bf4398@localhost> <1045254880.23456.TMDA@moriarty.gnomon.org.uk> <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <4.2.0.58.J.20030214110907.05bf4398@localhost> <20030215001538.GE4500@nicemice.net> X-Mailer: Mulberry/3.0.0 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --On Saturday, 15 February, 2003 00:15 +0000 "Adam M. Costello" wrote: > > Martin Duerst wrote: > >> Is subaddressing just something that is done by a few email >> systems locally, or is it something specified in some of the >> email standards? > > As far as I know, there are no standards for any structured > local parts. It's all unofficial conventions. Adam and Martin, Let me see if I can shed a little light on this, independent of the tome I just mailed (which addresses this, and several other, issues in the context of questioning whether IMAA is a reasonable approach and proposing an alternative). There are no standards at all for interpreting the local part. The standard is "no one but the delivery MTA can try to interpret the local part in any way". >> to what extent do we really have to consider it here? > > I'd say we're under no obligation to play nicely with these > unofficial practices, but our goal is to create utility, not > frustration, so we shouldn't immediately dismiss the concern. 
If your goal is not to break the existing standards, then you are obligated to not assign _any_ special interpretation to anything that appears in the local-part (unless your document applies strictly to the interface between the delivery MTA and the network). That causes these "unofficial practices" --which are standard-conforming-- to take care of themselves. >> Also, using subaddressing seems to be quite popular for >> high-end users. But is it actually very much used by the >> bulk of users (the proverbial hotmail/yahoo/... crowd)? Wrong question, I think. There are all sorts of things that do things when email arrives. Many of them are not "users", but robots and agents. Some handle a _lot_ of messages. And many of them use funny characters and encodings in the forward and/or reverse paths as well as message header lines like "subject:". One really doesn't want to break them. >... > So users would not get the full IMAA functionality until the > infrastructure (the MTA) has been upgraded, which is an > undesirable dependence (one that IDNA does not suffer). The > counter-argument is that most users won't care about the > missing functionality. Yep. But you may really screw the users/ robots/ agents/ etc. who do care. Getting internationalization at the cost of reduced functionality for anyone or anything that is now doing something that conforms should be considered only if it is the only way. It isn't. 
regards, john From owner-ietf-imaa Fri Feb 14 17:39:15 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1F1dFx28852 for ietf-imaa-bks; Fri, 14 Feb 2003 17:39:15 -0800 (PST) Received: from moriarty.gnomon.org.uk (pc4-cmbg2-5-cust162.cmbg.cable.ntl.com [81.100.86.162]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F1dDd28846 for ; Fri, 14 Feb 2003 17:39:13 -0800 (PST) Received: from moriarty.gnomon.org.uk (roy@localhost [127.0.0.1]) by moriarty.gnomon.org.uk (8.12.3/8.12.3/Debian -4) with ESMTP id h1F1dcKA024793 for ; Sat, 15 Feb 2003 01:39:39 GMT To: ietf-imaa@imc.org, roy@gnomon.org.uk CC: ietf-imaa@imc.org In-reply-to: <20030215005836.GG4500@nicemice.net> (ietf-imaa.amc+0@nicemice.net.RemoveThisWord) Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA References: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> <1045238119.22308.TMDA@moriarty.gnomon.org.uk> <1045222662.21247.TMDA@moriarty.gnomon.org.uk> <1045233655.21973.TMDA@moriarty.gnomon.org.uk> <20030215005836.GG4500@nicemice.net> Date: Sat, 15 Feb 2003 01:39:37 +0000 Message-ID: <1045273177.24777.TMDA@moriarty.gnomon.org.uk> From: Roy Badami X-Delivery-Agent: TMDA/0.65 (Johnstown) X-Primary-Address: roy@gnomon.org.uk Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: RFC-821 forbids 3foo.org. If I'd ever spotted that in the past, I'd completely forgotten it. Quick, someone tell 3com that no-one's allowed to send mail to them... :) I propose that we all give up and use X.400 instead... 
-roy From owner-ietf-imaa Fri Feb 14 17:49:16 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1F1nGD29228 for ietf-imaa-bks; Fri, 14 Feb 2003 17:49:16 -0800 (PST) Received: from bs.jck.com (ns.jck.com [209.187.148.211]) by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F1nFd29223 for ; Fri, 14 Feb 2003 17:49:15 -0800 (PST) Received: from [209.187.148.215] (helo=p3.JCK.COM) by bs.jck.com with esmtp (Exim 4.10) id 18jrS5-000Fvu-00; Fri, 14 Feb 2003 20:49:13 -0500 Date: Fri, 14 Feb 2003 20:49:13 -0500 From: John C Klensin To: Roy Badami , "ietf-imaa@imc.org" Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA Message-ID: <21197961.1045255753@p3.JCK.COM> In-Reply-To: <1045273177.24777.TMDA@moriarty.gnomon.org.uk> References: <1045222662.21247.TMDA@moriarty.gnomon.org.uk> <1045238119.22308.TMDA@moriarty.gnomon.org.uk> <1045222662.21247.TMDA@moriarty.gnomon.org.uk> <1045233655.21973.TMDA@moriarty.gnomon.org.uk> <20030215005836.GG4500@nicemice.net> <1045273177.24777.TMDA@moriarty.gnomon.org.uk> X-Mailer: Mulberry/3.0.0 (Win32) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: --On Saturday, 15 February, 2003 01:39 +0000 Roy Badami wrote: > RFC-821 forbids 3foo.org. > > If I'd ever spotted that in the past, I'd completely forgotten > it. Quick, someone tell 3com that no-one's allowed to send > mail to them... :) The rule was changed in RFC 1123, which is what permitted 3COM to make that registration. > I propose that we all give up and use X.400 instead... Great idea. 
IA4 in addresses, no i18n problems :-) john From owner-ietf-imaa Fri Feb 14 18:53:27 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1F2rRr01202 for ietf-imaa-bks; Fri, 14 Feb 2003 18:53:27 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1F2rQd01198 for ; Fri, 14 Feb 2003 18:53:27 -0800 (PST) Received: (qmail 76882 invoked by uid 1016); 15 Feb 2003 02:53:56 -0000 Date: 15 Feb 2003 02:53:56 -0000 Message-ID: <20030215025356.76880.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: ietf-imaa@imc.org Subject: Re: A couple of comments on the open issues... References: <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <4.2.0.58.J.20030214110907.05bf4398@localhost> <1045254880.23456.TMDA@moriarty.gnomon.org.uk> <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <1045135861.15010.TMDA@moriarty.gnomon.org.uk> <1045135666.14953.TMDA@moriarty.gnomon.org.uk> <4.2.0.58.J.20030214110907.05bf4398@localhost> <20030215001538.GE4500@nicemice.net> <18991408.1045253546@p3.JCK.COM> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: John C Klensin writes: > There are no standards at all for interpreting the local part. False. RFC 822 clearly states that the local part consists of ASCII characters. The ASCII standard specifies interpretations of bytes 33-126 as glyphs. (If you're under the delusion that this interpretation is not actually standardized, or that users don't rely on it, please switch your MUA to EBCDIC header I/O for a month and let us know the results.) The question of which mailbox names are valid on a system, and how those mailboxes are handled, is (almost) entirely up to the system. 
But we have a global interpretation of ASCII mailbox names as glyphs. ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago From owner-ietf-imaa Fri Feb 14 19:02:08 2003 Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id h1F328s01430 for ietf-imaa-bks; Fri, 14 Feb 2003 19:02:08 -0800 (PST) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by above.proper.com (8.11.6/8.11.3) with SMTP id h1F328d01426 for ; Fri, 14 Feb 2003 19:02:08 -0800 (PST) Received: (qmail 78290 invoked by uid 1016); 15 Feb 2003 03:02:37 -0000 Date: 15 Feb 2003 03:02:37 -0000 Message-ID: <20030215030237.78289.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: ietf-imaa@imc.org Subject: facts about the real world, part 3 References: <1045235384.22089.TMDA@moriarty.gnomon.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-ietf-imaa@mail.imc.org Precedence: bulk List-Archive: List-Unsubscribe: List-ID: Roy Badami writes: > So this raises the question, do we believe that there are > implementations out there that have quirks we need to work around. Do > we have any idea what the quirks are? Certainly. sendmail screws up quoting in the worst possible way; it does exactly what you accused 822 of doing. If you try sending mail to "A.B" then sendmail won't deliver it to A.B. (Perhaps this has been fixed in new versions, but it's certainly true for a very large fraction of the sendmail installations on the net.) This is why 2822 requires minimal quoting. The general principle, as discussed in http://cr.yp.to/proto/design.html, is that (in the absence of overriding concerns such as efficiency) there should be only one way to encode each object. ---D. J. 
Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago

From owner-ietf-imaa Fri Feb 14 19:25:23 2003
Received: (from majordomo@localhost)
    by above.proper.com (8.11.6/8.11.3) id h1F3PNU01793
    for ietf-imaa-bks; Fri, 14 Feb 2003 19:25:23 -0800 (PST)
Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165])
    by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F3PMd01788
    for ; Fri, 14 Feb 2003 19:25:22 -0800 (PST)
Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian))
    id 18jsxC-00025F-00; Fri, 14 Feb 2003 19:25:26 -0800
Date: Sat, 15 Feb 2003 03:25:26 +0000
From: "Adam M. Costello"
To: "ietf-imaa@imc.org"
Cc: Roy Badami
Subject: Re: Question: UseSTD3ASCIIRules on RHS of IMA
Message-ID: <20030215032526.GH4500@nicemice.net>
Reply-To: IETF IMAA list , Roy Badami
References: <1045238119.22308.TMDA@moriarty.gnomon.org.uk>
    <1045222662.21247.TMDA@moriarty.gnomon.org.uk>
    <1045233655.21973.TMDA@moriarty.gnomon.org.uk>
    <20030215005836.GG4500@nicemice.net>
    <1045273177.24777.TMDA@moriarty.gnomon.org.uk>
    <21197961.1045255753@p3.JCK.COM>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <21197961.1045255753@p3.JCK.COM>
User-Agent: Mutt/1.4i
Sender: owner-ietf-imaa@mail.imc.org
Precedence: bulk
List-Archive:
List-Unsubscribe:
List-ID:

Warning: This has gotten off topic. This is a rathole, whose only
possible purpose is to illustrate that this topic is a rathole and
should be avoided in IMAA like it was avoided in IDNA. :)

John C Klensin wrote:

> The rule was changed in RFC 1123, which is what permitted 3COM to make
> that registration.

RFC-1123 relaxed the syntax of "host names". RFC-821 refers to its
field as a "host name" because when RFC-821 was written, it was always
the name of a host.
But almost three years before RFC-1123 was published, RFC-974 had
generalized the domain part of a mail address to be either the name of
a host or the name of an MX record (which is not a host, and RFC-974
always calls it a "domain name", never a "host name"). So when RFC-1123
relaxed the syntax of "host names", did the SMTP field even qualify as
a host name anymore? If not, what tells us that the relaxation applied
to it? Nothing that I can see. This may have been an innocent
oversight, but I can't find any clarification.

RFC-2821 tries to have it both ways. On the one hand, it alters the
syntax to match the RFC-1123 host name syntax. On the other hand,
RFC-2821 states very clearly that names of MX records are *not* host
names, which suggests that if the syntax of "host names" were relaxed
again, that relaxation would not automatically apply to the syntax of
RFC-2821.

AMC

From owner-ietf-imaa Fri Feb 14 20:00:27 2003
Received: (from majordomo@localhost)
    by above.proper.com (8.11.6/8.11.3) id h1F40R502443
    for ietf-imaa-bks; Fri, 14 Feb 2003 20:00:27 -0800 (PST)
Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165])
    by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F40Qd02439
    for ; Fri, 14 Feb 2003 20:00:26 -0800 (PST)
Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian))
    id 18jtV8-00029S-00 for ; Fri, 14 Feb 2003 20:00:30 -0800
Date: Sat, 15 Feb 2003 04:00:30 +0000
From: "Adam M. Costello"
To: ietf-imaa@imc.org
Subject: Re: facts about the real world, part 3
Message-ID: <20030215040030.GI4500@nicemice.net>
Reply-To: IETF IMAA list
References: <1045235384.22089.TMDA@moriarty.gnomon.org.uk>
    <20030215030237.78289.qmail@cr.yp.to>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030215030237.78289.qmail@cr.yp.to>
User-Agent: Mutt/1.4i
Sender: owner-ietf-imaa@mail.imc.org
Precedence: bulk
List-Archive:
List-Unsubscribe:
List-ID:

"D. J.
Bernstein" wrote:

> This is why 2822 requires minimal quoting.

Nitpick: Although it recommends that quoting be avoided when it's not
needed (so foo.bar is preferred over "foo.bar"), I don't think it
recommends "minimal quoting"; for example, I see no preference for
"foo:bar" over "\f\o\o\:\b\a\r". If an implementation can handle the
quotation marks, it can probably handle the backslashes too, so I guess
there's no need to recommend minimal quoting.

AMC

From owner-ietf-imaa Fri Feb 14 21:25:31 2003
Received: (from majordomo@localhost)
    by above.proper.com (8.11.6/8.11.3) id h1F5PVT03427
    for ietf-imaa-bks; Fri, 14 Feb 2003 21:25:31 -0800 (PST)
Received: from nicemice.net (arwen.CS.Berkeley.EDU [128.32.132.165])
    by above.proper.com (8.11.6/8.11.3) with ESMTP id h1F5PUd03423
    for ; Fri, 14 Feb 2003 21:25:30 -0800 (PST)
Received: from amc by nicemice.net with local (Exim 3.35 #1 (Debian))
    id 18jupS-0002Le-00; Fri, 14 Feb 2003 21:25:34 -0800
Date: Sat, 15 Feb 2003 05:25:34 +0000
From: "Adam M.
    Costello"
To: ietf-imaa@imc.org
Cc: Roy Badami
Subject: Re: Question: Fullwidth double-quote and fullwidth backslash
Message-ID: <20030215052534.GJ4500@nicemice.net>
Reply-To: IETF IMAA list , Roy Badami
References: <1045271928.24683.TMDA@moriarty.gnomon.org.uk>
    <4.2.0.58.J.20030214105633.05c158e0@localhost>
    <1045254274.23405.TMDA@moriarty.gnomon.org.uk>
    <8fpKYQTZcDD@3247.org>
    <8fpJ5$qocDD@3247.org>
    <1045224428.21358.TMDA@moriarty.gnomon.org.uk>
    <8fpJ5$qocDD@3247.org>
    <1045224428.21358.TMDA@moriarty.gnomon.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1045254687.23446.TMDA@moriarty.gnomon.org.uk>
    <1045271928.24683.TMDA@moriarty.gnomon.org.uk>
    <1045254274.23405.TMDA@moriarty.gnomon.org.uk>
    <8fpKYQTZcDD@3247.org>
    <8fpJ5$qocDD@3247.org>
    <1045224428.21358.TMDA@moriarty.gnomon.org.uk>
User-Agent: Mutt/1.4i
Sender: owner-ietf-imaa@mail.imc.org
Precedence: bulk
List-Archive:
List-Unsubscribe:
List-ID:

This message responds to messages by Roy Badami and Claus Färber.

Roy Badami wrote:

> When dequoting/requoting localparts, should we consider recognizing
> fullwidth double quotes and fullwidth backslash (and any other
> double-quote-like and backslash-like characters)?
>
> It seems to me that the arguments for this are similar to those for
> fullwidth dot and fullwidth at, and once we decide to recognize
> metacharacters in fullwidth form, we should apply this consistently to
> *all* metacharacters.

I don't think the arguments are sufficiently similar. For one thing,
the dots and at-signs that delimit a mail address are not
metacharacters. They are part of the address, and they serve a standard
function in all mail addresses in all contexts. Metacharacters are
characters that are not actually part of the string they appear in.
Examples are quote characters, wildcard characters, macro-expansion
characters, etc.

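To make the distinction above concrete, here is a minimal Python sketch.
It is an illustration only, not anything specified by the IMAA draft. It
assumes the header-form quoting of RFC 2822, treats only the ASCII
double quote and backslash as metacharacters, and splits the domain on
the four dots that IDNA recognizes as label separators. The function
names dequote and split_address, and the choice of at-signs, are
assumptions made up for this example.

```python
# Illustrative sketch only: names and the at-sign set are assumptions.

# Dots that IDNA (RFC 3490, section 3.1) recognizes as label separators.
DOTS = "\u002e\u3002\uff0e\uff61"
# ASCII at-sign and fullwidth at-sign (assumed set for this example).
AT_SIGNS = "\u0040\uff20"


def dequote(local: str) -> str:
    """Remove 822-style quoting from a header-form local part.

    Only the ASCII double quote and backslash act as metacharacters;
    fullwidth look-alikes are left in place as ordinary characters.
    Unescaped quotes inside the string are not rejected here.
    """
    if len(local) < 2 or local[0] != '"' or local[-1] != '"':
        return local  # unquoted form: nothing to undo
    out = []
    chars = iter(local[1:-1])
    for ch in chars:
        if ch == "\\":
            # quoted-pair: the backslash vanishes, the next char is literal
            ch = next(chars, None)
            if ch is None:
                raise ValueError("dangling backslash")
        out.append(ch)
    return "".join(out)


def split_address(address: str):
    """Split at the last at-sign, then split the domain on any dot."""
    at = max(address.rfind(c) for c in AT_SIGNS)
    if at < 0:
        raise ValueError("no at-sign")
    local, domain = address[:at], address[at + 1:]
    for dot in DOTS:
        # Delimiters stay part of the address; only their form is unified.
        domain = domain.replace(dot, ".")
    return local, domain.split(".")
```

With these definitions, dequote('"foo.bar"') and
dequote(r'"\f\o\o\:\b\a\r"') give foo.bar and foo:bar respectively: the
quotes and backslashes disappear because they were never part of the
local part. A string wrapped in fullwidth quotation marks (U+FF02) comes
back unchanged, since this sketch does not treat them as metacharacters.
By contrast, split_address('user\uff20example\u3002com') keeps the
at-sign and dot as structure and yields the local part 'user' and the
labels 'example' and 'com'.
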
The motivating example for requiring the recognition of various dots
and at-signs as separators in IDNs and IMAs is this: If I can type an
address into my IMAA-aware application and it works, then I expect to
be able to type the address into a message body, mail it to you, and
have you paste it into your IMAA-aware application, and have it work.
We cannot guarantee success, but standardizing the most common dots and
at-signs gets us 99% of the way there.

But local parts that require quotation are fundamentally more
difficult, even with today's ASCII local parts. Although there is a
standard quotation mechanism for local parts in message headers and
SMTP commands, there is no standard quotation mechanism for user
interfaces. Some user agents might copy the user input directly into
the header (relying on the user to supply any needed quotation), others
might assume the user input is literal and add more quotation if
needed, and others might allow users to use some other quotation
mechanism altogether, which the agent undoes before applying the
822-style quotation. There's no standard, so we can't expect local
parts requiring quotation to be mailable and paste-able, even in
today's ASCII world. It would be a wasted effort to try to standardize
the Unicode variants of non-standard ASCII metacharacters.

> quotes can appear on business cards.

They can, but anyone who puts such an address on a business card must
not be very concerned about being reachable (for the reasons above).

Aside from the futility argument, it would probably be overstepping our
authority to try to standardize Unicode variants of metacharacters.
It's not hard to imagine that local parts might be found in contexts
where dequoting them involves undoing %hex escapes or &ent; escapes.
Should we try to insist that fullwidth % and fullwidth & should be
recognized as introducing those escape sequences? Of course not, that
would almost surely contradict the relevant standards.

Claus Färber wrote:

> Just do a NFKC normalisation at the very beginning

Not before dequoting, for the reason given in the preceding paragraph.
Metacharacters are context-dependent and out of our jurisdiction, and
need to be removed before we even have a string to work with. Applying
NFKC after dequoting, but before subdividing the local part, is okay.

> For IMAA, it suffices to specify that implementations MUST accept
> all characters as delimiters that decompose to one of our delimiters
> during NFKC-with-U+3002-to-U+002E normalisation and that the
> delimiters MUST be normalised.
>
> The easiest way to implement this is an additional normalisation at
> the very beginning.

I'm not confident that the first paragraph is exactly equivalent to the
second. Normalization is very subtle. If the latter is what you have in
mind, it might be best to specify that, and leave it up to the
optimizers to prove the existence of a shortcut if there is one.

By the way, I'm not sure the CJK community would want ideographic full
stop mapped to full stop inside the local part. They might prefer the
ability to have genuine ideographic full stops in there.

Roy Badami wrote:

> Are you saying we can do a normalization of the entire e-mail address
> without violating IDNA (which specifies that the domain be split on
> dot-like characters before normalization)?

IDNA requires that normalization happen as part of the proce