
Re: draft-klensin-emailaddr-i18n-00



Dan, Adam,

I'm tempted to stay out of this and take a "you and him fight"
approach to this.  But, in the interest of focusing the
discussion, it seems to me that the issue of ease of
implementation and ease of deployment depends entirely on where
in the system one looks, makes predictions or measurements, and
what one cares about.

For example, if one is trying to require changes in the minimum
number of modules, and to have something that can get through,
somehow, to unconverted systems, then Adam is, I think, right.
But Adam's approach requires one other assumption, which is a
"my responsibilities end with getting the message into, across,
and out of the network" view.  That isn't an irrational position.

But I, and I think you, are postulating a different condition
for success, a much more difficult one.  In my version, if the
end user doesn't see her local characters, and the local
characters of the message, correctly, then, whatever we have
done, we haven't provided internationalization.  Indeed, we may
have made things worse: my Chinese colleagues don't like writing
their names as some transliterated or phonetic ASCII strings,
especially in communicating with each other, but those strings
have significant mnemonic value.  Giving them Chinese characters
would be great, and is the desired target.  But forcing them to
see, or use, or even to cope with some complex, non-Chinese
encoding that has no mnemonic value is probably a step
backwards.  So I put pretty strong value in avoiding that case.
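
To make the mnemonic-value point concrete, here is a minimal sketch using Python's built-in `idna` codec, which implements the RFC 3490 version of IDNA. The domain `中文.example` is a hypothetical example, and only the domain part is converted; how a local part would be encoded is not shown. The ASCII-compatible form is fully reversible, but nothing in it resembles the characters the user actually recognises.

```python
# Sketch: the ASCII-compatible form of a domain whose first label is in
# Chinese characters. "中文.example" is a hypothetical domain; only the
# domain part is converted here (local-part encoding is not shown).

def ascii_form(domain: str) -> str:
    """Native domain -> IDNA ASCII-compatible form, via Python's RFC 3490 codec."""
    return domain.encode("idna").decode("ascii")


def native_form(ascii_domain: str) -> str:
    """IDNA ASCII-compatible form -> native characters."""
    return ascii_domain.encode("ascii").decode("idna")


native = "中文.example"
encoded = ascii_form(native)

print("native:", native)
print("ascii: ", encoded)  # an "xn--" label that means nothing to the reader
print("round-trips:", native_form(encoded) == native)  # True
```

An unconverted system would show the second line, which is exactly the kind of string the paragraph above argues users should not have to see or type.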

Similarly, one of the things we have observed about email over
the years is that messages that say, somewhere in the body, "my
email address is fooXbar+baz@xxxxxxxxxxxxxxxxx" or, worse,
"George's address is..." are pretty common.   Adam's approach
assumes, I think, that one doesn't need to worry about all of
the issues associated with such embedded addresses in order to
design an email address protocol.  I disagree -- I think those
situations are important in practice, and that having them
"work" had best not depend on either heuristics that search
through a message trying to figure out what is, or is not, an
email address or on the user typing in
funny-encoding@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  The transport
approach says that the email addresses get written into message
bodies in whatever the character set of the message body is.
And, I assume that, if the MUA can't handle the relevant
character set, then, in practice, it is all gibberish anyway and
maybe no one cares.
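
The observation that addresses get written into bodies in whatever the body's character set is can be illustrated with a short Python sketch. The address and the choice of GB2312 as the body charset are hypothetical examples. The same address yields different octets under different charsets, so anything scanning a body has to honour the declared charset, and an MUA that guesses wrong shows gibberish.

```python
# Sketch: one address written into message bodies that use different
# charsets. The address and the charsets chosen are illustrative only.

address = "李@中文.example"

utf8_body = f"My address is {address}".encode("utf-8")
gb_body = f"My address is {address}".encode("gb2312")

# The octets differ, so a byte-level scan that assumes UTF-8 will not
# find the address in the GB2312 body.
print(address.encode("utf-8") in utf8_body)  # True
print(address.encode("utf-8") in gb_body)    # False

# Decoding with the declared charset recovers the address ...
print(address in gb_body.decode("gb2312"))   # True

# ... but an MUA that decodes with the wrong charset shows gibberish.
print(gb_body.decode("latin-1"))
```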

So, with Adam's constraints and success criteria (at least as
I'm imagining them) the "native characters mapped to UTF-8"
approach is vastly more complicated.  The transport needs
tuning, and everything in the path has to be willing to play to
get i18n addresses through.  The MUA has to be i18n-compliant at
a fairly high standard.  The MTA-MUA interface has to be tuned a
bit.  IMAP and POP need to be adjusted to pass the right stuff
around (a topic that draft-klensin-emailaddr-i18n-01 doesn't
address, but -02 should).  He is absolutely correct: if that has
to be done de novo, it is a _Big_ deal.  Certainly it requires a
lot of protocol changes as well as a lot of code changes if we
were starting from scratch.

However, with my constraints and view of the same problem,
Adam's solution is "finished" only when the users are seeing an
i18n environment.  At that point, a variation of your argument
sets in.  Suppose we conclude that the job hasn't been done until
the infrastructure (not just the MTAs, but also the MUAs, the
presentation layer, etc.) is upgraded.  Once we conclude that
it is all pointless until those things are upgraded, and
therefore don't count their costs, one gets to look only at
the marginal cost of address upgrading, and that marginal
cost is pretty low.  It would be fair of Adam to argue that this
is a very strange way to define the problem or its measurement.
But, at some level, it is reasonable, too.

So I think, personally, that the important questions are about
the total resources, code changes, deployment costs, etc., needed
to get it right, as seen by the end user (which almost
certainly involves seeing "native" characters and little or no
leakage of internal codings).   If one particular approach
doesn't get it right, defined in those user terms, then how much
easier, cheaper, or faster it can be implemented is really not a
terribly interesting question.  And "right" may be a matter of
user perception or religion, unfortunately.

     john


--On Monday, October 27, 2003 13:17 +0100 Dan Oscarsson
<Dan.Oscarsson@xxxxxxxxxxxxxxx> wrote:

> 
> Adam M. Costello wrote:
> 
>> 
>>> IDNA does not support all international domain names because
>>> it was made to work with unaware DNS servers and clients.
>> 
>> What do you mean by "all international domain names"?  There
>> was no such thing as an "internationalized domain name" until
>> the IETF defined that term.  The definition appears in the
>> IDNA spec.  Therefore, by definition, IDNA supports "all
>> internationalized domain names".  You must have some other
>> definition in mind.  What is it, and why is it a problem that
>> IDNA does not support that definition?
>> 
>>> As the IMAA draft stands today it will not handle all e-mail
>>> addresses.
>> 
>> Same question.  There is no such thing as an
>> internationalized mail address until we define it.
> 
> By international domain names I mean a domain name containing
> non-ASCII characters. The same for e-mail addresses.
> 
> The problem with IDNA and IMAA is that the names are
> defined in terms of an ASCII form and the rules applied when
> converting to ASCII.
> 
> A domain name with mixed case is an international domain name,
> but that is not how IDNA defines it.
> IMAA also makes changes to the e-mail address, resulting in a
> subset of the possible international e-mail addresses.
> 
> I think international e-mail addresses and domain names should
> be defined in terms of character semantics, not ASCII encoding rules.
> 
>> And UTF-8 headers will not be very popular before enough
>> clients *and* servers can handle it.  If speed of deployment
>> is an issue, it looks pretty clear to me that the "in
>> applications" approach has the edge.
> 
> By fixing my MTA or DNS server I can support both legacy and
> internationalised applications, as well as provide transition
> support for applications. New applications do not need to
> handle legacy formats, as the enhanced MTA/DNS server does the
> up/downgrading for them.
> 
>> 
>>> To make handling of UTF-8 text easier, the standard should require
>>> Unicode normalisation form C (NFC).
>> 
>> In any case, I think it would be good if receivers do not
>> assume that text is already normalized; they should perform
>> normalization whenever they want text to be normalized.  Then
>> it will not be necessary for senders to perform
>> normalization.  The implementation cost is the same whether
>> the code is inside the senders or inside the receivers.
> 
> The implementation cost is much lower if we agree to use the
> same format "on the wire". At each end we only need to
> implement translation between the "on the wire" format and the
> "internal" format.
> With many "on the wire" formats, the code gets much more
> complex, and CPU and memory needs increase.
> I think the W3C has written something about "early"
> normalisation. Also, as messages may pass through many
> systems/applications on the way from sender to destination,
> requiring normalised data means that only the sender, and possibly
> the final recipient (if it does not want to use normalised data),
> need to change the normalisation. The hops in between need not do
> additional normalisation work, and if they want to do some
> filtering using some internal format they only have one "on
> the wire" format to convert from.
>
> So I can clearly see that implementation cost is much lower,
> and system resource use is lower, if we only have one "on the
> wire" format.
> 
>    Dan
>
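
Dan's "one on-the-wire format" argument can be sketched with Python's standard `unicodedata` module. The sketch assumes UTF-8 in NFC as the wire format; `to_wire` and `from_wire` are hypothetical helper names, not part of any draft discussed in this thread. The same visible name, typed as a precomposed or a decomposed sequence, compares unequal until it is normalized; once the sender normalizes, an intermediate hop can compare octets directly.

```python
import unicodedata


def to_wire(text: str) -> bytes:
    """Sender side (hypothetical helper): normalize to NFC, then encode as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def from_wire(octets: bytes) -> str:
    """Receiver side (hypothetical helper): decode UTF-8; assumes the sender sent NFC."""
    return octets.decode("utf-8")


precomposed = "Jos\u00e9"   # e-acute as one code point (U+00E9)
decomposed = "Jose\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)

print(precomposed == decomposed)                    # False: different code points
print(to_wire(precomposed) == to_wire(decomposed))  # True: one wire form

# An intermediate hop (e.g. a filter) can compare these octets without
# running its own normalization step.
wire = to_wire(decomposed)
print(from_wire(wire) == precomposed)               # True
```

Adam's alternative, normalizing only at the receiver, would move the `normalize` call from `to_wire` into `from_wire`. The code is equally short; the disagreement is over how many intermediate hops would then have to repeat that step.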