
if you really want utf-8 headers...




Okay, I still see zero justification for utf-8 headers. The improvement in transmission and storage efficiency is minuscule. They make both user agents and mail transports more complex and less reliable, because MTAs need to have conversion code (which will break messages and cause delivery failures) and UAs need to be able to handle old messages that use RFC 2047 (resulting in multiple code paths and additional failure modes).


(That, and they don't address the problem that this group is trying to solve...)

But if you believe that the very long term benefit of utf-8 headers (by which I mean that whatever benefit might result from using utf-8 - and it's by no means certain - won't be realized for a very long time) somehow outweighs the very high near-term cost, then may I suggest that the place to do the upgrade and negotiation is not in the mail transport, but at the message store and message submission.

That is, the major benefit of using utf-8 headers would be to make life easier for user agents and IMAP servers (for searching). They don't benefit the transport at all. But I could imagine POP and IMAP options that said "give me utf-8 headers instead of headers with RFC 2047 and/or IMAAs in them", and I could imagine simplified UAs that would only talk to POP and IMAP servers that implemented that option. (I'd hate the lack of interoperability between new simplified UAs and old POP and IMAP servers, but there's already some precedent for UAs insisting on nonstandard or optional features in POP and IMAP.)

Message stores could implement this in a variety of ways - they could store the message as received and convert on-the-fly as necessary; they could convert the header to utf-8 on receipt; etc. I could also imagine a SUBMISSION server option that said "translate utf-8 headers to proper on-the-wire format before forwarding them to their destination" and UAs that would only submit messages to SUBMISSION servers that advertised that option via EHLO. Messages sent through SMTP or other transports would still, for the time being, be in ASCII.
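To make the two conversion points concrete, here is a minimal sketch in Python of what the message store side (RFC 2047 to utf-8 on retrieval) and the submission side (utf-8 back to RFC 2047 before forwarding) might look like. The function names header_to_utf8 and header_to_wire are made up for illustration, and the sketch only covers unstructured header text such as Subject; it does not attempt address headers or IMAAs, and it says nothing about how the POP, IMAP or SUBMISSION option itself would be negotiated.

```python
from email.header import Header, decode_header, make_header


def header_to_utf8(raw_value: str) -> str:
    """Message store side (hypothetical helper): turn a header value that
    may contain RFC 2047 encoded-words into plain Unicode text, which the
    store could then hand to a UA that asked for utf-8 headers."""
    return str(make_header(decode_header(raw_value)))


def header_to_wire(text: str) -> str:
    """Submission side (hypothetical helper): turn Unicode header text into
    an ASCII-only value for transport. Pure ASCII is passed through
    untouched; anything else becomes RFC 2047 encoded-words."""
    if text.isascii():
        return text
    return Header(text, "utf-8").encode()


if __name__ == "__main__":
    wire = header_to_wire("Grüße aus Köln")
    print(wire)                  # ASCII-only, safe for existing transports
    print(header_to_utf8(wire))  # what a utf-8-only UA would be given
```

Note that both conversions sit at the edges: nothing between the submission server and the message store ever sees anything but ASCII headers.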

I see several "nice" things about doing it this way:
- it isolates the complexity to portions of the system (the message store and submission server) that are "close to" the portions of the system (UAs and message stores) that benefit the most, which means that users who benefit (if they do realize a benefit) will be in a better position to get those portions upgraded.
- it is less disruptive because it affects fewer components of the mail system at once.
- it isolates conversion to a small number of interfaces rather than allowing conversion to potentially occur at any interface between one MTA, gateway, firewall, filter, etc. and another, some of which offer no opportunity for feature negotiation.
- it bounds the number of conversions that a message will undergo, and thus bounds the potential for delivery failure and message corruption.
- it's easy to try on an experimental basis without impacting the infrastructure.


And if you also wanted to experiment with transporting utf-8 end-to-end, you could always define a SRV record for "direct utf-8 mail delivery" and have the utf-8 SUBMISSION servers be aware of it, using that in preference to MX. You could even use this as a means to replace SMTP with something simpler, rather than making SMTP more complex.
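A rough sketch of the "SRV in preference to MX" decision, in Python, might look like the following. Everything specific here is an assumption made for illustration: no such SRV service has been defined, so the label _utf8mail._tcp is invented, as are the function name pick_route and the route tags. DNS access is passed in as a lookup function so the logic can be tried without touching a real resolver, and SRV weights are ignored (a real client would do weighted selection within each priority).

```python
from typing import Callable, List, Tuple

# Hypothetical SRV service label; nothing like this is registered.
SRV_LABEL = "_utf8mail._tcp"

# lookup(name, rrtype) returns a list of record tuples:
#   SRV -> (priority, weight, port, target)
#   MX  -> (preference, host)
Lookup = Callable[[str, str], List[tuple]]


def pick_route(domain: str, lookup: Lookup) -> List[Tuple[str, str, int]]:
    """Return delivery candidates as (mode, host, port), best first.

    If the recipient domain publishes the hypothetical SRV record, prefer
    direct utf-8 delivery; otherwise fall back to ordinary MX routing,
    where headers must already be in ASCII form."""
    srv = lookup(f"{SRV_LABEL}.{domain}", "SRV")
    if srv:
        ordered = sorted(srv, key=lambda rec: rec[0])  # lowest priority first
        return [("utf8-direct", target, port) for _, _, port, target in ordered]
    mx = sorted(lookup(domain, "MX"), key=lambda rec: rec[0])
    return [("smtp-ascii", host, 25) for _, host in mx]


if __name__ == "__main__":
    # Fake zone data standing in for a real resolver.
    zone = {
        ("_utf8mail._tcp.example.org", "SRV"): [(10, 0, 2525, "u8.example.org")],
        ("example.org", "MX"): [(5, "mx.example.org")],
        ("example.net", "MX"): [(20, "b.example.net"), (10, "a.example.net")],
    }

    def fake_lookup(name: str, rrtype: str) -> List[tuple]:
        return zone.get((name, rrtype), [])

    print(pick_route("example.org", fake_lookup))
    print(pick_route("example.net", fake_lookup))
```

Domains that never publish the record see no change at all, which is the point: the experiment is opt-in at both ends.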

Keith

p.s. I said something like this in Minneapolis but some amplification might be useful. I still think that even in the long term there's very marginal benefit in going to utf-8 headers as long as we've got so many other baroque irregularities in 2822 and MIME. Of course I understand the potential for second-system effect, but if you just do utf-8 headers without changing anything else in the message format you're paying a lot in upgrade cost to only simplify one fairly minor aspect of the system.

Universal adoption of IMAAs is anything but assured. The largest age group of the world population is fairly young (say, less than 21 years old). Many of these people have grown up with cheap travel and good communications, and an international popular culture. They are used to dealing with people from other countries, and in multiple languages. Many of these people may find that IMAAs don't benefit them so much and that it's easier to get all email at an ASCII address (or for that matter at an E.164 number using ENUM) than it is to deal with IMAAs.

I'm not trying to argue that we shouldn't try to define IMAAs - clearly they will be useful to some people - I'm saying that IMAAs by themselves probably don't justify a vast upgrade to the infrastructure.