
Re: First strawman for UTF-8 headers proposal




At 8:50 PM +0100 11/27/03, Simon Josefsson wrote:
Paul Hoffman / IMC <phoffman@xxxxxxx> writes:

All comments welcome!

This proposal appeals to me. Some comments:


 - The dual motivations are to allow UTF-8 everywhere in the headers and
 to not bounce any messages just because they originated with UTF-8
 headers.
...
 - If a receiving SMTP server does not support UTF-8-HEADERS, the sending
 SMTP client downgrades all headers and continues to send the message.

Following the example of 8BITMIME, I believe implementations should be allowed to bounce messages if they do not implement the fallback mechanism. Otherwise, in 20 years, all systems would still be forced to implement a downgrade mechanism that nobody uses or tests. Users will require that implementors support downgrading today, but eventually they won't have to bother with it.

This sounds good. One thing I didn't say in the first message, which I probably should have, is that it is a fairly-obvious extension of 8BITMIME. All the lessons we have learned in the past decade (!) from 8BITMIME should be applied with whatever I propose here.
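
The behaviour discussed above (send unchanged if the server offers the extension, otherwise downgrade, or bounce if the client has no downgrade code) could be sketched roughly as follows. This is only an illustration: the function and enum names are made up here, and the only thing taken from the proposal is the UTF-8-HEADERS keyword.

```python
from enum import Enum


class Action(Enum):
    SEND_AS_IS = "send as-is"
    DOWNGRADE = "downgrade headers, then send"
    BOUNCE = "bounce to sender"


def choose_action(advertised, headers_are_ascii, can_downgrade):
    """Decide how a sending SMTP client handles one message.

    advertised        -- set of EHLO keywords offered by the receiving server
    headers_are_ascii -- True if the message headers contain no UTF-8
    can_downgrade     -- True if this client implements the fallback
    (All three parameters are hypothetical names for this sketch.)
    """
    if headers_are_ascii:
        # Nothing to negotiate; the message is valid everywhere.
        return Action.SEND_AS_IS
    if "UTF-8-HEADERS" in {kw.upper() for kw in advertised}:
        return Action.SEND_AS_IS
    # Server lacks the extension: fall back if we can, else bounce,
    # which is the 8BITMIME-style allowance suggested above.
    return Action.DOWNGRADE if can_downgrade else Action.BOUNCE
```

A client that keeps the downgrade code gets DOWNGRADE for a server advertising only 8BITMIME; one that has dropped it gets BOUNCE.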


 > - Free text fields are downgraded using quoted-printable encoding;
 > SHOULD be into UTF-8 charset. Downgrading MUST only be done if
 > necessary.

Does this intentionally forbid non-QP RFC 2047 encodings? E.g., strings like =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=.

That was my intention. Maybe it is too drastic.
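
For illustration, a QP-only downgrade of a free text field might look like the sketch below. It assumes the RFC 2047 "Q" encoding with the UTF-8 charset, leaves pure-ASCII values alone ("only if necessary"), and ignores the 75-character limit on encoded-words, so a real implementation would also have to split long values. The function name is hypothetical.

```python
import string

# Characters left unencoded in this sketch; RFC 2047 permits a few
# more, but letters and digits are safe in every header context.
SAFE = set(string.ascii_letters + string.digits)


def downgrade_free_text(value):
    """Return value unchanged if ASCII, else an RFC 2047 Q encoded-word."""
    try:
        value.encode("ascii")
        return value  # downgrading is not necessary
    except UnicodeEncodeError:
        pass
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch == " ":
            out.append("_")  # Q encoding writes space as underscore
        elif byte < 128 and ch in SAFE:
            out.append(ch)
        else:
            out.append("=%02X" % byte)
    return "=?UTF-8?Q?" + "".join(out) + "?="
```

With this sketch, "José" becomes =?UTF-8?Q?Jos=C3=A9?= and "hello" is passed through untouched.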


> - The Address-map: header is downgraded using Base64 for mailbox
> names, IDNA for domain names.

 - Example:
    Address-map: José@example.com,jose-old@xxxxxxxxxxx;
        törbjørn@fältström.se,tb@fältström.se
 If passed to a non-UTF-8-HEADERS system, this header gets downgraded
 to:
    Address-map: Sm9zw6k=@xxxxxxxxxxx,jose-old@xxxxxxxxxxx;
        dMO2cmJqw7hybg==@xxxxxxxxxxxxxxxxxxxx,tb@xxxxxxxxxxxxxxxxxxxx
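
The example above can be reproduced with a short sketch: Base64 over the UTF-8 bytes of the mailbox name, IDNA for the domain. Two things here are assumptions rather than part of the proposal: the function name, and the rule that an all-ASCII mailbox name (such as jose-old) is left alone. The domain conversion uses Python's built-in "idna" codec, which implements the IDNA 2003 rules.

```python
import base64


def downgrade_address(address):
    """Downgrade one address from an Address-map: header (sketch)."""
    # Split on the last "@" so the domain is everything after it.
    local, _, domain = address.rpartition("@")
    try:
        local.encode("ascii")  # already ASCII: keep as-is (assumption)
    except UnicodeEncodeError:
        local = base64.b64encode(local.encode("utf-8")).decode("ascii")
    # ASCII domains pass through unchanged; others get xn-- labels.
    domain = domain.encode("idna").decode("ascii")
    return local + "@" + domain
```

For instance, downgrade_address("José@example.com") gives Sm9zw6k=@example.com, matching the mailbox part shown in the example.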

It might be nice to use the RFC 2047 encoding instead, so that the header is rendered properly in MIME-aware clients. It would make cut'n'paste of non-ASCII email addresses possible, even from MUAs that don't support this new standard. Qualify it to MUST use UTF-8 charset and the "B" encoding if you wish. A possible disadvantage would be if gateways convert RFC 2047 data from one charset to another, although I think the advantages are larger.

As long as we choose UTF-8 for the inner encoding of the QP, that's OK with me. It just seemed like extra characters, but I'm open to either. (But I'm not open to "the inner encoding can be anything" because that leads to the same lack of interop we are battling now.)
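
To make the "extra characters" trade-off concrete, here is a rough sketch of the RFC 2047 variant for a mailbox name, assuming the UTF-8 charset and the "B" encoding. The function name is hypothetical. The decode step uses the standard library's email.header.decode_header to show that an existing MIME-aware client could already recover the original text.

```python
import base64
from email.header import decode_header


def encode_mailbox_rfc2047(name):
    """Wrap a mailbox name as a UTF-8, B-encoded RFC 2047 encoded-word."""
    b64 = base64.b64encode(name.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + b64 + "?="


def decode_mailbox_rfc2047(word):
    """Recover the text the way a MIME-aware client would."""
    raw, charset = decode_header(word)[0]
    return raw.decode(charset)
```

Compared with bare Base64, the encoded-word costs twelve extra characters (the =?UTF-8?B? prefix and ?= suffix) per mailbox name.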



> - Other headers that include mailbox names and domain names will need
further definition for downgrading.

Here there may be dragons. There are many headers, standard and non-standard ones, that contain mailboxes, although without using the RFC 2822 BNF 'mailbox'. References: is one. Various List-* headers are others.

Standard ones we can deal with easily (as long as we get all of them). We will need to have a single way of changing non-standard names.


In general, I think the Address-map idea needs some further pondering,
especially with regard to modifying in transit and populating them
from address book caches, but also the encoding.

Fully agree.


--Paul Hoffman, Director
--Internet Mail Consortium