Re: First strawman for UTF-8 headers proposal
At 8:50 PM +0100 11/27/03, Simon Josefsson wrote:
Paul Hoffman / IMC <phoffman@xxxxxxx> writes:
All comments welcome!
This proposal appeals to me. Some comments:
- The dual motivations are to allow UTF-8 everywhere in the headers and
to not bounce any messages just because they originated with UTF-8
headers.
...
- If a receiving SMTP server does not support UTF-8-HEADERS, the sending
SMTP client downgrades all headers and continues to send the message.
Following the example of 8BITMIME, I believe implementations should be
allowed to bounce messages if they do not implement the fall back
mechanism. Otherwise in 20 years, all systems would still be forced
to implement a downgrade mechanism that nobody uses or tests. Users
will require that implementors support downgrading today, but
eventually they won't have to bother about it.
This sounds good. One thing I didn't say in the first message, which
I probably should have, is that it is a fairly-obvious extension of
8BITMIME. All the lessons we have learned in the past decade (!) from
8BITMIME should be applied with whatever I propose here.
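
For illustration only, the sender-side decision discussed above could be sketched as follows. Only the UTF-8-HEADERS keyword comes from the proposal; the function name, the Action values, and the can_downgrade flag are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SEND_AS_IS = "send as-is"
    DOWNGRADE = "downgrade headers, then send"
    BOUNCE = "bounce to sender"


def choose_action(ehlo_keywords, has_utf8_headers, can_downgrade):
    """Decide what a sending SMTP client does with one message.

    ehlo_keywords: extension keywords the receiving server advertised.
    has_utf8_headers: True if the message carries non-ASCII headers.
    can_downgrade: True if this client implements the fallback.
    """
    advertised = {keyword.upper() for keyword in ehlo_keywords}
    if not has_utf8_headers or "UTF-8-HEADERS" in advertised:
        return Action.SEND_AS_IS
    if can_downgrade:
        return Action.DOWNGRADE
    # The 8BITMIME-style escape hatch: no fallback implemented, so bounce.
    return Action.BOUNCE
```

The last branch is the one the comment above argues for: a client that never implemented downgrading bounces instead of being required to carry fallback code.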
> - Free text fields are downgraded using quoted-printable encoding;
> SHOULD be into UTF-8 charset. Downgrading MUST only be done if
> necessary.
Does this intentionally forbid non-QP RFC 2047 encodings? E.g.,
strings like =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=.
That was my intention. Maybe it is too drastic.
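
A minimal sketch of the difference, using Python's standard email package. The helper names are hypothetical, and line folding of long encoded-words is left out.

```python
from email.charset import QP, Charset
from email.header import decode_header


def downgrade_free_text(text):
    """Return ASCII text unchanged, else a UTF-8 'Q' encoded-word."""
    if text.isascii():
        return text  # "Downgrading MUST only be done if necessary"
    utf8_q = Charset("utf-8")
    utf8_q.header_encoding = QP  # force quoted-printable, never Base64
    return utf8_q.header_encode(text)


def decode_words(value):
    """Decode any RFC 2047 encoded-words in value to a plain string."""
    parts = []
    for raw, charset in decode_header(value):
        if isinstance(raw, bytes):
            parts.append(raw.decode(charset or "ascii"))
        else:
            parts.append(raw)
    return "".join(parts)


print(downgrade_free_text("José"))
print(decode_words("=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?="))
```

The second call decodes the ISO-8859-2 "B" string quoted above, which is the kind of encoded-word the proposal text would no longer allow a downgrading client to produce.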
> - The Address-map: header is downgraded using Base64 for mailbox
> names, IDNA for domain names.
- Example:
Address-map: José@example.com,jose-old@xxxxxxxxxxx;
törbjørn@fältström.se,tb@fältström.se
If passed to a non-UTF-8-HEADERS system, this header gets downgraded
to:
Address-map: Sm9zw6k=@xxxxxxxxxxx,jose-old@xxxxxxxxxxx;
dMO2cmJqw7hybg==@xxxxxxxxxxxxxxxxxxxx,tb@xxxxxxxxxxxxxxxxxxxx
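
The downgrade in this example can be sketched in a few lines of Python. This is an illustration under assumptions, not part of the proposal: downgrade_address is a hypothetical name, and Python's built-in "idna" codec (which follows the 2003 IDNA rules) stands in for whatever IDNA library an implementation would use.

```python
import base64


def downgrade_address(address):
    """Downgrade one Address-map entry: Base64 local part, IDNA domain."""
    local, _, domain = address.rpartition("@")
    if not local.isascii():
        # Mailbox name: Base64 over its UTF-8 bytes.
        local = base64.b64encode(local.encode("utf-8")).decode("ascii")
    if not domain.isascii():
        # Domain name: IDNA ToASCII, applied label by label by the codec.
        domain = domain.encode("idna").decode("ascii")
    return f"{local}@{domain}"


for entry in ("José@example.com", "törbjørn@fältström.se", "tb@fältström.se"):
    print(downgrade_address(entry))
```

Parts that are already ASCII pass through untouched, which matches how only the non-ASCII pieces change in the example above.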
It might be nice to use the RFC 2047 encoding instead, so that the
header is rendered properly in MIME aware clients. It would make
cut'n'paste of non-ASCII email addresses possible, even from MUAs that
don't support this new standard. Qualify it to MUST use UTF-8
charset and the "B" encoding if you wish. A possible disadvantage
would be if gateways convert RFC 2047 data from one charset to
another, although I think the advantages are larger.
As long as we choose UTF-8 for the inner encoding of the QP, that's
OK with me. It just seemed like extra characters, but I'm open to
either. (But I'm not open to "the inner encoding can be anything"
because that leads to the same lack of interop we are battling now.)
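
To make the trade-off concrete, here is a small sketch of the RFC 2047 variant with the charset fixed to UTF-8 and the "B" encoding. The function names are hypothetical. Note that RFC 2047 itself does not permit encoded-words inside an addr-spec, so this shows only the encoding, not a settled header syntax.

```python
import base64
from email.header import decode_header


def encode_local_part(local):
    """Wrap a non-ASCII mailbox name as a UTF-8 'B' encoded-word."""
    if local.isascii():
        return local
    b64 = base64.b64encode(local.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{b64}?="


def render(word):
    """What a MIME-aware client would display for one encoded-word."""
    raw, charset = decode_header(word)[0]
    return raw.decode(charset) if isinstance(raw, bytes) else raw


word = encode_local_part("José")
print(word, "->", render(word))
```

Compared with bare Base64, the framing "=?UTF-8?B?" and "?=" adds 12 characters per mailbox name, which is the "extra characters" cost; in exchange, existing RFC 2047 decoders can show the original name.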
> - Other headers that include mailbox names and domain names will need
> further definition for downgrading.
Here there may be dragons. There are many headers, standard and
non-standard ones, that contain mailboxes, although without using the
RFC 2822 BNF 'mailbox'. References: is one. Various List-* headers
are others.
Standard ones we can deal with easily (as long as we get all of
them). We will need to have a single way of changing non-standard
names.
In general, I think the Address-map idea needs some further pondering,
especially with regard to modifying in transit and populating them
from address book caches, but also the encoding.
Fully agree.
--Paul Hoffman, Director
--Internet Mail Consortium