Re: First strawman for UTF-8 headers proposal
On Saturday, November 29, 2003, at 03:43 PM, Paul Hoffman / IMC wrote:
One thing I didn't say in the first message, which I probably should
have, is that it is a fairly-obvious extension of 8BITMIME. All the
lessons we have learned in the past decade (!) from 8BITMIME should be
applied with whatever I propose here.
I'm not sure how much the 8BITMIME experience applies. At the time
8BITMIME was adopted, the Internet was much smaller, there were many
fewer UAs, MTAs, and other mail-handling tools (and thus less variety),
messages travelled a simpler path (fewer firewalls, spam filters, virus
checkers, etc.) and the vast majority of Internet users still spoke
English - though this was quickly changing.
Also, 8BITMIME was a much less drastic change than negotiation of UTF-8
would be now. Partially this is because many mail readers in use at
the time of 8BITMIME introduction were still intended for use with text
terminals or terminal emulators (so MUAs that copied 8bit text to the
screen often "did the right thing" even if only by accident, or because
the user had configured the terminal emulator to use the right
charset). Partially this is because MTAs and other intermediaries that
predated 8BITMIME generally did not look at message bodies - they
looked only at the headers of messages that transited their systems.
Since headers of 8bit MIME messages are still ASCII, supporting 8BITMIME
didn't necessarily require any change to a tool's header-parsing code.
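
To make the distinction concrete, here is a minimal sketch of the kind of
check an intermediary might apply. The function name and sample messages
are hypothetical, and it assumes CRLF line endings with a blank line
separating headers from body. An 8BITMIME message passes because only its
body carries 8bit data; a message with raw UTF-8 in a header does not.

```python
def headers_are_ascii(raw_message: bytes) -> bool:
    """Return True if every byte in the header block is 7-bit ASCII.

    Hypothetical helper: the header block is taken to be everything
    before the first blank line (CRLF CRLF).
    """
    head, _, _ = raw_message.partition(b"\r\n\r\n")
    return all(byte < 0x80 for byte in head)


# ASCII headers, 8bit (Latin-1) body: what 8BITMIME allows.
eightbit_body = (
    b"Subject: hello\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)

# Raw UTF-8 in a header field: what a UTF-8 headers extension would allow.
utf8_header = "Subject: caf\u00e9\r\n\r\nbody\r\n".encode("utf-8")

print(headers_are_ascii(eightbit_body))
print(headers_are_ascii(utf8_header))
```

A tool built around an assumption like this needed no header-parsing
changes for 8BITMIME, but would need them for UTF-8 headers.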
One simple example: Bernstein and others have pointed out that it's
easier to parse header fields with address lists from the right to the
left rather than from the left to the right, because this requires less
lookahead. It's still possible to do this with UTF-8 (particularly if
you do lexical analysis left-to-right and parsing right-to-left), but
it's probably not a trivial change to existing code.
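
As a rough illustration of why right-to-left scanning remains possible,
the sketch below steps backward through a UTF-8 byte string one character
at a time. It relies on the fact that UTF-8 continuation bytes always have
the bit pattern 10xxxxxx, so a scanner can find the start of the previous
character without decoding from the beginning. The function names are
hypothetical, and the code assumes the input is already valid UTF-8; it is
not a header parser, only the low-level step such a parser would need.

```python
def prev_char_start(buf: bytes, end: int) -> int:
    """Return the index where the character ending just before `end` starts.

    Skips backward over UTF-8 continuation bytes (10xxxxxx).
    Assumes `buf` is valid UTF-8 and 0 < end <= len(buf).
    """
    i = end - 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


def chars_right_to_left(buf: bytes) -> list[str]:
    """Yield the characters of a UTF-8 byte string from last to first."""
    out = []
    end = len(buf)
    while end > 0:
        start = prev_char_start(buf, end)
        out.append(buf[start:end].decode("utf-8"))
        end = start
    return out


sample = "jos\u00e9@example.org".encode("utf-8")
print(chars_right_to_left(sample)[:4])
```

Code written for ASCII can step back one byte per character; the extra
loop over continuation bytes is the sort of change existing right-to-left
parsers would have to absorb.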