
Re: First strawman for UTF-8 headers proposal





On Saturday, November 29, 2003, at 03:43 PM, Paul Hoffman / IMC wrote:

One thing I didn't say in the first message, which I probably should have, is that it is a fairly-obvious extension of 8BITMIME. All the lessons we have learned in the past decade (!) from 8BITMIME should be applied with whatever I propose here.

I'm not sure how much the 8BITMIME experience applies. At the time 8BITMIME was adopted, the Internet was much smaller, there were far fewer UAs, MTAs, and other mail-handling tools (and thus less variety), messages travelled a simpler path (fewer firewalls, spam filters, virus checkers, etc.), and the vast majority of Internet users still spoke English - though this was quickly changing.


Also, 8BITMIME was a much less drastic change than negotiation of UTF-8 would be now. Partly this is because many mail readers in use when 8BITMIME was introduced were still intended for use with text terminals or terminal emulators (so MUAs that copied 8bit text to the screen often "did the right thing", even if only by accident, or because the user had configured the terminal emulator to use the right charset). Partly it is because MTAs and other intermediaries that predated 8BITMIME generally did not look at message bodies; they looked only at the headers of messages that transited their systems. Since headers of 8bit MIME messages are still ASCII, supporting 8BITMIME didn't necessarily require any change to a tool's header-parsing code.
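
To make the header/body distinction concrete, here is a rough sketch in Python (not taken from any real MTA). The helper names split_message and is_seven_bit and both sample messages are made up for illustration. It shows that an 8bit MIME message keeps its header block in ASCII, while a message with raw UTF-8 in its headers does not, so a tool that only inspects headers sees a difference in the second case only.

```python
def split_message(raw: bytes):
    """Split a raw message into (header, body) at the first empty line."""
    header, _, body = raw.partition(b"\r\n\r\n")
    return header, body


def is_seven_bit(data: bytes) -> bool:
    """True if every octet is below 0x80, i.e. plain ASCII."""
    return all(octet < 0x80 for octet in data)


# Hypothetical 8bit MIME message: ASCII headers, ISO-8859-1 body.
eight_bit_msg = (
    b"From: jose@example.org\r\n"
    b"Subject: Hola\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "¿Qué tal?".encode("latin-1")
    + b"\r\n"
)

# Hypothetical message with raw UTF-8 in the header fields themselves.
utf8_header_msg = (
    "From: José <jose@example.org>\r\n"
    "Subject: ¿Qué tal?\r\n"
    "\r\n"
    "Hola\r\n"
).encode("utf-8")

for name, msg in (("8bit body", eight_bit_msg), ("UTF-8 headers", utf8_header_msg)):
    header, body = split_message(msg)
    print(name, "| header is ASCII:", is_seven_bit(header),
          "| body is ASCII:", is_seven_bit(body))
```

A header parser written with ASCII-only assumptions never meets a high-bit octet in the first message, but it does in the second.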

One simple example. Bernstein and others have pointed out that it's easier to parse header fields with address lists from the right to the left rather than from the left to the right, because this requires less lookahead. It's still possible to do this with UTF-8 (particularly if you do lexical analysis left-to-right and parsing right-to-left), but it's probably not a trivial change to existing code.
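
As a rough illustration of "lex left-to-right, parse right-to-left", here is a minimal Python sketch. It assumes a much-simplified address-list grammar (no comments, groups, or routes), and the names TOKEN, lex, and parse_mailboxes are hypothetical, not from any existing mail library. The point it tries to show is that the rightmost token of each mailbox (">" or a bare address) already tells the parser which form it is looking at, so no lookahead is needed, and that decoding UTF-8 happens in the lexer before the parser sees anything.

```python
import re

# Simplified token grammar (an assumption, far short of RFC 2822):
# quoted strings, the specials "<", ">", ",", and runs of any other
# non-space characters, which may include non-ASCII once decoded.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[<>,]|[^\s<>,"]+')


def lex(header_value: bytes):
    """Left-to-right lexical analysis over the UTF-8-decoded field value."""
    return TOKEN.findall(header_value.decode("utf-8"))


def parse_mailboxes(tokens):
    """Right-to-left parse into (display_name, address) pairs."""
    mailboxes = []
    i = len(tokens) - 1
    while i >= 0:
        if tokens[i] == ",":
            i -= 1
            continue
        if tokens[i] == ">":
            # name-addr form: [display-name] "<" addr-spec ">"
            if i < 2 or tokens[i - 2] != "<":
                raise ValueError("malformed angle address")
            address = tokens[i - 1]
            i -= 3
            name_parts = []
            while i >= 0 and tokens[i] != ",":
                word = tokens[i]
                if word.startswith('"'):
                    word = word[1:-1]  # drop quotes; escapes left as-is
                name_parts.append(word)
                i -= 1
            name = " ".join(reversed(name_parts))
        else:
            # bare addr-spec form: must be preceded by "," or nothing
            address = tokens[i]
            name = ""
            i -= 1
            if i >= 0 and tokens[i] != ",":
                raise ValueError("unexpected token before bare address")
        mailboxes.append((name, address))
    mailboxes.reverse()  # restore the original left-to-right order
    return mailboxes


field = ('José García <jose@example.org>, '
         '"Doe, Jane" <jane@example.com>, bob@example.net').encode("utf-8")
print(parse_mailboxes(lex(field)))
```

Existing code that scans raw octets backwards would need a comparable restructuring, since a backwards byte scan can land in the middle of a multi-octet UTF-8 sequence.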