From: Bruce Lilly (blilly@erols.com)
Date: Thu Mar 06 2003 - 11:19:39 CST
Terje Bless wrote:
> No, the crucial difference between UTF-8 and RFC2047 in this regard is that
> the grunt work of the encoding, normalization, etc., will be handled either
> by the OS or by generic libraries for UTF-8 while RFC2047 will, at best, be
> handled by special-purpose libraries for email and news or, at worst, be
> doomed to be reimplemented by each newsreader author.
That's an artificial distinction; one could just as easily refer to
"special-purpose libraries for utf-8" and "generic libraries for
messages".
> And another very significant factor here is that after you've handled the
> RFC2047 encoding, you'll _still_ have to do the Unicode bit if you want to
> be able to handle i18n properly. Unless in among outlawing "8bit" UTF-8 in
> headers we also end up outlawing the sending of RFC2047-encoded UTF-8...
Incorrect; there is no requirement for support of any particular
charset by UAs. Moreover, there's no requirement to handle RFC 2047
decoding (unless one wishes to claim to be MIME-compliant). Now it
may well be highly desirable to decode the RFC 2047 encoded-words and then further
decode the utf-8 _for display_, but that is completely unnecessary
for the mechanics of handling the header field content. E.g. one
can take an RFC 2047-encoded address from a From header field and
put it into a To or Cc field when composing an email response without
*any* decoding. Raw utf-8, by contrast, would have to be RFC 2047 encoded,
since that encoding is mandatory for non-ASCII text in email headers.
I.e. use of raw utf-8 necessitates
RFC 2047 support; use of RFC 2047 does not necessitate utf-8 support.
That doesn't mean "outlawing" utf-8, which is certainly a valid charset.
Nor, for that matter, are we "outlawing" raw utf-8 in headers -- that
has never been legal.
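The asymmetry can be sketched in Python with the standard library's
email.header module. This is only an illustration; the display name and
address are made-up example values. Raw UTF-8 text has to be turned into
an RFC 2047 encoded-word before it can go into a header. Once encoded,
the word is plain ASCII and can be copied from From into a reply's To
field untouched. Decoding (RFC 2047 first, then UTF-8) is needed only to
display it.

```python
from email.header import Header, decode_header

# Hypothetical display name containing non-ASCII characters.
raw_name = "Jürgen Müller"

# Raw UTF-8 may not appear in a header as-is; encode it as an
# RFC 2047 encoded-word (something like =?utf-8?b?...?=).
encoded = Header(raw_name, charset="utf-8").encode()

# Composing a reply: the encoded From value (hypothetical address) is
# copied into the To field verbatim, with no decoding of any kind.
original_from = f"{encoded} <juergen@example.org>"
reply_to_field = original_from

# Only for display: undo RFC 2047, then decode the bytes as UTF-8.
parts = decode_header(encoded)
display = "".join(
    text.decode(charset or "ascii") if isinstance(text, bytes) else text
    for text, charset in parts
)
print(display)
```

Note that the reply path never touches a UTF-8 decoder. Only the display
path needs to know anything about the charset.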