From: Bruce Lilly (blilly@erols.com)
Date: Thu Sep 19 2002 - 11:15:21 CDT
Claus Färber wrote:
> There are a lot of mail<->news
> gateways _today_. Will they correctly convert a header with unencoded
> UTF-8, as it is proposed for news, to RFC 2047/2231, which is needed for
> email?
and an obvious followup is:
If so, if an article arrives as mail with tagged (charset and language)
extended text using 2047/2231, and these are converted to untagged UTF-8
for news, will the news->mail gateway re-encode with tags (specifically
with the language tag that was present in the original, and with the
same charset as the original)?
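To make the first two questions concrete, here is a minimal Python sketch of what happens when an inbound gateway decodes an RFC 2047 encoded-word into raw UTF-8. The encoded-word may carry the optional language suffix defined in RFC 2231 section 5. The helper names `parse_encoded_word` and `to_raw_utf8` are hypothetical and are not taken from any actual gateway. The sketch assumes a single, well-formed encoded-word and does no error recovery.

```python
import base64
import quopri
import re

# RFC 2047 encoded-word with the optional RFC 2231 section 5 language suffix:
#   "=?" charset ["*" language] "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)(?:\*(?P<lang>[^?]+))?\?(?P<enc>[QqBb])\?(?P<text>[^?]*)\?="
)


def parse_encoded_word(word):
    """Return (text, charset, language) for one encoded-word (hypothetical helper)."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError("not an encoded-word: %r" % word)
    raw = m.group("text")
    if m.group("enc").upper() == "B":
        data = base64.b64decode(raw)
    else:
        # header=True makes "_" decode to a space, as the Q encoding requires.
        data = quopri.decodestring(raw.encode("ascii"), header=True)
    return data.decode(m.group("charset")), m.group("charset"), m.group("lang")


def to_raw_utf8(word):
    """What a hypothetical mail->news gateway emitting raw UTF-8 would keep."""
    text, _charset, _lang = parse_encoded_word(word)
    return text.encode("utf-8")  # the charset and language tags are discarded here
```

The resulting bytes are the same whether the original said ISO-8859-1 or UTF-8, and with or without `*de`. A news->mail gateway that sees only those bytes therefore has nothing from which to reconstruct the originator's tags.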
and a further followup is:
If the answer to either is no, how do you propose to maintain the
originator's specified charset and language through an end-to-end
pass from mail->news->mail via gateways if the inbound gateway
transforms the header content to raw untagged 8-bit UTF-8? [clearly,
if the standard MIME 2047/2231 methods are used for headers in news
there would be no change, gateways would have fewer things to worry
about, etc.] Remember in your answer that encoded forms are "too
long" because they use a few bytes more than untagged raw 8-bit
codes, and that has been deemed a "waste of bandwidth", so e.g.
saving the original 2047/2231 header contents in some unspecified
header (which might be dropped by a moderator, injection agent,
relay, or server) is not a viable solution because it would be a
waste of bandwidth to carry both untagged raw 8-bit and tagged
compatible 2047/2231 versions around.
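The following sketch gives a rough sense of the "few bytes more" at issue. It compares a tagged encoded-word with the untagged raw UTF-8 bytes. The `encode_q` name is hypothetical. The function conservatively Q-encodes every byte that is not an ASCII letter, digit, or space. RFC 2047 permits this but does not require it, so the figures below are an upper bound for this particular string.

```python
def encode_q(text, charset="UTF-8", lang=None):
    """Build a Q-encoded word, optionally with an RFC 2231 language tag (hypothetical helper)."""
    out = []
    for b in text.encode(charset):
        c = chr(b)
        if c == " ":
            out.append("_")  # Q encoding represents a space as "_"
        elif b < 128 and c.isalnum():
            out.append(c)  # plain ASCII letters and digits pass through
        else:
            out.append("=%02X" % b)  # everything else becomes =XX
    tag = charset + ("*" + lang if lang else "")
    return "=?%s?Q?%s?=" % (tag, "".join(out))


tagged = encode_q("Claus Färber", lang="de")
print(tagged, len(tagged))  # tagged, encoded form
print(len("Claus Färber".encode("utf-8")))  # untagged raw UTF-8
```

For "Claus Färber" with a `de` tag, this gives 32 octets versus 13 for raw UTF-8. Whether that difference matters is exactly the bandwidth argument in the paragraph above.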
If no viable solution is forthcoming, then it would seem that users
of extended characters in display names in all mailbox-containing
headers, Subject header text, parenthesized comments in all headers,
Comments headers, Keywords headers, and all instances of MIME-style
parameters would best be served by simply requiring that the news
article headers be in compatible 822/2822/2047/2231/1036 format
rather than introducing this new raw UTF-8 change. As the draft
stands, it requires them to be at the mercy of inbound gateway
(and user agent, and posting agent) authors, who, if they elect to
use raw UTF-8, effectively require use of a particular charset
(UTF-8) and prevent specifying the language used in headers. N.B.
822/2822/2047/2231/1036 still permits use of UTF-8 as a charset
and permits, but does not *require* either specifying or failing
to specify language.