From: Bruce Lilly (blilly@erols.com)
Date: Fri Feb 21 2003 - 20:53:40 CST
Charles Lindsey wrote:
> Clearly, for every person who switches to UTF-8, that is one less person
> sending "guess-the-charset",
No, it's still "guess-the-charset" unless it's tagged.
> The chief problem is, as I think everyone agrees, the Chinese.
I suspect a number of "the Chinese", who vastly outnumber you,
would say that you are the problem...
> 1. If we define that newsgroup-names are in UTF-8, then the Chinese
> _might_ just be persuaded to adopt that; in which case their user agents
> would need to acquire some UTF-8 capability. Note that all current Chinese
> newsgroup-names are still in ASCII. But if we procrastinate for much
> longer they may well start doing newsgroup-names their way. That would be
> bad.
Rather than a newsgroup-name-specific rule, using something
standardized for public names (in the RFC 1958/2277 sense),
such as Punycode, would have a greater chance of success. For
that matter, the protocol elements need not be internationalized
at all; internationalization support can be implemented via
a protocol-element-to-text-string lookup, which can map each
newsgroup name to internationalized, language-tagged text
strings, one-to-many.
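
As a rough illustration of the two approaches just mentioned (a sketch
only, not taken from any specification): the first half encodes
non-ASCII name components with Punycode, and the second half keeps the
protocol element in ASCII and looks up language-tagged display strings.
The "xn--" prefix is borrowed from IDNA purely as an assumption, and
the group "example.cooking", the table DISPLAY_NAMES, and the function
names are all hypothetical.

```python
# Assumption: reuse the IDNA ACE prefix; nothing defines one for newsgroup names.
ACE_PREFIX = "xn--"


def to_ace(name: str) -> str:
    """Encode each non-ASCII dot-separated component with Punycode."""
    parts = []
    for comp in name.split("."):
        if comp.isascii():
            parts.append(comp)  # plain ASCII components pass through unchanged
        else:
            parts.append(ACE_PREFIX + comp.encode("punycode").decode("ascii"))
    return ".".join(parts)


def from_ace(name: str) -> str:
    """Reverse to_ace(): decode components that carry the ACE prefix."""
    parts = []
    for comp in name.split("."):
        if comp.startswith(ACE_PREFIX):
            raw = comp[len(ACE_PREFIX):].encode("ascii")
            parts.append(raw.decode("punycode"))
        else:
            parts.append(comp)
    return ".".join(parts)


# Hypothetical lookup: ASCII protocol element -> {language tag: display text}.
DISPLAY_NAMES = {
    "example.cooking": {"en": "Cooking", "fr": "Cuisine", "zh": "烹饪"},
}


def display_name(group: str, lang: str) -> str:
    """Return the display string for a language, else the ASCII name itself."""
    return DISPLAY_NAMES.get(group, {}).get(lang, group)


wire = to_ace("de.münchen.test")
print(wire, "->", from_ace(wire))
print(display_name("example.cooking", "fr"))
```

In both cases only ASCII appears on the wire, so existing transports
and user agents that know nothing about the scheme keep working.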
> 2. We are in any case under some pressure to introduce a header of the
> form:
> This-Message-Includes-8bit-Headers: [yes/no]
> as an aid to IMAP, future interoperability with email, and so on. There is
> no reason why that header should not include charset and language
> parameters:
We are under no such pressure. There is a need to be compatible,
but not to implement it via yet another header field. And the
simple fact that Organization, Subject, From, Cc, etc. may well
be in different charsets and/or languages is a very good reason
why such a hypothetical header field is silly. It is also
completely unnecessary, as RFC 2047/2231 for text strings plus either
an ACE or a lookup for newsgroup names provide internationalization
in an interoperable, backwards-compatible, BCP-compliant manner.
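
A minimal sketch of the per-field point, using Python's standard
email.header module: each header field carries its own charset tag in
an RFC 2047 encoded-word, so no message-wide charset declaration is
needed. The field values and the helper decode_field are made up for
illustration, and RFC 2231 language tags are not shown.

```python
from email.header import Header, decode_header

# Two fields of one hypothetical article, each in a different charset.
fields = {
    "Subject": Header("Grüße", charset="iso-8859-1").encode(),
    "Organization": Header("東京", charset="utf-8").encode(),
}


def decode_field(value: str) -> str:
    """Decode an RFC 2047 header value using the charset tagged on each chunk."""
    out = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            out.append(chunk.decode(charset or "ascii"))
        else:
            out.append(chunk)  # already text (unencoded part)
    return "".join(out)


for name, wire in fields.items():
    # The wire form is pure ASCII; the charset tag travels with the field.
    print(f"{name}: {wire}  ->  {decode_field(wire)}")
```

A reader that does not understand encoded-words still sees valid ASCII
header fields, which is what keeps the approach backwards-compatible.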