Re: When will News Article Format be approved?


From: Bruce Lilly (blilly@erols.com)
Date: Wed Mar 05 2003 - 08:40:17 CST


Terje Bless wrote:

> ...these aren't the only differences between UTF-8 and RFC2047. There is a
> -- real or imagined, it matters nought -- feeling that UTF-8 is just Yet
> Another Charset whilst RFC2047 is a Transfer Encoding Format.

RFC 2047 isn't just a "Transfer Encoding Format". It provides for
charset and language tagging (as opposed to guess-the-charset and
guess-the-language). UTF-8 is a charset, and as such isn't
usable without the tagging capability of RFC 2047 or something
like it, especially with approx. 100 other 8-bit charsets.
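
To make the guess-the-charset problem concrete, here is a minimal
sketch in Python. The sample octets and the three candidate charsets
are illustrative assumptions, not taken from any real article; the
point is only that identical untagged octets yield different text
depending on which charset the reader guesses, whereas an RFC 2047
encoded-word carries its own charset label.

```python
from email.header import decode_header

# Untagged 8-bit octets, as they might appear in a raw header field.
raw = b"\xe9t\xe9"

# Three plausible guesses, three different results.
guesses = {cs: raw.decode(cs) for cs in ("iso-8859-1", "iso-8859-7", "koi8-r")}
for charset, text in guesses.items():
    print(charset, "->", text)

# The same octets are not even valid UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# With RFC 2047 tagging the charset is stated, not guessed.
octets, charset = decode_header("=?iso-8859-1?Q?=E9t=E9?=")[0]
tagged = octets.decode(charset)
print("tagged as", charset, "->", tagged)
```

Only the tagged form lets the receiver recover the intended text
without outside knowledge.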

> RFC2047 won't go away; it's just that it's turned out to not adequately
> address the needs of a portion of the Netnews population.

You keep chanting the mantra, but have yet to specify what supposed
"needs" aren't satisfied.

> UTF-8 is another
> tack; it may be rejected similarly to RFC2047, but it offers a way to
> achieve the interoperability that _users_ -- as opposed to Ivory Tower
> standards writers (I'm including myself in that characterization so please
> nobody get their panties in a bunch over it! ;D) -- care about.

No, it doesn't; in the first place, it's more guess-the-charset, and
in the second place there's no RFC 2277-compliant means (at least
none proposed so far) for tagging the language. And it's clear
from discussions that _users_ who read messages via IMAP -- and there
are many of them -- need a properly-tagged charset for interoperability.
Moreover, raw utf-8 *impedes* interoperability because it does not
work with other protocols -- not just the standards describing those
protocols, but the real implementations that are in use by users every
day. If utf-8 were the panacea it's being hyped as, surely its use
over the past decade or so would have risen to something more than
0.006%.

Before utf-8 can become a blessed charset, the issue of RFC 2277-compliant
language tagging needs to be addressed. As that's already an established
and widely implemented fact of life for RFC 2047/2231, don't expect
something other than 2047/2231 to be proposed by other standards
groups. Any such scheme is going to have to put standard RFC 3066
language tags, unencoded, somewhere in the field body text, with some
sort of delimiters to mark it as separate from the text strings to
which it applies. And it needs to be parseable compatibly by existing
RFC 822 and 2822 parsers. I.e. it would look something like RFC 2047/2231,
and reinventing the wheel is generally not a productive use of time
or effort. If *you* want to introduce raw utf-8, then *you* should
propose a tagging scheme compliant with 2277 and 3066 -- that is a
necessary prerequisite before raw utf-8 can be used.

RFC 2047/2231 of course can be used with utf-8 as well as with any
other MIME-compatible charset. It's fully compliant with RFCs 3066
and 2277.
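
For illustration, a rough sketch in Python of how a receiver might
split an encoded-word into its charset, its optional RFC 2231
language tag, and its text. The function name, the regular expression,
and the sample strings are my own assumptions for this example; it
handles a single encoded-word only and is not a full header parser.

```python
import base64
import quopri
import re

# encoded-word with the RFC 2231 extension:
#   "=?" charset ["*" language] "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)"
    r"(?:\*(?P<language>[^?]+))?"
    r"\?(?P<encoding>[bBqQ])"
    r"\?(?P<payload>[^? ]*)\?="
)

def decode_encoded_word(word):
    """Return (charset, language or None, decoded text) for one encoded-word."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError("not an encoded-word: %r" % word)
    payload = m.group("payload").encode("ascii")
    if m.group("encoding").lower() == "b":
        octets = base64.b64decode(payload)
    else:
        # header=True applies the Q-encoding rule that "_" means space.
        octets = quopri.decodestring(payload, header=True)
    charset = m.group("charset").lower()
    return charset, m.group("language"), octets.decode(charset)

print(decode_encoded_word("=?utf-8*de?Q?Gr=C3=BC=C3=9Fe?="))
print(decode_encoded_word("=?iso-8859-1?Q?=E9t=E9?="))
```

The first sample carries utf-8 text tagged as German; the second has
no language tag, so None is reported for the language.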

Any proposed scheme for language tagging other than RFC 2047/2231
is going to have to clear the same hurdles that 2047/2231 have already
cleared over the past decade, viz. the chicken-and-egg implementation
process and interoperability between separate implementations.
Frankly, I think it's a non-starter because of that, but no matter; the
ball is clearly in the court of those proposing raw utf-8 to come up
with the necessary RFC 2277/3066-compliant language-tagging scheme.

