Re: When will News Article Format be approved?


From: Terje Bless (link@pobox.com)
Date: Wed Mar 05 2003 - 01:43:33 CST


Russ Allbery <rra@stanford.edu> wrote:

>Since you all are talking about existing usage, I assume that you're
>talking about all of the headers *other* than Newsgroups.

A distinction I'm not particularly good at maintaining so please excuse me
if I get myself confused in this regard every now and then? :-)

>2 and 3 are the same option. It's called:
>
>2. Standardize an encoding that most people aren't using and try to get
> them to switch.

Not quite. Because...

>The only differences are that the UTF-8 encoding is used essentially
>nowhere and implemented in few places right now, but is somewhat more
>readable for people who don't implement it, whereas the RFC 2047
>encoding is *very* widely implemented but not widely used, and is fairly
>ugly for people who don't implement it.

...these aren't the only differences between UTF-8 and RFC2047. There is a
-- real or imagined, it matters nought -- feeling that UTF-8 is just Yet
Another Charset whilst RFC2047 is a Transfer Encoding Format.
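To make the distinction concrete, here is a minimal Python sketch, assuming a hypothetical Subject value, that produces the same text both as an RFC 2047 encoded-word and as untagged UTF-8 octets. The subject string is invented for illustration; only the standard library's email.header module is used.

```python
from email.header import Header, decode_header, make_header

# Hypothetical Subject text containing non-ASCII characters.
subject = "Smørbrød og blåbær"

# RFC 2047: the header stays pure ASCII; the charset and the transfer
# encoding are tagged inside an encoded-word: =?charset?encoding?text?=
rfc2047_form = Header(subject, charset="utf-8").encode()

# Untagged 8bit: the raw UTF-8 octets go on the wire with no label at all.
raw_utf8_form = subject.encode("utf-8")

# A reader that implements RFC 2047 recovers the original text.
decoded = str(make_header(decode_header(rfc2047_form)))

print(rfc2047_form)
print(raw_utf8_form)
print(decoded)
```

The first form is what a reader without RFC 2047 support sees as line noise; the second is what a reader without UTF-8 support sees as a few mangled characters in otherwise legible text.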

Once you've rejected RFC2047 you're left with "Just Post in an 8bit
Charset". Which charset? Well, there's none that will cover all needs so
we'll just choose one that covers _our_ needs.

Now that Unicode has been introduced into the mix, there _is_ an "8bit"
charset that will cover "our" needs as well as "their" needs.

RFC2047 won't go away; it's just that it's turned out to not adequately
address the needs of a portion of the Netnews population. UTF-8 is another
tack; it may be rejected similarly to RFC2047, but it offers a way to
achieve the interoperability that _users_ -- as opposed to Ivory Tower
standards writers (I'm including myself in that characterization so please
nobody get their panties in a bunch over it! ;D) -- care about.

>The belief of the people favoring the UTF-8 encoding is that it will
>eventually become the native wire encoding of everything and therefore
>will be easier to deal with. This belief seems ill-founded to me, to
>say the least.

You are overstating the case. As regards UTF-8 in From and Subject, it is
my belief that it is possible to convince current users of at least the
ISO-8859-1/Win-1252/MacRoman triumvirate to migrate to UTF-8 instead[0].

Despite the choice to use the non-conformant 8bit encoding, there is a real
concern for interoperability; it's just that so far there has been no real
way to achieve it that has been acceptable to the users in question.

A migration to RFC2047 has already been attempted and has failed (or at
least has failed to be complete). Migrating to RFC2047 is John's #2.

Option #3 leaves current, non-UTF-8, 8bit charsets just as non-compliant as
they have always been (the Stick) but it adds the Carrot of UTF-8 -- which,
apparently, cannot be _less_ tasty and appetizing than RFC2047 as they have
already tasted /that/ and spit it back out -- if they are willing to exert
themselves just a little bit.

As regards your statement; yes, I do believe that Unicode will eventually
be the native representation of all OSes and applications. I believe that
UTF-8, being "forwards compatible" with US-ASCII, will become the native
wire encoding in every context where tagging is impossible or impractical.
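The compatibility with US-ASCII can be checked directly. A small Python sketch, assuming two hypothetical header lines: pure ASCII text yields identical octets under both encodings, and every octet that UTF-8 adds for a non-ASCII character has the high bit set, so it can never be mistaken for an ASCII character.

```python
# Hypothetical header lines, invented for illustration.
ascii_line = "Subject: hello"
mixed_line = "Subject: blåbær"

# Pure ASCII text is byte-for-byte the same in US-ASCII and UTF-8.
same_octets = ascii_line.encode("ascii") == ascii_line.encode("utf-8")

# Non-ASCII characters become multi-octet sequences in which every
# octet is >= 0x80, so no ASCII octet appears inside such a sequence.
raw = mixed_line.encode("utf-8")
high_octets = [b for b in raw if b >= 0x80]
low_octets = bytes(b for b in raw if b < 0x80)

print(same_octets)
print(len(high_octets))
print(low_octets)
```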

If IDN happens to be solved by use of Punycode it will be handled not by
applications, but by "OS" facilities (resolver libraries etc.). The same
cannot with certainty be said for Punycode or RFC2047 in article headers.
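For reference, this is roughly the conversion a resolver library would perform on the application's behalf. A minimal sketch using the "punycode" and "idna" codecs that ship with Python's standard library; the domain name is a hypothetical example, and the "idna" codec implements the 2003 version of IDNA.

```python
# Hypothetical internationalized domain label, invented for illustration.
label = "bücher"

# Bare Punycode: the ASCII characters are kept, the rest is encoded
# after the final hyphen.
puny = label.encode("punycode")

# IDNA wraps each non-ASCII label as "xn--" + Punycode; ASCII labels
# such as "example" pass through unchanged.
wire_name = (label + ".example").encode("idna")

# Decoding restores the Unicode form for display.
display_name = wire_name.decode("idna")

print(puny)
print(wire_name)
print(display_name)
```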

[0] - I believe the same to be true for other users of 8bit, but I have
      no real basis for that belief.

-- 
Yes, Micro$oft products work extremely well after you lobotomize yourself,
affect a zombie-like stare, and forever chant the "Micro$oft-knows-best"
mantra until your soul dissolves and you start believing all their crap.



