Re: When will News Article Format be approved?


From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Tue Mar 11 2003 - 05:06:07 CST


In <4.2.0.58.J.20030309181343.03bd0558@localhost> Martin Duerst <duerst@w3.org> writes:

>At 09:48 03/03/05 -0800, Russ Allbery wrote:

>>I'm saying that UTF-8 will still be, for that OS, an *encoding* that has
>>to be undone. You're not actually getting to "just put the bits on the
>>wire," which is the world that proponents of UTF-8 seem to believe that
>>they're going to be living in. You still have to encode and decode the
>>bits, at which point it's just as easy to use RFC 2047 as well.

Unfortunately, both encoding and decoding of RFC 2047 require a parser
that understands the correct syntax of all the headers encountered in the
message. Yes, you can get by without one in practical cases (and many
implementations do), but you are then non-compliant with RFC 2047.

So it is hard to argue that RFC 2047 is "just as easy" as UTF-8.
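
To illustrate why RFC 2047 needs knowledge of header syntax, here is a
minimal sketch using Python's standard email package. The name and address
are hypothetical, chosen only for illustration. In an unstructured header
such as Subject the whole value may be turned into encoded-words, but in a
structured header such as From only the display name may be, so the encoder
has to know which kind of header it is dealing with.

```python
from email.header import Header, decode_header, make_header
from email.utils import formataddr

# Unstructured header (Subject): the whole value may become encoded-words.
subject = Header("Grüße aus Zürich", "utf-8").encode()

# Structured header (From): only the display name may be encoded; the
# address itself has to stay as plain ASCII. (Hypothetical name/address.)
good_from = formataddr(("Zoë Müller", "zoe@example.org"), charset="utf-8")

# Treating From as if it were unstructured buries the address inside an
# encoded-word, where it is no longer visible as an address.
naive_from = Header("Zoë Müller <zoe@example.org>", "utf-8").encode()

# Decoding an unstructured header back to text.
decoded_subject = str(make_header(decode_header(subject)))

print(subject)
print(good_from)
print(naive_from)
print(decoded_subject)
```

The difference between good_from and naive_from is exactly the part that a
header-agnostic implementation gets wrong.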

>There is a huge difference between RFC 2047 with all its special
>rules and UTF-8, which is just a plain character encoding.
>As a very simple example, we can convert a whole file from
>iso-8859-1 to UTF-8. Converting iso-8859-1 to RFC 2047 isn't
>defined at all for a whole file.
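
As a rough sketch of the whole-file conversion mentioned above (the function
name and the sample text are my own, purely for illustration), the
iso-8859-1 to UTF-8 case needs no knowledge of what the bytes mean:

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode a whole byte string from iso-8859-1 to UTF-8."""
    # Every byte value is a valid iso-8859-1 character, so this cannot fail,
    # and no header or body syntax has to be understood.
    return data.decode("iso-8859-1").encode("utf-8")

# Hypothetical sample: headers and body converted in one pass.
sample = "Subject: Grüße\n\nCafé\n".encode("iso-8859-1")
converted = latin1_to_utf8(sample)
print(converted.decode("utf-8"))
```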

And there is another great advantage of UTF-8, which is that in many
commonly arising situations it is not necessary to decode it at all,
because one is only interested in the bits which turn out to be US-ASCII.
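
A small sketch of that point, with a made-up header line: in UTF-8 every
byte of a multi-byte sequence has its high bit set, so a search for an ASCII
delimiter in the raw bytes can never match inside a non-ASCII character.

```python
# Hypothetical raw header line as it might appear on the wire in UTF-8.
raw = "Subject: Grüße aus Zürich\r\n".encode("utf-8")

# Split off the header name without decoding anything.
name, _, value = raw.partition(b":")
is_subject = name.lower() == b"subject"

# The non-ASCII characters contribute only bytes >= 0x80, so they cannot be
# mistaken for ':' or any other US-ASCII character.
high_bytes = [b for b in raw if b >= 0x80]

print(name, is_subject, len(high_bytes))
```

The value only needs decoding if and when somebody actually wants to
display it.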

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5



