Re: UTF-8 and RFC 2047

From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Tue Jul 02 2002 - 15:09:28 CDT


In <3D20B20C.6000609@certplus.com> Jean-Marc Desperrier <jean-marc.desperrier@certplus.com> writes:

>Even if you include the underlying OS, if you choose to set the default
>character set to UTF-8, either by telling a sophisticated newsreader
>that this is the default charset, or by setting LANG to UTF-8 for a
>simple text-based Unix newsreader inside an xterm that can understand
>UTF-8, you will lose and will no longer be able to decode *any* message
>that is not encoded in UTF-8.

But there are operating systems out there that do not understand UTF-8 at
all. And even my Solaris 7 system, which does understand it, does not do
so in xterm, because its xterm is unchanged from what was written at MIT
umpteen years ago.

>This choice of setting the default to UTF-8 will work, when all messages
>are in UTF-8, but is completely incompatible with the transition period.

Which is why we are recommending that user agents should check for UTF-8,
and revert to the default character set if that check fails. No, existing
user agents do not do that (except the latest Netscape apparently) so the
transition period may be chaotic until people are persuaded to invest in
those new agents. But the current system is already chaotic in that sense.
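
A minimal sketch of that check, in Python purely for illustration. The
function name and the ISO-8859-1 default are my assumptions here, not
anything taken from the draft; a real agent would use whatever default
charset the user has configured.

```python
def decode_header_bytes(raw: bytes, default_charset: str = "iso-8859-1") -> str:
    """Decode undeclared 8-bit header bytes.

    Hypothetical helper: try strict UTF-8 first, and revert to the
    configured default charset only if the bytes are not valid UTF-8.
    """
    try:
        # Strict decoding raises on any malformed UTF-8 sequence.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the user's default charset.
        return raw.decode(default_charset, errors="replace")
```

Pure ASCII passes the UTF-8 check unchanged, and typical 8-bit text in a
legacy charset is unlikely to form valid UTF-8 sequences by accident, so
the check fails and the default charset is used.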

>>Does OE understand UTF-8 when running under Windows 95/98?
>
>It can be made to.

OK, Windows 95/98 systems seem to be more capable UTF-8-wise than I had
supposed.

>If you set this default charset to UTF-8, then headers in UTF-8 should
>be displayed correctly in the thread panel.

>The default option of Outlook Express is *not* to send a content-type
>header, and maybe 1% of users know they must change that option.

Yes, there are many options that OE does not set properly (and even some
that are incapable of being set properly). The only solution to that is to
flame people whose newsreaders are not properly configured, and explain to
them that the newsreader they are using is worth exactly what they paid for
it :-( .

>But more, if a message in ISO-8859-1 correctly declares the charset,
>with that setting in the thread list the title will be interpreted as
>UTF-8, but inside the message panel the message title will be displayed
>as ISO-8859-1.

Yes, that is the correct behaviour according to our draft, though testing
for correct UTF-8, as we now recommend, will improve that.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

