Re: C.T.E. and message/partial

From: J.B. Moreno (planb@newsreaders.com)
Date: Mon Jul 16 2001 - 11:56:06 CDT


On 7/16/01 9:55 AM, Jean-Marc Desperrier at
<jean-marc.desperrier@certplus.com> wrote:

> sommar@kairos.kairos.algonet.se wrote:
>
>> There is no way that a followup agent safely can convert this to UTF-8.
>
> Sorry, I wasn't following the fact you were talking about followup agent.
>
> What experience do you have with followup agents modifying the subject?
> I think most standard agents won't change anything, and those that will are
> beyond any hope of enhancement.

He's probably talking about MacSOUP, and if there was a published standard
specifying UTF8 and raw 8 bit headers, it'd probably adopt it fairly
quickly.

> I don't agree with your assertion that raw 8 bit seldom causes difficulties.
> It works only when everyone communicating uses the same locale.
> I've been confronted with contexts where people who have different, non-US
> locales try to communicate, and it causes a lot of problems.

And that is fairly rare in comparison to the number of posts in the various
non-English hierarchies.
 
> I'm worried about the situation after USEFOR gets approved where there will be
> a mix between utf-8 encoded messages, and raw eight bit messages.
> The programs that don't understand RFC 2047 now will not adapt to UTF-8 either
> (within any short time frame at least).

What makes you think that they /need/ to adopt UTF8, whether the time frame
is long or short?

If you can read and post in UTF8, displaying the text correctly is a lesser
matter -- one that doesn't /need/ to be addressed except insofar as it
improves your user experience and thus the likelihood that you'll have more
users.

> I think we will have basically the following situation :
> - RFC 2047 unable, utf-8 unable programs

Equals: no problem.

> - RFC 2047 able, utf-8 unable programs

Equals: maybe problem, may not be a problem, depending upon how 2047 is
done.

I use two newsreaders more than any others, MacSOUP and Thoth, both
understand 2047, neither understands UTF8. MacSOUP always changes raw 8 bit
Subjects to 2047 encoding using the local charset, Thoth never changes the
Subject -- if it's UTF8 it stays UTF8, if it's 2047 it stays 2047, if it's
something else, it stays that something else. Thoth's approach is the most
compatible and causes the fewest problems -- and it doesn't require the client
to understand /any/ charset issues; all it needs to know is that if the
Subject is unchanged then it shouldn't do anything but send out the octets
that it received.
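
A minimal sketch of that pass-through rule, assuming the client keeps the
Subject as raw octets. The function name followup_subject is hypothetical,
not taken from any newsreader. Only the ASCII "Re:" prefix is ever looked at;
the remaining bytes are never decoded or re-encoded.

```python
def followup_subject(original: bytes) -> bytes:
    """Return the Subject octets to send in a followup.

    The original octets are passed through untouched, whether they are
    UTF-8, RFC 2047 encoded words, or raw 8 bit in some unknown charset.
    """
    # Compare only the ASCII prefix; the rest of the header is opaque.
    if original[:3].lower() == b"re:":
        return original
    return b"Re: " + original
```

Because nothing is decoded, the function cannot guess the charset wrongly.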

If you do that (and it's what should be done) then the only remaining
charset issue is: what do you do with "original" Subjects (either brand new
posts or followups where the Subject is changed by the user). And naturally
those that do 2047 currently will continue doing so until they are
updated.
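
For illustration, this is roughly what "doing 2047" for an original Subject
looks like, sketched with Python's standard email.header module. The helper
name encode_new_subject and the default of iso-8859-1 as the local charset
are assumptions made for the example.

```python
from email.header import Header, decode_header

def encode_new_subject(text: str, charset: str = "iso-8859-1") -> str:
    """Encode a user-typed Subject as RFC 2047 encoded words if needed."""
    # Pure US-ASCII subjects need no encoding at all.
    if text.isascii():
        return text
    # Header.encode() produces =?charset?encoding?data?= encoded words.
    return Header(text, charset).encode()

encoded = encode_new_subject("Grüße")
raw, used_charset = decode_header(encoded)[0]
print(encoded, "->", raw.decode(used_charset))
```

The encoded result is 7 bit clean, which is why it survives agents that know
nothing about charsets.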

> - RFC 2047 able, utf-8 able programs
>
> That's why I think that the use of RFC 2047 encoded headers instead of
> switching directly to utf-8 would be a better upgrade path for most user
> agents (and when posting in a group where the group name does not contain any
> 8 bit character), because the number of people who can read it will be, at
> first, higher.

UTF8 is what we are aiming for; there is no reason to do something in between
unless it brings a significant gain. I don't see any such gain here.

> Paragraph 3.1
> o The use of the UTF-8 charset for headers will not affect any
> existing _official_ usage, since US-ASCII is a strict subset of UTF-8.
>
> We all know there's an unofficial usage of sending 8 bit in some locale, and
> we know USEFOR will affect _that_ use.
> I think there should be a note like this :
>
> Note : As there has been unofficial and undocumented use in header fields of
> pure 8 bit in various local encodings before the advent of this standard,
> some newsreaders might choose to try to display illegal utf-8 sequences in
> the headers as characters in the local encoding, as far as they are able to
> adequately determine the local encoding. This should enable newsreaders
> respecting the USEFOR standard to interpret messages sent by newsreaders that
> do not respect it, because the redundancy of utf-8 guarantees that the
> probability of a non-utf8 sequence being legal utf-8 is very low.
>
> This kind of behaviour is needed for compatibility with current usage, and how
> to adapt to this current usage should be described to help implementors of
> newsreaders.

I'm not sure exactly what section you are addressing but I think we have a
general statement elsewhere on what to do with 8 bit, non-UTF: clients are
free to do whatever they want, including treat it as the local charset.

(Although this should only apply to /display/, not to re-encoding for
outbound messages.)
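
A minimal sketch of such a display-only fallback, assuming the client has the
header as raw octets and knows its local charset. The name header_for_display
and the iso-8859-1 default are hypothetical. Strict UTF-8 decoding is tried
first; only if the octets are not legal UTF-8 are they shown in the local
charset.

```python
def header_for_display(raw: bytes, local_charset: str = "iso-8859-1") -> str:
    """Decode header octets for display only; never use this for posting."""
    try:
        # Strict decoding rejects any sequence that is not legal UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: fall back to the local charset, replacing any octets
        # that the local charset cannot map.
        return raw.decode(local_charset, errors="replace")
```

The decoded string is only for the screen; a followup would still send the
octets exactly as received.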

-- 
J.B. Moreno



