Re: C.T.E. and message/partial


From: Dirk Nimmich (nimmich@uni-muenster.de)
Date: Sat Jul 14 2001 - 17:46:48 CDT


Erland Sommarskog wrote:
> However, if the newsreader truly does not know the charset and then
> changes the subject line from:
>
> Subject: Vi gillar räksmörgåsar
>
> To:
>
> Subject: ?iso-8859-1?Q?Vi gillar r=E5ksm=F6rg=E6sar?=
>
> (MIME-encoding made by hand, may not be completely correct.)
>
> It is actually changing the subject line in violation of a MUST requirement
> in GNKSA, as it changes an unknown charset into a known charset. (Which does
> not prevent newsreaders that do this from getting the seal. So much for GNKSA.)

While I too think that most newsreaders get the charset declaration
right only by accident, why shouldn't the author of the response
determine and declare the correct charset for _his_ message?

I disagree, however, with the opinion that different (MIME)
encodings or different folding of a subject constitutes a "subject
change": A transfer encoding does not change the content, only its
representation on the wire.
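
A minimal Python sketch of this point, using the standard library's
email.header module. The subject text is taken from the quoted
example; the helper name decoded() and the three wire forms are
assumptions made up for illustration, not output from any real
newsreader.

```python
import base64
from email.header import decode_header, make_header

def decoded(raw: str) -> str:
    """Return the text of a raw header value after RFC 2047 decoding."""
    return str(make_header(decode_header(raw)))

text = "Vi gillar räksmörgåsar"

# Same charset, "Q" transfer encoding, written out by hand.
q_form = "=?iso-8859-1?Q?Vi_gillar_r=E4ksm=F6rg=E5sar?="

# Same charset, "B" (base64) transfer encoding, built at run time.
b_payload = base64.b64encode(text.encode("iso-8859-1")).decode("ascii")
b_form = "=?iso-8859-1?B?" + b_payload + "?="

# "Q" encoding again, but folded into two encoded-words.
folded_form = "=?iso-8859-1?Q?Vi_gillar_?=\r\n =?iso-8859-1?Q?r=E4ksm=F6rg=E5sar?="

wire_forms = [q_form, b_form, folded_form]
contents = {decoded(form) for form in wire_forms}
print(len(set(wire_forms)), "wire forms,", len(contents), "content")
```

All three strings differ byte for byte, yet they decode to one and
the same subject text.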

The same is true for different character encoding schemes (e.g.
the same character, such as the copyright sign, declared once as
ISO 8859-1, once as ISO 8859-2 and another time as UTF-8), but
unfortunately there are far fewer newsreaders today that can handle
this correctly than can deal with MIME in headers.
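
As a rough Python illustration: a reader that decodes each wire form
with its declared charset ends up with the same characters. The
sample word and the helper decode_declared() are assumptions chosen
for the sketch; the word only uses a letter that exists in all three
character sets.

```python
def decode_declared(raw: bytes, charset: str) -> str:
    """Decode wire bytes using the charset the article declares."""
    return raw.decode(charset)

word = "Münster"  # sample text, representable in all three charsets

# (wire bytes, declared charset) pairs for the same characters.
variants = [(word.encode(cs), cs) for cs in ("iso-8859-1", "iso-8859-2", "utf-8")]

decoded_texts = {decode_declared(raw, cs) for raw, cs in variants}
distinct_wire_forms = {raw for raw, _ in variants}

print(len(distinct_wire_forms), "byte sequences,", len(decoded_texts), "text")
```

The comparison has to happen after decoding; comparing the raw bytes
or the charset labels says nothing about whether the characters are
the same.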

BTW: Using overview data is worse than all that: any whitespace is
changed to SPACE, which really leads to a content change.
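
A small Python sketch of that effect. It assumes an overview
generator that unfolds the header and then replaces each TAB with a
SPACE because overview records are TAB-separated; the function name
is hypothetical and real servers may differ in detail.

```python
def overview_subject(raw_header_value: str) -> str:
    """Hypothetical overview flattening: unfold, then TAB -> SPACE."""
    unfolded = raw_header_value.replace("\r\n", "")
    return unfolded.replace("\t", " ")

original = "Column A\tColumn B"
flattened = overview_subject(original)

# The TAB is gone for good: the original cannot be recovered.
print(repr(original), "->", repr(flattened))
```

Unlike a different transfer encoding, this step loses information,
since nothing in the result says which SPACE used to be a TAB.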

> Particularly, the encoding of a subject line should not be
> changed. If the subject line is encoded according to
> [RFC2047], the followup agent should not change this to
> UTF-8, even if the RFC2047 is deprecated by this standard.

I don't see a reason for this. Subject threading based on a byte by
byte comparison of the on-the-wire representation hasn't worked for
a long time now and probably won't ever work in the future again.
Even if a newsreader remembers that the original subject was
encoded using RFC 2047 it could use another strategy when
re-encoding it.
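
A sketch of what comparing subjects by content rather than by wire
bytes could look like in Python. The function subject_key() and its
normalization rules (dropping leading "Re:", collapsing whitespace,
ignoring case) are assumptions for illustration, not the behaviour
of any particular newsreader.

```python
import re
from email.header import decode_header, make_header

def subject_key(raw: str) -> str:
    """Hypothetical threading key: decode first, then normalize."""
    text = str(make_header(decode_header(raw)))
    # Drop any leading "Re:" prefixes added by followups.
    text = re.sub(r"^(?:re:\s*)+", "", text.strip(), flags=re.IGNORECASE)
    # Collapse runs of whitespace and ignore case.
    return " ".join(text.split()).casefold()

original = "=?utf-8?Q?r=C3=A4ksm=C3=B6rg=C3=A5sar?="
followup = "Re: =?iso-8859-1?Q?r=E4ksm=F6rg=E5sar?="

print(original == followup[4:])                        # False
print(subject_key(original) == subject_key(followup))  # True
```

With a key like this, a followup agent is free to re-encode the
subject however it likes without breaking subject threading.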

(Nitpick: It is not RFC 2047 that is deprecated by this standard,
but its use. And I still think this is an error.)

> Likewise, if the followup-agent can conclude that the
> subject line is not in UTF-8 despite that it contains 8bit
> characters, the followup agent should not make any attempt to guess
> the character set and correct it to UTF-8.

It can display a generic "unknown" character (like a question mark
or a box; I guess this already is standard behaviour with most
reading software) and ask the user whether it guessed right, or ask
him to correct such unknown characters when following up. I remember
it was you who suggested that users should change their display
configuration if the newsreader did not show anything useful, so why
do you think this is not an option?
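
One way this could look in Python, as a sketch only. It assumes the
reader tries the declared charset (or UTF-8 when none is declared)
and otherwise shows U+FFFD for every byte it cannot interpret; the
function name and the "needs review" flag are made up for the
example.

```python
from typing import Optional, Tuple

def display_text(raw: bytes, declared_charset: Optional[str]) -> Tuple[str, bool]:
    """Return (text to show, whether the user should review it)."""
    charset = declared_charset or "utf-8"
    try:
        return raw.decode(charset), False
    except (UnicodeDecodeError, LookupError):
        # Unknown or wrong charset: keep the ASCII part and show a
        # generic replacement character for everything else.
        return raw.decode("ascii", errors="replace"), True

raw_subject = "Vi gillar räksmörgåsar".encode("iso-8859-1")
shown, needs_review = display_text(raw_subject, None)
print(shown, needs_review)
```

When the flag is set, the followup agent can ask the user to confirm
or correct the marked characters before posting, instead of silently
guessing a charset.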

This applies not only to characters in headers; the same
problem exists for characters in the body. Would you also demand
that a followup be posted with the same Content-Type declaration as
the original posting, not to speak of Content-Transfer-Encoding?
This would not make much sense, if you ask me.




This archive was generated by hypermail 2b29.