From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Mon Jul 08 2002 - 05:04:12 CDT
In <20020706201920.25336.qmail@wilhelmina.algonet.se> sommar@algonet.se (Erland Sommarskog) writes:
>"Charles Lindsey" <chl@clw.cs.man.ac.uk> writes:
>> In <yl7kkgxxvf.fsf@windlord.stanford.edu> Russ Allbery <rra@stanford.edu> writes:
>> >Something like:
>>
>> > It is recommended, as a last recourse, that characters in unknown
>> > character sets be passed unaltered and displayed in the default
>> > character set so long as they are not control characters in that
>> > character set. This is better than altering or rejecting the
>> > characters since the user will at least have some chance of making
>> > sense of the text.
>>
>If an IETF reviewer does not understand the difference, we just bang
>him on the head with the appropriate RFC in
>> and they MAY, when it is detected that none of these has been used,
>> attempt to interpret the header according to whatever other character
>> set can be deduced, or has been configured as a default by the reader.
>>
>> NOTE: It is possible to determine, with a high degree of
>> accuracy, when a given text containing octets with the 8th bit
>> set was not encoded using UTF-8, and using this test to recover
>> such non-compliant texts is therefore commended where no other
>> harm could arise.
>It still says MAY, which clearly signals to the implementor that he
>can ignore it completely.
Well it does say the practice is "commended".
However, let us bear in mind that all this is for the benefit of some guys
who have been posting articles whose headers are in violation of all known
standards (whether in mail or news), using a convention ("just guess the
charset") which in no way deserves to remain as a permanent part of
Usenet.
So we have provided a lifeline, a "last recourse" as Russ put it, to buy
some time while they get out of this bad habit. So newsreaders MAY take
advantage of this lifeline, but I would not want to write anything into
the standard which suggests that they SHOULD continue to do so
indefinitely into the future, or that they SHOULD do so if they are only
intended for use in English speaking environments. So we should not use
the word RECOMMENDED, since it has a defined meaning in RFC 2119,
implying some obligation on implementations which desire to be
fully-compliant. And similarly for "recommended" (which is why I used
"commended").
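For illustration only, here is a minimal sketch of the "last recourse" lifeline that the quoted NOTE describes. It assumes a reader that receives raw header octets and has a hypothetical user-configured `fallback_charset` (ISO-8859-1 is an assumed default, not something the draft specifies). The test relies on the point the NOTE makes: octets with the 8th bit set that fail strict UTF-8 decoding were certainly not UTF-8. When decoding succeeds, the text is only very probably UTF-8, because legacy 8-bit text rarely forms well-formed UTF-8 sequences by chance.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is pure ASCII or well-formed UTF-8."""
    try:
        # Python's strict UTF-8 codec rejects malformed and overlong sequences.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def decode_header(raw: bytes, fallback_charset: str = "iso-8859-1") -> str:
    """Decode a header, falling back to a reader-configured charset.

    fallback_charset is a hypothetical reader setting, not a name from the draft.
    """
    # Compliant case: the header is UTF-8, which includes pure ASCII.
    if looks_like_utf8(raw):
        return raw.decode("utf-8")
    # Last recourse: the 8-bit octets are not UTF-8, so interpret them in
    # the configured default. errors="replace" covers charsets that leave
    # some octets undefined (e.g. 0x81 in cp1252).
    return raw.decode(fallback_charset, errors="replace")


# Usage: a Latin-1 header is not valid UTF-8, so the fallback applies.
print(decode_header("Räksmörgås".encode("iso-8859-1")))
```

A UTF-8 check comes first so that compliant articles are never reinterpreted. The guessing applies only to headers that are already non-compliant.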
>Just skip the MAY part, and make Russ's text into a note.
Actually, Russ's words also cover the matter of transporting such
characters, which is already covered elsewhere (relayers MUST either copy
everything or drop it). Moreover, mentioning control characters might just
encourage the mess sendmail has gotten itself into by dropping octets 0x80-0x9F.
-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131   Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5