
Re: Whether 8-bit SMTP? And how?



On Fri,  1 Mar 1991 23:26:23 +0100, Erik Naggum <erik@naggum.uu.no> 
wrote...
>My feelings on this are decidedly mixed, but at least I think some of
>the arguments against filtering are tolerably coherent.
>
>First off, deciding which control characters are to go will be an
>unpleasant task.  The Japanese rely on SI and SO to survive the
>...transfer.  ISO 10646 requires (at its present draft form, at least) that

Erik,
   Without reviewing all of your analysis here -- I think you've gotten 
it right, and have done the analysis I was too lazy to work out -- 
it seems to me that you've done most of the work that would be
needed to define a standard in this area.
Let me give a brief summary of the classes of control characters defined
in ISO 646:

>	- Transmission control characters
>	- Format effectors
>	- Code extension control characters
>	- Device control characters
>	- Information separators
>	- Other control characters

This breakdown is probably close to what one would find in the other 
control character sets.

>I'm inclined to accept the presence of format effectors and code
>extension control characters prima facie, and may allow BEL (one of the
>"other" control characters), too.  
   I think this is right.  As a personal bias, I think noise suppression 
is a terminal function, so I agree on allowing BEL (I have seen attempts 
to abuse it in mail).

>I'm inclined to allow filtering
>out NUL and DEL without further consideration.  Device control can go,
>too (no need to send XOFF in mail).  Information separators, are they
>used?  Transmission control can go. 
 My impression is that the information separators are mostly a 
historical artifact and that structured data is now typically moved using 
other models.  It isn't clear that they have any value in mail (as distinct 
from embedded binary file transfer).  On the other hand, maybe they
could make good embedding separators :-) :-} 
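To make the classification concrete, here is a rough Python sketch of the C0 part of the policy being discussed: keep format effectors, code extension characters and BEL; drop NUL, DEL, device controls and transmission controls; leave the information separators as an open option. The byte assignments follow the usual ISO 646 grouping. The names `filter_c0` and `allow_separators`, and the choice to drop the remaining "other" controls (CAN, EM, SUB), are assumptions for illustration, not anything agreed in this thread.

```python
# Rough sketch: the C0 control characters (and DEL) grouped by ISO 646 class,
# plus a hypothetical mail filter following the policy discussed above.

TRANSMISSION = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06,      # SOH STX ETX EOT ENQ ACK
                0x10, 0x15, 0x16, 0x17}                  # DLE NAK SYN ETB
FORMAT_EFFECTORS = {0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D}  # BS HT LF VT FF CR
CODE_EXTENSION = {0x0E, 0x0F, 0x1B}                      # SO SI ESC
DEVICE_CONTROL = {0x11, 0x12, 0x13, 0x14}                # DC1..DC4 (XON/XOFF live here)
INFO_SEPARATORS = {0x1C, 0x1D, 0x1E, 0x1F}               # FS GS RS US
OTHER = {0x00, 0x07, 0x18, 0x19, 0x1A, 0x7F}             # NUL BEL CAN EM SUB, plus DEL
BEL = 0x07


def keep_control(code: int, allow_separators: bool = False) -> bool:
    """Decide whether one C0 control (or DEL) survives the filter."""
    if code in FORMAT_EFFECTORS or code in CODE_EXTENSION or code == BEL:
        return True
    if code in INFO_SEPARATORS:
        return allow_separators          # still an open question
    # Transmission and device controls, NUL, DEL, and (by assumption)
    # CAN, EM and SUB are all dropped.
    return False


def filter_c0(data: bytes, allow_separators: bool = False) -> bytes:
    """Strip unwanted C0 controls and DEL; all other bytes pass untouched."""
    return bytes(
        b for b in data
        if not (b < 0x20 or b == 0x7F) or keep_control(b, allow_separators)
    )
```

This only looks at single bytes: an ESC survives, but whatever escape sequence follows it is passed through unexamined, which is the harder part of the problem.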
  The device and transmission controls tend to be where the potential 
problems lie.  If one wanted to design a restriction or smart filter 
that would still permit character set manipulation, it would probably be 
sufficient to filter out the C0 device and transmission controls and the C1 
device controls (e.g., CSI sequences and their 7-bit ESC equivalents), 
and then go back and filter out any sequence beginning with ESC that is 
not related to character set switching.  Since, e.g., CSI sequences 
require moderately complex parsing (it is not just a matter of 
stripping out a single character), such a filter, as Erik implies, is not 
easy to write.
  In a way, and probably strongly influenced by my bias toward very 
conservative senders rather than very robust receivers, this is a stronger
argument for writing a "don't send these things" rule than for 
expecting receivers (MTAs or UAs) to be very cautious and robust.
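The smart filter described above might look roughly like the Python sketch below. It assumes 8-bit input in which bytes 0x80-0x9F are C1 controls, parses CSI sequences (8-bit CSI or 7-bit ESC [) using the ISO 6429 parameter, intermediate and final byte ranges, and keeps only ISO 2022 designation sequences plus the locking and single shifts. The function names and the exact set of "character set switching" sequences kept are assumptions. Control strings such as DCS or OSC are not parsed at all, which is part of why a real filter would be harder to write.

```python
# Rough sketch of a "smart filter": drop C0 device/transmission controls and
# C1 controls (parsing CSI sequences properly), and drop every ESC sequence
# that is not an ISO 2022 designation or shift. Byte sets are assumptions.

C0_DEVICE = {0x11, 0x12, 0x13, 0x14}                   # DC1..DC4
C0_TRANSMISSION = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
                   0x10, 0x15, 0x16, 0x17}             # SOH..ACK, DLE, NAK, SYN, ETB
ESC, CSI, SS2, SS3 = 0x1B, 0x9B, 0x8E, 0x8F

# First intermediate byte of a designation: "$" (multi-byte set) or one of
# "(" ")" "*" "+" "-" "." "/" (designate into G0..G3).
DESIGNATING = {0x24, 0x28, 0x29, 0x2A, 0x2B, 0x2D, 0x2E, 0x2F}
# Two-byte sequences ESC F kept as shifts: SS2, SS3, LS2, LS3, LS1R, LS2R, LS3R.
KEPT_ESC_FINALS = {ord(c) for c in "NOno~}|"}


def skip_csi(data: bytes, i: int) -> int:
    """i points just past CSI (or ESC [); return the index after the sequence."""
    # Parameter (0x30-0x3F) and intermediate (0x20-0x2F) bytes; order not checked.
    while i < len(data) and 0x20 <= data[i] <= 0x3F:
        i += 1
    if i < len(data) and 0x40 <= data[i] <= 0x7E:      # final byte
        i += 1
    return i


def smart_filter(data: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b in C0_DEVICE or b in C0_TRANSMISSION:
            i += 1                                       # drop the control
        elif b == CSI:
            i = skip_csi(data, i + 1)                    # drop the whole sequence
        elif b == ESC and i + 1 < n and data[i + 1] == 0x5B:
            i = skip_csi(data, i + 2)                    # 7-bit CSI: ESC [
        elif 0x80 <= b <= 0x9F and b not in (SS2, SS3):
            i += 1                                       # other C1 controls
        elif b == ESC:
            j = i + 1
            while j < n and 0x20 <= data[j] <= 0x2F:     # intermediate bytes
                j += 1
            if j < n and 0x30 <= data[j] <= 0x7E:        # well-formed sequence
                designation = j > i + 1 and data[i + 1] in DESIGNATING
                shift = j == i + 1 and data[j] in KEPT_ESC_FINALS
                if designation or shift:
                    out += data[i:j + 1]
                i = j + 1
            else:
                i = j                                    # truncated: drop ESC and intermediates
        else:
            out.append(b)                                # ordinary byte (or SS2/SS3)
            i += 1
    return bytes(out)
```

Even this sketch needs a small state machine rather than a per-byte lookup, which supports the point that the burden is better placed on the sender.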

  On the other hand, if one does not make/have a strong rule that
requires the sender to severely restrict what is sent... 

>>    It is an argument for UAs with filters and for restricting what is
>> sent to them in order that they not strip anything the sender considers
>> important. 
>
>This is the problem spot, and the source of my mixed feelings.  It
>could be that all possible bit combinations are important to the sender
>in some application.  I don't necessarily think we're talking binary
>in those cases, either.
 ... one could design a receiving UA that would "normally" apply quite 
aggressive filtering before displaying a message, but that would support 
options, on a per-message basis, to display the thing "raw" or in 
various intermediate states.  This is possibly a good idea (not an original 
one; I think I've seen one or two UAs that work this way), and something 
we might recommend somewhere.
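A receiving UA along those lines could remember a display mode per message, defaulting to aggressive filtering. The Python sketch below is one hypothetical arrangement: the three modes, the `Viewer` class, and the particular controls each mode lets through are all assumptions chosen for illustration.

```python
from enum import Enum


class DisplayMode(Enum):
    FILTERED = "filtered"          # default: all controls except HT, LF, CR dropped
    INTERMEDIATE = "intermediate"  # also keep SO, SI, ESC for character set switching
    RAW = "raw"                    # the message exactly as received


# Control bytes each non-raw mode lets through (a hypothetical choice).
ALLOWED_CONTROLS = {
    DisplayMode.FILTERED: {0x09, 0x0A, 0x0D},
    DisplayMode.INTERMEDIATE: {0x09, 0x0A, 0x0D, 0x0E, 0x0F, 0x1B},
}


def is_control(b: int) -> bool:
    """C0 controls, DEL, and C1 controls (assuming an 8-bit ISO-style set)."""
    return b < 0x20 or 0x7F <= b <= 0x9F


def render(body: bytes, mode: DisplayMode) -> bytes:
    if mode is DisplayMode.RAW:
        return body
    allowed = ALLOWED_CONTROLS[mode]
    return bytes(b for b in body if not is_control(b) or b in allowed)


class Viewer:
    """Remembers, per message-id, which display mode the user has chosen."""

    def __init__(self) -> None:
        self.modes: dict = {}      # message-id -> DisplayMode

    def show(self, msg_id: str, body: bytes) -> bytes:
        return render(body, self.modes.get(msg_id, DisplayMode.FILTERED))
```

With FILTERED as the default, the stray "(B" left behind when an escape sequence is stripped byte by byte shows why a user might want to switch an individual message to a less aggressive mode.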

It is, however, worth noting that "different character set" can imply 
"different model of control characters", so there may be no precise 
translation of the conventional C0 and C1 characters.  Historically, 
this has been a problem in translation between ANSI/ISO-style character 
sets and, e.g., EBCDIC.  But I gather there are now trends in the ISO 
community toward permitting the kind of identification and switching of 
control character sets (binding different sets to CL and CR) that has 
become established practice (via subsets of ISO 2022) for registered 
graphic character sets.  That could make translating one ISO control 
set into another as difficult as translating one graphic set into another 
when one does not know which sets are in use.

    john
-------