Re: Whether 8-bit SMTP? And how?
On Fri, 1 Mar 1991 23:26:23 +0100, Erik Naggum <email@example.com>
>My feelings on this are decidedly mixed, but at least I think some of
>the arguments against filtering are tolerably coherent.
>First off, deciding which control characters are to go will be an
>unpleasant task. The Japanese rely on SI and SO to survive the
>...transfer. ISO 10646 requires (at its present draft form, at least) that
Without reviewing all of your analysis here -- I think you've gotten
it right, and have done the analysis I was too lazy to try to work out --
it seems to me that you've done most of the analysis that would be
needed to define a standard in this area.
Let me give a brief summary of the classes of control characters defined
in ISO 646:
> - Transmission control characters
> - Format effectors
> - Code extension control characters
> - Device control characters
> - Information separators
> - Other control characters
This breakdown is probably close to what one would find in the other
control character sequences.
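
For reference, the six classes above can be written down as a small
lookup table. This is only a sketch: the code assignments are the usual
ISO 646 (ASCII) C0 positions plus DEL, and the names CONTROL_CLASSES,
CLASS_OF and control_class are made up for illustration.

```python
# Illustrative table (names are hypothetical): the ISO 646 C0 controls,
# plus DEL, grouped into the six classes listed above.
CONTROL_CLASSES = {
    # SOH STX ETX EOT ENQ ACK DLE NAK SYN ETB
    "transmission": [0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
                     0x10, 0x15, 0x16, 0x17],
    # BS HT LF VT FF CR
    "format_effector": [0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D],
    # SO SI ESC
    "code_extension": [0x0E, 0x0F, 0x1B],
    # DC1 DC2 DC3 DC4 (DC1/DC3 are commonly used as XON/XOFF)
    "device": [0x11, 0x12, 0x13, 0x14],
    # FS GS RS US (IS4..IS1)
    "info_separator": [0x1C, 0x1D, 0x1E, 0x1F],
    # NUL BEL CAN EM SUB DEL
    "other": [0x00, 0x07, 0x18, 0x19, 0x1A, 0x7F],
}

# Reverse index: byte value -> class name.
CLASS_OF = {code: name
            for name, codes in CONTROL_CLASSES.items()
            for code in codes}


def control_class(byte):
    """Return the class name of a control byte, or None if it is not
    one of the controls in the table (e.g. a graphic character)."""
    return CLASS_OF.get(byte)
```

With this table, control_class(0x0E) (SO) falls under "code_extension"
and control_class(0x07) (BEL) under "other".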
>I'm inclined to accept the presence of format effectors and code
>extension control characters prima facie, and may allow BEL (one of the
>"other" control characters), too.
I think this is right. As a personal bias, I think noise-suppression
is a terminal function, so agree on BEL (I have seen attempts at abusing
it in mail).
>I'm inclined to allow filtering
>out NUL and DEL without further consideration. Device control can go,
>too (no need to send XOFF in mail). Information separators, are they
>used? Transmission control can go.
My impression is that the information separators are mostly an
historical artifact, that structured data is typically moved using other
models. It isn't clear that they have any value in mail (as distinct
from embedded binary file transfer). On the other hand, maybe they
could make good embedding separators :-) :-}
The device and transmission controls tend to be where the potential
problems live. If one wanted to design a restriction or smart filter
that would permit the character set manipulations, it would probably be
sufficient to filter out the C0 device and transmission controls, the C1
device controls (e.g., CSI sequences and their 7-bit ESC equivalents)
and then go back and filter out any sequence starting with ESC that was
not related to character set switching. Since, e.g., CSI sequences
require some moderately complex parsing (it is not just a matter of
stripping out a character), such a filter, as Erik implies, is not easy.
In a way, and probably strongly influenced by my bias for very
conservative senders over very robust receivers, this is a stronger
argument for writing a "don't send these things" rule than it is for
expecting receivers (MTAs or UAs) to be very cautious and robust.
On the other hand, if one does not make/have a strong rule that
requires the sender to severely restrict what is sent...
>> It is an argument for UAs with filters and for restricting what is
>> sent to them in order that they not strip anything the sender considers
>This is the problem spot, and the source of my mixed feelings. It
>could be that all possible bit combinations are important to the sender
>in some application. I don't necessarily think we're talking binary
>in those cases, either.
... one could design a receiving UA that would "normally" apply quite
aggressive filtering before displaying a message, but that would support
options, on a per-message basis, to display the thing "raw" or in
various intermediate states. Possibly a good idea (not original, I
think I've seen one or two that worked this way), and something we might
It is, however, worth noting that "different character set" can imply
"different model of control characters", so there may be no precise
translation of the conventional C0 and C1 characters. Historically,
this has been a problem in translation between ANSI/ISO-style character
sets and, e.g., EBCDIC. But I gather there are now trends in the ISO
community toward permitting the kind of identification and switching of
control character sets (binding different things to CL and CR) that has
become established practice (via subsets of 2022) with registered
graphic character sets. That could make translation of one ISO control
set to another as difficult as translation of one graphic set to another
if one did not know what they were.