
Re: Whether 8-bit SMTP? And how?



>    Jon or Dave can probably remember this better than I can, but my 
> recollection is that there was an oral tradition pre-821 and 822 that 
> restricted mail to NVT ASCII, i.e., no control characters other than CR, 
> LF, HT, SP, etc.
>    If I had my druthers, we would [re]institute that restriction, and
> try to sweep up the C1 columns as well as the C0 ones--there is really 
> no need for this stuff in "text mail" and the "binary" situation needs, 
> as has already been established, to be dealt with in other ways.

My feelings on this are decidedly mixed, but at least I think some of
the arguments against filtering are tolerably coherent.

First off, deciding which control characters are to go will be an
unpleasant task.  The Japanese rely on SI and SO to survive the
transfer.  ISO 10646 requires (in its present draft form, at least) that
both announcers and the single graphic character introducer survive, in
addition to the high octet preset.  ISO 2022 requires that you announce
the character set you intend to use, an announcer consisting of ESC and
two or more characters, the first of which is chosen from column 2 of ISO 646.
ISO 2022 also has locking shift and single-character shift in addition
to the usual shift in/out.  ISO 4873 uses locking shift right to select
G2 and G3 sets.  I'm waiting for my copy of ISO 6429 to have a complete
set of control characters and their meanings, but ISO 646 gives a nice rundown
of the meaning of the control characters defined therein.

Let me give a brief summary of the classes of control characters defined
in ISO 646:

	- Transmission control characters
	- Format effectors
	- Code extension control characters
	- Device control characters
	- Information separators
	- Other control characters
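
For concreteness, the classes above can be written down as a lookup
table.  This is only a sketch in Python: the octet assignments are the
usual ISO 646 / ASCII C0 positions, the names ISO646_CLASSES and
control_class are made up for illustration, and placing DEL under
"other" is my own assumption for convenience.

```python
# Sketch: C0 control characters grouped by the ISO 646 classes listed above.
# ISO646_CLASSES and control_class are illustrative names, not from any standard.
ISO646_CLASSES = {
    # SOH, STX, ETX, EOT, ENQ, ACK, DLE, NAK, SYN, ETB
    "transmission control": [0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
                             0x10, 0x15, 0x16, 0x17],
    # BS, HT, LF, VT, FF, CR
    "format effector": [0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D],
    # SO, SI, ESC
    "code extension": [0x0E, 0x0F, 0x1B],
    # DC1 (XON), DC2, DC3 (XOFF), DC4
    "device control": [0x11, 0x12, 0x13, 0x14],
    # FS (IS4), GS (IS3), RS (IS2), US (IS1)
    "information separator": [0x1C, 0x1D, 0x1E, 0x1F],
    # NUL, BEL, CAN, EM, SUB, and DEL (DEL placed here by assumption)
    "other": [0x00, 0x07, 0x18, 0x19, 0x1A, 0x7F],
}


def control_class(octet):
    """Return the class name for a control octet, or None for anything else."""
    for name, codes in ISO646_CLASSES.items():
        if octet in codes:
            return name
    return None
```

With this table, control_class(0x13) gives "device control" (XOFF),
while a graphic character such as 0x41 gives None.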

I'm inclined to accept the presence of format effectors and code
extension control characters prima facie, and may allow BEL (one of the
"other" control characters), too.  I'm inclined to allow filtering
out NUL and DEL without further consideration.  Device control can go,
too (no need to send XOFF in mail).  Information separators, are they
used?  Transmission control can go.  I notice upon close reading of
ISO 2022 that EM (1/9) may also be used as a shift control character,
a 7-bit representation of SS2.

The list of acceptable control characters from C0 is thereby reduced to

	BS, HT, LF, VT, FF, CR (format effectors)
	SO, SI, ESC (code extension control characters)
	IS1, IS2, IS3, IS4 (US, RS, GS, FS) (information separators)
	BEL, EM (from other control characters)

Is this an acceptable list to others, too?

Note: ESC may be hazardous, but it's also required as part of ISO 2022
announcer sequences.
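
To make the proposal concrete, here is a minimal sketch of a filter
that keeps only the C0 characters listed above.  It is Python, the
names ALLOWED_C0 and filter_c0 are hypothetical, and two choices are
assumptions of mine rather than settled policy: DEL is dropped along
with NUL, and octets of 0x80 and above are passed through untouched,
since the C1 question is still open.

```python
# Sketch of a C0 filter for the proposed list; names are illustrative only.
ALLOWED_C0 = frozenset([
    0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D,  # BS, HT, LF, VT, FF, CR
    0x0E, 0x0F, 0x1B,                    # SO, SI, ESC
    0x1C, 0x1D, 0x1E, 0x1F,              # FS, GS, RS, US (IS4..IS1)
    0x07, 0x19,                          # BEL, EM
])


def filter_c0(data: bytes) -> bytes:
    """Drop C0 controls not on the list, and DEL; keep everything else."""
    return bytes(
        b for b in data
        if (b >= 0x20 and b != 0x7F)  # graphics and 8-bit octets pass (assumption)
        or b in ALLOWED_C0            # permitted control characters
    )
```

A filter like this leaves an ISO 2022 escape sequence intact while
removing NUL, XOFF and DEL from the same message body.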

I haven't been able to get a full list of assigned meanings to control
characters in C1, so I can't make the same kind of sweeping statements
in that case, pending also the question of applicability.

> I don't know any 
> terminal devices any more that can be caused to literally self-destruct 
> by sending them pathological control sequences.

I do.  My terminal, a Wyse 75 originally, called an Altos II, can kill
its video sync control circuitry when receiving consecutive terminal
resets.  Repairs are in the $200 range.  I was not amused when this was
demonstrated on my Norwegian terminal (saving, fortunately, the US
terminal).  It wasn't in mail, but it could have been.

>    It is an argument for UAs with filters and for restricting what is
> sent to them in order that they not strip anything the sender considers
> important. 

This is the problem spot, and the source of my mixed feelings.  It
could be that all possible bit combinations are important to the sender
in some application.  I don't necessarily think we're talking binary
in those cases, either.

[Erik]