
Re: handling 8-bit



< I occasionally get a piece of email marked "binary" courtesy the System V
< release 4 rmail MTA.  It decides that a message is binary because it
< contains one or more unprintable characters - either control characters or
< characters with the high bit set.  It then warns me to use an alternate
< "bprint" command, in case I don't want binary stuff printed on my screen.

Actually, it is the mail user agent (MUA) that warns you to use the bprint
command. The MTA is very liberal in what it passes through. The user agent,
however, makes certain you are aware that the body of the message may
contain something inappropriate for your screen. (It is being conservative
in what it generates.)

< Virtually all of these messages are flagged because somebody fat-fingered
< somewhere and hit a control key or a function key, causing a single
< non-printable character.  Since rmail must work correctly in an
< international environment, portability requires use of the "isprint" macro
< in ctype.h.
<
< I suggest that if we bounce all the mail that inadvertently has a single
< binary character in it, the cure will be worse than the disease.  A
< message with a small number of unprintable characters is probably ordinary
< text with an accidental typo.  If a default interpretation is to be made,
< I suspect that "text" is far more likely to be appropriate than
< "octet-stream".
<
< I also suggest that requiring all MTAs to be MIME-police and bounce any
< mail with "binary" content (whatever that is - it might be printable on
< both the sending UA and the receiving UA) is a bad idea.  I am reminded of
< the old email proverb:
<	"Be conservative in what you generate, and liberal about
<	what you accept."  (Matt 7:1-2.)

The problem is coming up with a heuristic which can be used to differentiate
between "text" and "octet-stream". Consider what you'd do with the following
cases, as an MTA, as an MUA, and as a MIME-ifier:

	A mostly text message which has an occasional ^[[D sequence (left
	arrow on ANSI terminals).

	A mostly text message that contains a couple hundred ^[[D sequences.

	A mostly text message which has an ^[c or ^[[c sequence (terminal
	reset on many ANSI terminals, hangup and reset on many other
	terminals).

	A mostly text message which happens to have two escape sequences
	embedded within it: one that programs one of my function keys to
	transmit "!rm -rf *" and then another to execute that function key.

	A message which contains lots of lines <1k bytes long, each of which
	contains 3/4 text and 1/4 gibberish.

	A message which contains lots of lines <1k bytes long, each of which
	contains total gibberish.

	A message which contains the executable /bin/ed.

Consider each of the cases where the message is in MIME format, not in MIME
format, and in MIME format but the MIME-ifier screwed up.
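
To make the difficulty concrete, here is a minimal sketch in Python of the
obvious heuristic: count the unprintable bytes and call the message "text"
if they stay under some fraction. The function names, the sample messages
and the 5% threshold are assumptions made for illustration, not anything
rmail or MIME specifies, and the test is plain ASCII rather than a
locale-aware isprint.

```python
ESC = 0x1B  # the escape character that introduces ANSI terminal sequences


def nonprintable_fraction(body: bytes) -> float:
    """Fraction of bytes that are neither printable ASCII nor tab/LF/CR."""
    if not body:
        return 0.0
    bad = sum(
        1 for b in body
        if not (0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D))
    )
    return bad / len(body)


def guess_type(body: bytes, threshold: float = 0.05) -> str:
    """Naive guess; the threshold is an arbitrary, assumed value."""
    if nonprintable_fraction(body) < threshold:
        return "text"
    return "octet-stream"


def has_escape(body: bytes) -> bool:
    """True if the body contains at least one ESC byte."""
    return ESC in body


# Hypothetical sample messages, loosely modelled on the cases listed above.
typo = b"Hello there,\x1b[D this is ordinary text.\n"
reset = b"Please read this carefully.\n" * 10 + b"\x1bc"
gibberish = bytes(range(256))

for name, msg in (("typo", typo), ("reset", reset), ("gibberish", gibberish)):
    print(name, guess_type(msg), has_escape(msg))
```

Note that the terminal-reset message is guessed as "text": one ESC in a few
hundred bytes is well under any sensible threshold, so a count alone cannot
separate the harmless typo from the hostile sequence.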

If the message is not in MIME format, and we are to be "liberal in what we
accept", how do we convert each of these messages into MIME format? What
claims can we make about the message content?

If the message is in MIME format and marked as 8bit octet-stream, is there a
way in the current SMTP proposals to transmit it without forcing it to be
converted to base64 or quoted-printable? (I don't think so.)
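
For a rough sense of what that forced conversion costs, here is a small
sketch using Python's standard base64 and quopri modules. The two sample
bodies are made up for illustration; real messages will vary.

```python
import base64
import quopri

# Hypothetical bodies: one mostly text with an occasional escape sequence,
# one that is arbitrary binary data.
mostly_text = b"A mostly text message with an occasional \x1b[D sequence.\n" * 20
binary_blob = bytes(range(256)) * 4


def encoded_sizes(body: bytes) -> dict:
    """Return the raw size and the size under each 7-bit encoding."""
    return {
        "raw": len(body),
        "quoted-printable": len(quopri.encodestring(body)),
        "base64": len(base64.encodebytes(body)),
    }


print("mostly text:", encoded_sizes(mostly_text))
print("binary blob:", encoded_sizes(binary_blob))
```

Quoted-printable stays close to the raw size when the body is mostly text
and balloons on binary data, while base64 costs roughly a third extra
whatever the content, so picking the encoding needs the same text-or-binary
judgment as everything else here.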

If each of these messages were marked as text and charset=iso8859-1, which
is obviously false in some of the above cases, what would you do?
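
One narrow check a receiver could make on such a claim is whether every
byte falls in a range to which ISO 8859-1 assigns a graphic character. The
sketch below assumes tab, LF and CR are also acceptable in text; the
function name is hypothetical.

```python
def plausibly_latin1_text(body: bytes) -> bool:
    """True if every byte is an ISO 8859-1 graphic character or tab/LF/CR.

    ISO 8859-1 assigns graphic characters to 0x20-0x7E and 0xA0-0xFF only;
    0x00-0x1F, 0x7F and 0x80-0x9F are control positions.
    """
    for b in body:
        if b in (0x09, 0x0A, 0x0D):  # assumed-acceptable whitespace controls
            continue
        if b < 0x20 or 0x7F <= b <= 0x9F:
            return False
    return True


print(plausibly_latin1_text(b"caf\xe9 au lait\n"))        # accented Latin-1 text
print(plausibly_latin1_text(b"some text \x1b[D more\n"))  # embedded escape
```

This rejects the escape-sequence cases, but a body made entirely of bytes in
0xA0-0xFF still passes, so it cannot tell real Latin-1 text from gibberish
that happens to avoid the control ranges.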

No, I don't know the answers.

					Tony Hansen
			    hansen@pegasus.att.com, tony@attmail.com
				att!pegasus!hansen, attmail!tony