
8 bit SMTP transport, draft RFC



  Several of us have been working off-list to try to define a protocol
for 8-bit transport that meets at least some set of plausible
objectives without either turning into the "wretched solution" or
duplicating the RFC-XXXX work in the transport layer.  The major
offenders are listed as co-authors, although many other people have
made (sometimes unintentional) contributions to the effort.
  There is a long "architectural" document on its way that addresses
the rationale for the decisions this implies and for the things that
were left out of here; deep philosophical discussions might rationally
be held off until it appears in a week or so.
  The basic theme in what follows has to do with the use of 8-bit
transport among consenting adults, not as a substitute for encoding to
7-bit with RFC-XXXX and transporting in a known interoperable
environment.  To paraphrase one of the candidate remarks for the
architectural document, if the protocol proposed here did not exist,
and I wanted to send 8 bit mail to an unfamiliar host, over unknown
transport paths and mechanisms, I would encode it, set up an RFC-XXXX
message format, and mail it over a 7 bit path.  If this protocol is
approved, I would *still* deal with that situation by encoding, setting
up an RFC-XXXX message format, and mailing over a 7 bit path.   Only
the situation of a familiar host, with a direct virtual connection or
known gateways, is likely to be a candidate for this protocol in the
general case.

 ----------------------------

Network Working Group                                    John C. Klensin
Request for Comments: ZZZZ                               Risto Kankkunen
Modifies: 821                                             Greg Vaudreuil
                                                       Craig F. Everhart
DRAFT                                                         April 1991

          SMTP Extensions for Transport of Text-Based Messages
                      Containing 8-bit Characters

Status of this Memo

   This document specifies an extension to the SMTP protocol.  It will
   be submitted to the RFC editor as a proposed standard.  Distribution
   is unlimited.

1. Introduction

RFC 821 [RFC821] defines a protocol, SMTP, to transfer mail reliably
and efficiently. It is largely independent of the transmission
subsystem used.  It requires only a reliable ordered data stream of at
least 7-bit units that consists of "lines" and "characters".   It also
makes some implied assumptions about end-to-end virtual circuit
connections as the primary model for transporting and delivering mail.

SMTP, as described in RFC821, is restricted to the transport of data in
7-bit ASCII encoding.  Strictly speaking, incorporation of any
non-ASCII character encoding, whether 7 or 8 bits, or the assumption of
a special interpretation for any control character other than ASCII CR
and LF is an extension from RFC821.  Such extensions should require
either changes to RFC821 itself, or prior agreement among all parties
and hosts which will transport or handle the mail.  A strict reading of
RFC821 would permit the receiver of a message to assume that it
contained only ASCII characters.

RFC-XXXX provides for identifying and encoding the use of character
sets other than ASCII within a structured message body, using extended
headers. Because that proposal does not require an 8-bit transport
mechanism, it is likely to provide better interoperability than the use
of this 8-bit transport mechanism in situations where mail must be
passed through one or more mail relays, gateways, or exploders between
the sender and the receiver.

At the same time, most electronic mail messages do not pass through
such mechanisms, but are simple textual messages sent point-to-point
within a small, "local" community of users.  Within such local
communities, sending 8-bit character codes as 8-bit character codes,
without additional encoding, provides considerable simplification and
is much in demand.  The consequences of discovering that a receiving
host will not accept 8-bit transport are also not severe, since, within
a local community which has decided to use this protocol extension at
all, that problem will presumably rarely arise.

The strongest evidence for the importance of this feature is that many
vendors and implementors already support 8-bit transport over the usual
SMTP channels and many report that they have done so in response to
intense customer pressure.  Since the mechanisms that have been chosen
have not been standardized, messages containing octets with the high
bit set may "escape" the local environment.  Difficulties of varying
degrees of severity may arise when they do so.  While the primary
purpose of this RFC is to provide for 8-bit transport in conjunction
with SMTP when that is deemed necessary, a critical secondary purpose
is to standardize mechanisms and clarify procedures in ways that
prevent destructive "escape" of improperly-identified 8-bit characters.

In other words, transport of 8-bit characters is occurring, will
continue to occur, and is perceived as desirable under at least some
circumstances.  For it not to disturb existing implementations, this
feature should be used in a coordinated way and only when both parties
are willing to use it.  That, in turn, requires a clear mechanism as to
how coordination will occur and agreement be verified.  This protocol
extension provides that mechanism.


2. Notation

There are several situations in this proposal in which the bit pattern
associated with the code for a character is, in the event of possible
ambiguity, more significant than the character itself.  In those
situations, the bit pattern is cited (in hexadecimal notation) as the
value of the octet, and the reference ASCII characters are then
indicated in parentheses.  When characters, or character names, are
mentioned, they are to be construed strictly in accordance with ASCII,
that is, from American National Standard ANSI X3.4-1986.  However, for
the purposes of this RFC, the table in ISO 646 [ISO646] and that in
ANSI X3.4 are identical.

Except in a few situations where the distinction is important, the
terms "8-bit characters", "8-bit text", and "8-bit transport" are used
interchangeably to refer to messages that might contain octets with the
high bit set to 1.  As above, when the distinction is important, the
term "octet" is used rather than "character" or "byte".  This proposed
protocol does not provide for the transport, or other handling, of
information that is not structured into "characters" and "lines" as
provided for in RFC821 and below.

3. Organization and summary

This proposal contains four logical components, which follow:

  (i) Definition of a new set of SMTP verbs, EMAL FROM, ESND FROM, ESOM
      FROM, and ESAM FROM as alternatives to MAIL FROM, SEND FROM, SOML
      FROM,  and SAML FROM, and a description of their semantics. 

 (ii) Definition of an optional SMTP verb, EVFY, which can be used to
      determine whether an 8-bit transport request is likely to be
      accepted for a particular address.

 (iii) A discussion of the interaction of 8-bit transport with message
      formats (i.e., RFC822 material).

 (iv) A discussion of error conditions and how they are handled.

 The first, third, and fourth of these sections impose requirements
 which are mandatory if the protocol specified here is implemented.

4. The EMAL FROM and related verbs

The SMTP protocol, as specified in RFC821, is extended to permit the
use of a new set of verbs, that replace the "handling" components of
what we shall refer to as the "FROM" verbs. The relationship is
specified by the table

	MAIL	->	EMAL
	SEND	->	ESND
	SOML	->	ESOM
	SAML	->	ESAM

As now specified in RFC821, DATA is treated as introducing a stream of
ASCII (and therefore 7-bit) characters, divided into lines that are
delimited by the ASCII control characters CR followed by LF, with
potential restrictions on line lengths, and terminated with the
sequence "CR LF . CR LF".

If 8-bit transport is desired, the appropriate FROM verb is replaced by
the extended form (MAIL FROM by EMAL FROM, etc.). If the receiver
doesn't know that verb or doesn't want to receive 8-bit text, it gives
a fatal negative reply.  Such a reply would indicate that the sender
MUST NOT send octets with the high bit turned on.  Otherwise the
receiver sends an "ok" (250) reply (see below) and the sender can
proceed to negotiate the rest of the mail transaction and then send a
message containing 8-bit text after the DATA verb.
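
The verb substitution and the sender's reaction to the reply can be
sketched as follows.  This is a minimal illustration, not part of the
proposed protocol text: the function names are hypothetical, and the
command syntax "EMAL FROM:<reverse-path>" is assumed to parallel the
RFC821 syntax for MAIL FROM.

```python
# Mapping from the RFC821 FROM verbs to the extended verbs of this draft.
EXTENDED_VERB = {"MAIL": "EMAL", "SEND": "ESND", "SOML": "ESOM", "SAML": "ESAM"}


def from_command(verb, reverse_path, want_8bit):
    """Build the FROM command line (hypothetical helper).

    The command itself is always ASCII; only the verb changes when
    8-bit transport is requested.
    """
    chosen = EXTENDED_VERB[verb] if want_8bit else verb
    return f"{chosen} FROM:<{reverse_path}>\r\n".encode("ascii")


def may_send_8bit(extended_verb_used, reply_code):
    """True only when an extended FROM verb drew the "ok" (250) reply.

    Any other outcome, including every fatal 5xx reply, means that
    octets with the high bit set must not be sent.
    """
    return extended_verb_used and reply_code == 250
```

A sender that receives a fatal reply to EMAL FROM still has to decide
what to do next (encode, return the message, or give up); this sketch
only records that 8-bit data is not permitted in that case.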

When an extended FROM verb is used, any combination of bit values may
appear in any octet of the message data, i.e., both the restriction to
a single character set (ASCII) and the restriction to 7-bit characters
are removed.  However, the data stream MUST still be organized into
lines terminated precisely by octets with the hexadecimal values 0D and
0A (decimal 13 and 10, corresponding to ASCII characters CR and LF), in
order.  Similarly, the message MUST be terminated by octets with the
hexadecimal values 0D, 0A, 2E, 0D, 0A (corresponding to ASCII characters
CR, LF, ".", CR, LF).  Any octet with the hexadecimal value 2E
(corresponding to ASCII period, i.e., ".") that begins a line MUST be
doubled by the sender and the doubling removed by the receiver, as
specified for period in RFC821.  The same restrictions on line length
imposed on DATA in RFC821 are imposed here, with the qualification that
"character" should be read as "octet".
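
The line and termination rules above operate on octets rather than on
characters.  A minimal sketch of the framing, assuming the message is
already held as a list of lines without their CR LF delimiters (the
function names are hypothetical and line-length limits are not
checked):

```python
CRLF = b"\r\n"  # the CR and LF octets, in order


def frame_data(lines):
    """Return the octets to send after DATA for the given lines.

    Each element of `lines` is a bytes object that may hold any octet
    values but must not itself contain a CR LF pair.
    """
    out = bytearray()
    for line in lines:
        if line.startswith(b"."):
            out += b"."          # double a leading period
        out += line + CRLF
    out += b"." + CRLF           # completes the CR LF . CR LF terminator
    return bytes(out)


def unframe_data(payload):
    """Invert frame_data on the receiving side."""
    pieces = payload.split(CRLF)
    if pieces[-2:] != [b".", b""]:
        raise ValueError("missing CR LF . CR LF terminator")
    # Remove the doubling from any line that begins with a period.
    return [p[1:] if p.startswith(b".") else p for p in pieces[:-2]]
```

For example, frame_data([b"\xe4iti", b".hidden"]) yields the octets
b"\xe4iti\r\n..hidden\r\n.\r\n", and unframe_data restores the
original two lines.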

Any coding of message text that will be transported by this protocol
MUST NOT contain octets with the hexadecimal values 0D and 0A
(corresponding to ASCII CR and LF) in sequence unless they are
intended to be interpreted as line delimiters as described above and in
RFC821, regardless of the character set or coding in use.

All SMTP command verbs, including the extended FROM verbs, are written
in ASCII characters.  Nothing in this RFC provides for any non-ASCII
character or coding to be used in SMTP transactions ("envelope") other
than in the message body initiated by the DATA verb and terminated as
specified above.

5. EVFY command

A new command verb, EVFY, is defined, corresponding to VRFY and with the
same argument, but requesting information as to whether the address
appears to be acceptable for 8-bit transport.  Implementation of this
protocol SHOULD provide support for EVFY, but it is not required.  EVFY
has the same reply codes as VRFY, but the successful 250 or 251 codes
are returned only if 8-bit transport will be accepted for that address.
Code 556 must be returned if the address is acceptable, but 8-bit
transport will not be accepted for it.
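
A sender might interpret EVFY replies along the following lines.  This
is an illustrative sketch only: the function name and the returned
labels are hypothetical, and the treatment of code 500 follows the
expectation, discussed under error conditions, that servers which do
not know a verb reply "500 Syntax error; command unrecognized".

```python
def interpret_evfy_reply(code):
    """Classify the reply code returned for an EVFY command."""
    if code in (250, 251):
        # Address is acceptable and 8-bit transport will be accepted.
        return "8bit-accepted"
    if code == 556:
        # Address is acceptable, but only for 7-bit transport.
        return "7bit-only"
    if code == 500:
        # Server most likely does not implement EVFY at all.
        return "evfy-not-implemented"
    # Every other code has the same meaning it would have for VRFY.
    return "as-for-vrfy"
```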

6. Interaction with the message format and headers.

Both RFC821 and 822 explicitly reference "ASCII" as the character code
in which all text is written and with which it is interpreted.  The
introduction of an eight-bit transport mechanism introduces a potential
ambiguity, since, while there is only one ASCII, there are many 8-bit
character sets and mechanisms.  Hence, when sending a message using
this protocol extension, the message text format MUST conform to
RFC-XXXX.

7. Error conditions

7.1 RFC821 behavior with unrecognized verbs. 

While it is not quite explicit, RFC821 appears to expect that, if a
verb is not recognized by the receiver, it will reject it with a "fatal
error", 5xx code.  Similarly, it appears to specify that, if the sender
receives such a code, it must either abandon the mail message (sending
QUIT or RSET, presumably) or do something else involving the same or a
different verb; it may not simply ignore the 5xx error code and pretend
it was a 2xx (or 354) code.  This RFC depends on that behavioral model.

Consistent with RFC821, we expect that existing SMTP servers will reply
"500 Syntax error; command unrecognized" when any unfamiliar verb is
received.

  Discussion: The material above should probably have made it into
     RFC1123, but some of the issues--particularly the fact that anyone
     could ever have believed that anything else (such as simply
     ignoring 5xx codes) was permitted--have emerged only in the
     process of this investigation.  Nonetheless, this clarification is
     believed to be consistent with existing usage and implementations
     of SMTP.


7.2 Responses when EMAL is recognized.

An SMTP server which does implement this RFC may nonetheless respond to
the EMAL verb or its variations with an error message.  The new code
556 is assigned to this purpose, to be construed as "8 bit transport
not acceptable in this case", and may appear in response to EMAL (or
ESND, ESOM, ESAM) or, more often, in response to one or more of the
RCPT commands.  A sender could appropriately deduce that a 556 error in
response to one of the FROM verbs indicates that 8-bit transport is not
accepted at all, or not accepted from the host specified with the FROM
verb.  A 556 in response to a RCPT verb would indicate that 8-bit
transport is not accepted for that particular address.

A receiving SMTP SHOULD return 556, not 550, if it supports this
protocol and would accept 7-bit mail for the specified address, but
will not accept 8-bit mail for it.

[NOTE IN DRAFT: A case could be made for a 52x code, rather than for
the 55x code used above, on the theory that this is a connection/
transmission channel reply.  Comments solicited.]

If 8-bit transport is accepted, and there is a subsequent delivery
failure that necessitates the generation of a notification message
(RFC1123, section 5.3.3), that message MUST be formatted and
transported using 8-bit transport with EMAL FROM.  However, the error
message text itself should be prepared using ASCII characters and
codings; octets with the high bit set may occur only in any excerpts of
the original message that are returned with the error notification.

   [NOTE IN DRAFT: Comments on this restriction are solicited.  I think
that some rule is needed to govern this case, and came up with the
above to preserve the present SMTP character of ASCII for MTA-MTA
messages, while not trashing the returned text (if any). ]

If 8-bit transport is acceptable, the server should return the normal
responses to the FROM-class commands, the RCPT TO: command, and the
DATA command.  For individual RCPT TO: commands, any of the standard
codes may be returned, and an additional possibility is that the server
may return code 556 in order to refuse to accept 8-bit mail for the
indicated recipient.

7.3 Sender action in response to fatal errors.

The action to be taken by the sender if 500, 556, or any other
500-series code is returned is not specified by this RFC other than in
terms of the limitation that "something else" must be done imposed
above.  In other words, these codes MUST NOT be ignored, and octets
with the high bit turned on MUST NOT be transmitted unless one of the
extended FROM commands has been sent and acknowledged with a 250 code
and a 250 or 251 reply has been received in response to at least one of
the RCPT commands. 
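
The condition in the preceding paragraph can be expressed as a single
check on the state of the mail transaction.  The following is a minimal
sketch; the class and attribute names are hypothetical and are not part
of this proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Reply codes collected so far in one mail transaction."""
    used_extended_from: bool = False   # EMAL/ESND/ESOM/ESAM was sent
    from_reply: int = 0                # reply code to the FROM verb
    rcpt_replies: list = field(default_factory=list)

    def high_bit_allowed(self):
        """True if octets with the high bit set may follow DATA."""
        return (
            self.used_extended_from
            and self.from_reply == 250
            and any(code in (250, 251) for code in self.rcpt_replies)
        )
```

A transaction in which every RCPT command drew 556 therefore does not
permit 8-bit data, even though the extended FROM verb was accepted.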

  Discussion: A mail gateway between 8-bit capable environments and
one that could only handle 7-bit transport or characters might, for
example, reasonably attempt automatic conversion to some appropriate
RFC-XXXX encoding.  Even such a gateway may not have adequate
information for reliably translating between formats, and, in those
situations, the message should be returned undelivered to the
originating user.  So-called bit-stripping MUST NOT be used as an
alternative, since it can introduce significant distortions into the
message, especially when multiple octet character sets are being used,
or the characters in columns 10 through 15 of 8-bit character sets are
not derived from the Roman alphabet.
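
  The distortion is easy to demonstrate.  The sketch below is an
illustration only: it assumes text coded in ISO 8859-1, and the
function name is hypothetical.  It clears the high bit of every octet,
which silently turns each accented letter into an unrelated ASCII
letter.

```python
def strip_high_bit(octets):
    """Clear the high bit of every octet (the practice prohibited above)."""
    return bytes(b & 0x7F for b in octets)


# The Finnish word "paivaa" with a-diaeresis (U+00E4) in place of each "a",
# coded in ISO 8859-1, where a-diaeresis is the single octet 0xE4.
original = "p\u00e4iv\u00e4\u00e4".encode("latin-1")
stripped = strip_high_bit(original)
print(stripped.decode("ascii"))  # prints "pdivdd"
```

  Each octet 0xE4 becomes 0x64, the ASCII letter "d", and nothing in
the result indicates that any damage occurred.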


7.4 Mail relays, mail gateways, and this protocol.

While it is not explicit in RFC 821, there is a general principle that
mail transport facilities should not alter, or even inspect, the
message itself.  There is already a small exception to this in the
requirement for receivers to add "time stamp" ("Received") lines
(RFC821, section 4.1.1, page 21; RFC1123, section 5.2.8).  This
proposal is intended to avoid making further exceptions.

If a mail gateway is used to transform the message from 8-bit transport
form to a 7-bit transport form, the resulting message MUST conform to
the formats specified by RFC-XXXX for 7-bit transport. 

7.4.1 Review of present RFC821 status and requirements.

Under a number of circumstances, an RFC821 SMTP sender implementation
may be called upon to deliver mail, not to a final destination, but to
an intermediary (relay or gateway site) or to the address of a mailing
list exploder.  In some circumstances the sender is aware that it is
dealing with an intermediary; in others it is not.  An intermediate
mail system may not be able to verify, e.g., addresses during the SMTP
negotiation and RFC1123 explicitly provides (section 5.2.7) for
intermediate systems to return "ok" 250 codes for addresses that cannot
be verified, only to send mail messages with error indications back
when addresses fail after the SMTP connection is closed.  Such failure
could occur on the local host (e.g., local list expansion) or remotely
(e.g., in a relay's SMTP negotiation with the next host in sequence).  
Consequently, a mechanism that we might describe as "whoops, that isn't
really something that can be delivered as specified" must already exist
in SMTP server implementations, especially those that operate as relays
or mail gateways.  And, as discussed above, returning messages to users
as undeliverable is an acceptable (and normal) response to a receiver's
rejection of 8-bit transport.

At the same time, mail gateways are permitted to accept one address
from a sender for delivery and then carry out significant
transformations of that address (and even the message) before passing
it along to the actual delivery host, or the next host in sequence.
While RFC821 provides for altering host names (section 3.6) and RFC1123
provides for header, address, and protocol modifications (section
5.3.7), nothing in any Internet standard protocol to date attempts to
completely specify this behavior in the general case.


7.4.2 Relay behavior under this RFC.

The basic model described above for RFC821 is not changed by this
protocol.  A receiver may reject a request for 8-bit transport,
regardless of whether it is coming from the originating host or some
intermediary.  A relay host that accepts 8-bit transport must be
prepared for a host to which it attempts to pass the message to reject
that option, typically by mailing a message back to the originating
user indicating non-deliverability.  Hosts may agree to 8-bit transport
and then "bounce" messages by mailing error indications as specified in
RFC1123, just as they may accept mailbox designations and then bounce
those messages.

The impact of this model is that 8-bit transport will be successful
only between hosts that support 8-bit transport.  If other hosts which
do not support 8-bit transport are involved in forwarding the mail, the
8-bit transport attempt is likely to fail.  Consequently, as outlined
above, the protocol extensions outlined in this RFC are most likely to
be useful within "local" communities of users, and primarily for simple
mail messages between users.  While this RFC makes no recommendation on
the matter, it would be reasonable, for example, for a host that
supports this protocol to refuse to agree to 8-bit transport if one of
the addresses can be identified as a mailing list with non-local users.

7.4.3 Mailing list handling

Receiver SMTP agents may choose to decline a request for 8-bit
transport for recipients whose mail is to be forwarded elsewhere. 

[NOTE IN DRAFT:  The above probably should be either strengthened or
eliminated entirely.]

7.5 Receiving characters with the high bit set without prior
agreement for 8-bit transport.

As mentioned above, sending SMTPs MUST NOT transmit octets with the
high bit non-zero without first successfully negotiating 8-bit
transport with the receiver.  Receivers are not required to enforce
this requirement, but should be sufficiently robust to avoid serious
failure if the requirement is violated by a sender.  If a receiver
detects octets with the high bit set after a DATA command, it SHOULD
reject the message with a 520 error code, indicating an attempt to send
invalid data over the transmission channel.  This reply SHOULD NOT be
sent until the terminating CR LF . CR LF is received.  In any event,
senders and receivers MUST NOT impute successful negotiation of 8-bit
transport from the use of DATA with 8-bit characters, since other
requirements may not have been met.
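
A receiver that chooses to enforce this rule might do so as sketched
below.  The function names are hypothetical, and the sketch assumes the
receiver has already read the complete message, including the
terminating CR LF . CR LF, before it chooses a reply.

```python
def has_high_bit(octets):
    """True if any octet in the data has its high bit set."""
    return any(b & 0x80 for b in octets)


def reply_after_data(message_octets, negotiated_8bit):
    """Choose the reply code once the terminator has been received.

    Returns 520 when 8-bit octets arrive without prior negotiation,
    otherwise the normal 250.
    """
    if not negotiated_8bit and has_high_bit(message_octets):
        return 520
    return 250
```

Note that a 250 reply here says nothing about whether 8-bit transport
was negotiated; that is determined solely by the earlier exchange of
an extended FROM verb and its reply.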

8. References

   [NOTE IN DRAFT: text to be supplied, but these should be obvious at
present.]

[RFC821]

[DNS]

[RFC1123]

[RFCXXXX]  NSB, et al, current version.

[ISO646]

[ISO2022]

[ISO8859]

[X400]

[X500]

9.  Authors' Addresses

John C. Klensin
Department of Architecture
Room N52-457
Massachusetts Institute of Technology
Cambridge, MA 02139
USA
 tel: 617 253 1355 (international: +1 617 253 1355)
 fax: 617 491 6266 (international: +1 617 491 6266)
 email: Klensin@MIT.EDU

Risto Kankkunen
Department of Computer Science
University of Helsinki
Teollisuuskatu 23, SF-00510 Helsinki
Finland
 tel: +358 0 708 4209
 fax: +358 0 708 4441
 email: kankkune@cs.Helsinki.FI

Gregory M. Vaudreuil
Corporation for National Research Initiatives
1895 Preston White Drive, Suite 100
Reston, VA 22091
USA
 tel: 703 620 8990 (international: +1 703 620 8990)
 fax: 703 620 0913 (international: +1 703 620 0913)
 email: gvaudreuil@nri.reston.va.us

Craig F. Everhart
Transarc Corporation
The Gulf Tower
707 Grant Street
Pittsburgh, PA  15219
USA
 tel: +1 412 338 4467
 fax: +1 412 338 4404
 email: Craig_Everhart@transarc.com