From "Greg Vaudreuil " Thu Dec 27 18:17:02 1990 Flags: 000000000001 Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA15468; Thu, 27 Dec 90 18:16:06 EST From: Greg Vaudreuil Date: Thu, 27 Dec 90 16:08:41 EST Org: Corp. for National Research Initiatives Phone: (703) 620-8990 ; Fax: (703) 620-0913 X-Mailer: Mail User's Shell (6.5 4/17/89) To: ietf-smtp@dimacs.rutgers.edu Subject: Minutes of the IETF 8bit meeting Message-Id: <9012271608.aa09951@NRI.NRI.Reston.VA.US> Below is a draft charter and the minutes for the meeting at IETF. Please send me any changes or additions as soon as possible so I can submitt them to the proceedings editor. Charte SMTP Extentions (smtpext) Charter Chair(s): Gregory Vaudreuil, gvaudre@nri.reston.va.us Mailing Lists: General Discussion: ietf-smtp@dimacs.rutgers.edu To Subscribe: ietf-smtp-request@dimacs.rutgers.edu Description of Working Group: The SMTP extentions working group is chartered to develop extentions to the base SMTP protocol Among the extentions to be considered are 1) elimination of the line length and 7 bit restrictions to allow the sending of binary information, and 2) the definition of specific body parts. Body parts are intended to allow the use of international character sets, the sending of arbitrary binary files. Goals and Milestones: March 90 Rewrite RFC 1154 to include specific types of body parts and encodings. March 90 Write a document for the sending of 8 bit character sets through 7 bit mailers with the TEX-HEX encoding scheme. March 90 Write a document specifying the elimination of line length restrictions and eliminating the 7 bit restrictions in SMTP. 
July 90 Submit the edited documents as Internet-Drafts December 90 Submit the documents as RFC's 1 CURRENT_MEETING_REPORT_ Reported by Greg Vaudreuil/ CNRI Minutes of December 4, 1990 This meeting began as a Birds of a Feather session to discuss a proposal from Jan Michael Rynning and Jonny Eriksson but soon widened to include broader enhansements to 822 to allow for body barts. Rynning and Eriksson's proposal suggested a mechanism to transmit 8 bit character sets through SMTP. The proposal consisted of 1) eliminating the 7bit restriction in SMTP, and in cases where 8 bit SMTP is not implimented 2) proposing a 7 but encoding for non-8 bit systems called TEX-HEX. The group found the proposal interesting, but primarily as a starting point for a re-examination of several SMTP issues. There was a consensus that the group should work to eliminate the 7 bit and 1000 character per line restrictions in SMTP. This will allow easier sending of binary files. Tom Kessler convinced the group that there was only minor code changes required for sendmail to accept 8 bit ASCII. Kessler further volunteered to author a document describing the changes to the SMTP protocol. A command "ebit" was proposed in the document by Rynning and Eriksson to identify new mailers. The group agreed that this extension should be considered for SMTP. An alternate HELO command could be defined to query a mailer for 8 bit compatibility, such as HELO8. The working group looked at RFC 1154 for encodings. Some felt that the document has short-comings in not differentiating between content and the encoding scheme. Greg Vaudreuil took an action to contact the author as inquire about the state of that document. The working group felt that establishing body parts for 822 mail would be a good thing. An outstanding issue remained concerning the interaction between the various encoding schemes as the 7 or 8 bit transmission systems. 
Rynning and Eriksson took an action to re-write their proposal for TEX-HEX as a specific encoding and body part to be used with the encoding document. Actions Tom Kessler: Write a document amending RFC 821 to eliminate the line length restriction and the 7 bit restriction. Greg Vaudreuil: Determine the state of RFC 1154, and encourage the author to join in this effort. Johnny Eriksson and Jan Michael Rynning: Rewrite the TEX HEX encoding document as a specific instance of an RFC 1154 body part. 1 Attendees Robert Braden braden@isi.edu Cyrus Chow cchow@orion.arc.nasa.gov Johnny Eriksson bygg@sunet.se Phillip Gross pgross@nri.reston.va.us Russell Hobby rdhobby@ucdavis.edu Tom Kessler kessler@sun.com Brad Parker brad@cayman.com Michael Roberts roberts@educom.edu Bernhard Stockman boss@sunet.se Jan Rynning jmr@nada.kth.se Dean Throop throop@dg-rtp.dg.com Gregory Vaudreuil gvaudre@nri.reston.va.us David Zimmerman dpz@dimacs.rutgers.edu 2 From "Phillip G. Gross " Fri Dec 28 16:11:42 1990 Flags: 000000000001 Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA05905; Fri, 28 Dec 90 16:10:51 EST Date: Fri, 28 Dec 90 16:09:52 EST X-Mailer: Mail User's Shell (6.5 4/17/89) From: "Phillip G. Gross" To: Greg Vaudreuil , ietf-smtp@dimacs.rutgers.edu Subject: Re: Minutes of the IETF 8bit meeting Cc: pgross@nri.reston.va.us Message-Id: <9012281609.aa08878@NRI.NRI.Reston.VA.US> Greg, Thanks for getting the minutes out. Good job, although I have a couple small nits below (like leaving me out as the convenor of the BOF! :-) You might want to check with Tom Kessler to make sure I correctly characterized his comments about checking the possibly of distributing the SUN Sendmail binaries, and if he is willing to have these comments made available outside the WG (ie, in the Proceedings). BTW, in the future, please review your writing more carefully! "7 but", indeed! 
Thanks, Phill > Charter > > SMTP Extentions (smtpext) [Perhaps "Internet Mail Extentions" or "RFC 821/822 Extentions" would be more appropriate, since the most interesting aspect of the WG's goals (IMHO) deal with RFC 822, not just SMTP.] > > Chair(s): > Gregory Vaudreuil, gvaudre@nri.reston.va.us > > Mailing Lists: > General Discussion: ietf-smtp@dimacs.rutgers.edu > To Subscribe: ietf-smtp-request@dimacs.rutgers.edu > > Description of Working Group: > > The SMTP extentions working group is chartered to develop > extentions to the base SMTP protocol Among the extentions to be extentions to the basic SMTP protocol (RFC 821) and the format of Internet mail (as defined in RFC 822 "..title..", and proposed to be extended in RFC 1049 "..title.." and RFC 1154 "..title.."). > considered are 1) elimination of the line length and 7 bit > restrictions to allow the sending of binary information, and 2) > the definition of specific body parts. Body parts are intended > to allow the use of international character sets, the sending > of arbitrary binary files. Among the extentions to be considered to SMTP are the elimination of the line length and 7 bit restrictions to allow the sending of binary information. Among the extensions to RFC 822 are the definition of specific standard body parts. Body parts are intended to allow the sending of arbitrary binary files, the sending of structured mail, and the use of alternate encoding of international character sets for mailers that do not understand eight bit characters. > > Goals and Milestones: > > March 90 Rewrite RFC 1154 to include specific types of body parts > and encodings. > March 90 Write a document for the sending of 8 bit character sets > through 7 bit mailers with the TEX-HEX encoding scheme. > March 90 Write a document specifying the elimination of line > length restrictions and eliminating the 7 bit > restrictions in SMTP. 
> July 90 Submit the edited documents as Internet-Drafts > December 90 Submit the documents as RFC's [How about a bit more detail (eg to include implementations) and to have a slightly different priority (like doing the work next year, instead of last year :-).] 1. By March 91 IETF meeting a. Write a document specifying the elimination of line length restrictions and eliminating the 7 bit restrictions in SMTP. b. Consider revising RFC 1154 to include specific types of body parts and encodings. c. Write a document defining a body part for the sending of 8 bit character sets through 7 bit mailers with the TEX-HEX encoding scheme. 2. By May 1991 Submit the 3 edited documents as Internet-Drafts 3. By (and at) July 1991 IETF Meeting Encourage distribution and deployment of mailers complying with Goal 1a above. Encourage distribution and deployment of mail readers complying with Goals 1b. and 1c. above. 4. By (or at) December 1991 IETF Meeting Finalize the 3 above documents. Submit a recommendation to the IESG to forward the 3 above documents to the IAB and RFC Editor as Proposed Internet Standards. > > 1 > > > CURRENT_MEETING_REPORT_ > > Reported by Greg Vaudreuil/ CNRI > > Minutes of December 4, 1990 > > This meeting began as a Birds of a Feather session to discuss a > proposal from Jan Michael Rynning and Jonny Eriksson but soon widened > to include broader enhansements to 822 to allow for body barts. This meeting began as a Birds-of-a-Feather session organized by Phill Gross (CNRI) to discuss a proposal from Jan Michael Rynning (NORDUnet) and Jonny Eriksson (NORDUnet) for utilizing international character sets in Internet mail (ie, RFC 821/822 mail). Gross had become aware of the Rynning/Eriksson proposal by way of interaction with the Nordic Engineering Task Force (NETF). Rynning and Eriksson in Sweden joined the discussion in Boulder by speakerphone. 
This speakerphone exercise was interesting to explore a possible way to allow more international participation in IETF WG sessions. Although use of a speakerphone is not optimal for large meetings, it at least enables some level of participation by folks who are unable to attend the meeting, and who would therefore not be able to participate at all. In this case, the speakerphone interaction went fairly smoothly, and we were very pleased to have Rynning and Eriksson join the meeting in this way. The discussion soon widened to include broader enhancements to RFC 822 to allow for general body parts. It was clear that there was enough interest and enthusiam to evolve this BOF into an IETF WG. Greg Vaudreuil (CNRI) kindly volunteered to chair the WG. Gross gladly turned over the gavel. > Rynning and Eriksson's proposal suggested a mechanism to transmit 8 > bit character sets through SMTP. The proposal consisted of 1) > eliminating the 7bit restriction in SMTP, and in cases where 8 bit > SMTP is not implimented 2) proposing a 7 but encoding for non-8 bit SMTP is not implemented 2) proposing a 7 bit encoding for non-8 bit - - > systems called TEX-HEX. > > The group found the proposal interesting, but primarily as a starting > point for a re-examination of several SMTP issues. There was a > consensus that the group should work to eliminate the 7 bit and 1000 > character per line restrictions in SMTP. This will allow easier > sending of binary files. Tom Kessler convinced the group that there sending of binary files. Tom Kessler (SUN) convinced the group that there > was only minor code changes required for sendmail to accept 8 bit > ASCII. Kessler further volunteered to author a document describing the > changes to the SMTP protocol. A command "ebit" was proposed in the changes to the SMTP protocol. A command "EBIT" was proposed in the > document by Rynning and Eriksson to identify new mailers. The group > agreed that this extension should be considered for SMTP. 
An alternate > HELO command could be defined to query a mailer for 8 bit > compatibility, such as HELO8. compatibility, such as "HELO8". Kessler had already made changes to the SUN Sendmail to eliminate the line length and 7 bit restrictions. He was willing to consider adding whichever of the proposed "EBIT" or "HELO8" SMTP commands that the WG finally decided on. He was also willing to investigate the possibility of SUN allowing the binary distribution of this new version of Sendmail. This was a very exciting prospect because it opened up the possibility fairly rapid deployment of the proposed changes to SMTP (at least for those folks who still use Sendmail). At future meetings, the WG may want to investigate sources of revised versions of other popular mailers, like MMDF. > > The working group looked at RFC 1154 for encodings. Some felt that > the document has short-comings in not differentiating between content > and the encoding scheme. Greg Vaudreuil took an action to contact the > author as inquire about the state of that document. The working group > felt that establishing body parts for 822 mail would be a good thing. > An outstanding issue remained concerning the interaction between the > various encoding schemes as the 7 or 8 bit transmission systems. > > Rynning and Eriksson took an action to re-write their proposal for > TEX-HEX as a specific encoding and body part to be used with the > encoding document. > > Actions > > > Tom Kessler: Write a document amending RFC 821 to eliminate the line > length restriction and the 7 bit restriction. > Greg Vaudreuil: Determine the state of RFC 1154, and encourage the > author to join in this effort. > Johnny Eriksson and Jan Michael Rynning: Rewrite the TEX HEX encoding > document as a specific instance of an RFC 1154 body part. 
> > 1 > > > > Attendees > > Robert Braden braden@isi.edu > Cyrus Chow cchow@orion.arc.nasa.gov > Johnny Eriksson bygg@sunet.se > Phillip Gross pgross@nri.reston.va.us > Russell Hobby rdhobby@ucdavis.edu > Tom Kessler kessler@sun.com > Brad Parker brad@cayman.com > Michael Roberts roberts@educom.edu > Bernhard Stockman boss@sunet.se > Jan Rynning jmr@nada.kth.se > Dean Throop throop@dg-rtp.dg.com > Gregory Vaudreuil gvaudre@nri.reston.va.us > David Zimmerman dpz@dimacs.rutgers.edu > > > > 2 > From "kessler@hacketorium.eng.sun.com (Tom Kessler)" Wed Jan 2 16:26:44 1991 Flags: 000000000001 Received: from SUN.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17577; Wed, 2 Jan 91 16:25:48 EST Received: from Eng.Sun.COM (exodus-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) id AA03538; Wed, 2 Jan 91 13:25:40 PST Received: from hacketorium.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AA19588; Wed, 2 Jan 91 13:25:39 PST Received: by hacketorium.Eng.Sun.COM (4.1/SMI-4.1) id AA00363; Wed, 2 Jan 91 13:24:42 PST Date: Wed, 2 Jan 91 13:24:42 PST From: kessler@hacketorium.eng.sun.com (Tom Kessler) Message-Id: <9101022124.AA00363@hacketorium.Eng.Sun.COM> To: "Phillip G. Gross" Cc: Greg Vaudreuil , ietf-smtp@dimacs.rutgers.edu, pgross@nri.reston.va.us In-Reply-To: <9012281609.aa08878@NRI.NRI.Reston.VA.US> Subject: Re: Minutes of the IETF 8bit meeting I think it would be best to drop the mention of Sun making the binaries available outside of the working group. My management is allowing me to spend a little time to do the same changes to the generic berkeley sendmail release. I believe that I will be able to make the diffs to the Berkeley source publicly available. I probably will not be able to make a release of the Sun binaries available widely do to hassle of having to go through the sun release process plus the usual liability issues. --Tom From "Phillip G. 
Gross " Wed Jan 2 22:32:42 1991 Flags: 000000000001 Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA26050; Wed, 2 Jan 91 22:30:59 EST Date: Wed, 2 Jan 91 22:30:03 EST X-Mailer: Mail User's Shell (6.5 4/17/89) From: "Phillip G. Gross" To: Tom Kessler , "Phillip G. Gross" Subject: Re: Minutes of the IETF 8bit meeting Cc: Greg Vaudreuil , ietf-smtp@dimacs.rutgers.edu Message-Id: <9101022230.aa14982@NRI.NRI.Reston.VA.US> Tom, Thanks. I think that what you suggest is just as effective. My thanks to SUN for allowing you the time to do that. We should see that SUN gets some kudos for that, if that is agreeable to SUN. Thanks, Phill From "Jan Michael Rynning " Thu Jan 10 09:54:37 1991 Flags: 000000000001 Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA04490; Thu, 10 Jan 91 09:53:45 EST Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA02196; Thu, 10 Jan 91 15:53:15 +0100 Date: Thu, 10 Jan 91 15:53:14 +0100 From: Jan Michael Rynning To: Greg Vaudreuil Cc: ietf-smtp@dimacs.rutgers.edu Subject: Re: Minutes of the IETF 8bit meeting In-Reply-To: Your message of Thu, 27 Dec 90 16:08:41 EST Message-Id: Greg Vaudreuil writes in his minutes: > ... A command "ebit" was proposed in the > document by Rynning and Eriksson to identify new mailers. The group > agreed that this extension should be considered for SMTP. An alternate > HELO command could be defined to query a mailer for 8 bit > compatibility, such as HELO8. RFC 821 specifies the command syntax like this: > 4.1.1. COMMAND SEMANTICS > > The SMTP commands define the mail transfer or the mail system > function requested by the user. SMTP commands are character > strings terminated by . The command codes themselves are > alphabetic characters terminated by if parameters follow > and otherwise. The syntax of mailboxes must conform to > receiver site conventions. The SMTP commands are discussed > below. 
The SMTP replies are discussed in the Section 4.2. Consequently, we can't have commands with non-alphabetic characters, like HELO8, unless we change the syntax. On second thought, it may be a good idea to generalize the proposed EBIT into a do-you-support- this-feature command, let's call it FEAT. To inquire if the receiver can handle 8-bit data, the SMTP sender would then say FEAT EIGHTBIT. Greg Vaudreuil writes in his minutes: > Rynning and Eriksson's proposal suggested a mechanism to transmit 8 > bit character sets through SMTP. The proposal consisted of 1) > eliminating the 7bit restriction in SMTP, and in cases where 8 bit > SMTP is not implimented 2) proposing a 7 but encoding for non-8 bit > systems called TEX-HEX. TEXT-HEX. It's a mixture of plain TEXT and HEX encoded characters. I'm curious: To me ``Rynning and Eriksson's proposal'' means the same as ``Eriksson's proposal and Rynning''. I would have used ``Rynning's and Eriksson's proposal''. Are they both correct English? Is there any difference in meaning? Greg Vaudreuil writes in his minutes: > Jan Rynning jmr@nada.kth.se Jan Michael Rynning jmr@nada.kth.se Some people call me by both my given names. Most people call me by my first name or my initials. From "Greg Vaudreuil " Tue Jan 15 13:49:27 1991 Flags: 000000000001 Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA02821; Tue, 15 Jan 91 13:46:33 EST From: Greg Vaudreuil Date: Tue, 15 Jan 91 13:45:34 EST Org: Corp. for National Research Initiatives Phone: (703) 620-8990 ; Fax: (703) 620-0913 X-Mailer: Mail User's Shell (6.5 4/17/89) To: ietf-smtp@dimacs.rutgers.edu Subject: Welcome and Mail Issues Message-Id: <9101151345.aa11729@NRI.NRI.Reston.VA.US> Howdy. For those folks who have recently joined the mailing list, I am resending the minutes from the first organizational meeting of this group at the last IETF in Boulder. 
The group focused on two main points: 1) extensions of SMTP to allow binary files, and 2) standardizing a mechanism for including body parts.

At the first meeting, the group discussed the RFC 1154 method for encoding body parts. At the meeting I took an action to contact the authors of 1154 and discuss the mail extensions effort and specifically the encoding method of RFC 1154. I contacted David Robinson, one of the authors of RFC 1154.

During the working group meeting, many felt that the RFC 1154 approach was seriously flawed by not specifying a content-type field along with the encoding field. The group recognized that a single body type can be encoded several different ways and did not see how to reconcile that with the document.

Robinson responded that a basic assumption in RFC 1154 was that SMTP would continue to have a 7 bit restriction, and all body parts would use 7 bit encodings. In this case, a particular encoding type would imply a content type. The working group has begun with a different assumption, that eight bit SMTP implementations will be intermixed with seven bit systems. In this situation, there are at least two different encoding schemes necessary, one for 7 bit and one for 8 bit systems.

After my announcement to tcp-ip and IETF, I have received several comments about the specific approach of this working group to multi-media mail and the group's relationship to various X.400 efforts. I encourage a discussion on this topic. My main objective is to standardize on a format and several encodings so user-agents can be deployed. I have heard of 7+ multi-media mail implementations, a number sufficient to demonstrate a real pent up need. This group would not exist if X.400 were widely deployed or likely to be within the near term. However, I would like to facilitate easy interoperation between the two systems. Comments?
Below I have a preliminary list of issues this group should focus on:

1) To what extent should we follow X.400's efforts in currently defined body parts? With whom should the group interact?

2) Should a message encoding imply a content type, or can these be independent axes? Implementation is easier with fewer options. A single encoding with a standard 7bit to 8bit conversion process may be possible, but the conversion may render text unreadable to an unmodified UA. Other encoding mechanisms like TEX-HEX maintain at least limited utility between conversions.

3) Some folks have expressed an interest in maintaining the 7 bit restriction on text, but allowing a binary mode. I personally feel this is doing a disservice to international network users whose character sets are not 7 bit ASCII.

4) Should the SMTP extensions include a general purpose option negotiation sequence? Currently the group is talking about a single new command to detect modified mailers that accept 8 bit data and have no line length restriction. Personally I don't see other options needed and would like to avoid the option negotiation.

Enough for starters.

Greg V.

CURRENT_MEETING_REPORT_

Reported by Greg Vaudreuil/ CNRI

SMTPEXT Minutes of December 4, 1990

This meeting began as a Birds of a Feather session called by Phill Gross (CNRI) to discuss two SMTP related proposals. Jan Michael Rynning (NORDUnet) and Johnny Eriksson (NORDUnet), participating by telephone, presented a method for transmitting eight bit character sets over SMTP. A proposal for a standard List-Service syntax for the Internet was made by Greg Vaudreuil (CNRI). The discussion broadened a bit and resulted in the formation of a working group to consider enhancements to SMTP and RFC 822 to allow for body parts.

Rynning's and Eriksson's proposal suggested a mechanism to transmit 8 bit character sets through SMTP.
The proposal consisted of 1) eliminating the 7 bit restriction in SMTP, and in cases where 8 bit SMTP is not implemented 2) proposing a 7 bit encoding for non-8 bit systems called TEX-HEX. TEX-HEX is a mixture of plain ASCII TEXT and HEX encoded characters.

The group found the proposal interesting, but primarily as a starting point for a re-examination of several SMTP issues. There was a consensus that the group should work to eliminate the 7 bit and 1000 character per line restrictions in SMTP. This will allow easier sending of binary files. Tom Kessler (SUN) convinced the group that only minor code changes were required for sendmail to accept 8 bit ASCII. Kessler further volunteered to author a document describing the changes to the SMTP protocol. A command "EBIT" was proposed in the document by Rynning and Eriksson to identify new mailers. The group agreed that this extension should be considered for SMTP. An alternate HELO command could be defined to query a mailer for 8 bit compatibility, such as HELO8.

The working group looked at RFC 1154 for defining encodings of specific body parts. Some felt that the document has shortcomings in not differentiating between content and the encoding scheme. Greg Vaudreuil took an action to contact the author to inquire about the state of that document. The working group felt that establishing body parts for 822 mail would be a good thing. An outstanding issue remained concerning the interaction between the various encoding schemes and the 7 or 8 bit transmission systems.

Rynning and Eriksson took an action to re-write their proposal for TEX-HEX as a specific encoding and body part to be used with the encoding document.

John Veizadez (Apple) stopped in to the meeting to brief the group about Unicode, a universal text encoding scheme developed at Xerox and Apple. This scheme used 2 octets to represent all known characters.
Chris Myers (U-Wash) explained the list service offered by Washington University, and explained many of the features of Bitnet's ListServ. Myers took an action to distribute the listserv document to those in the group who had an interest. The group did not come to a consensus on whether to pursue this topic at this time.

Actions

Tom Kessler
    Write a document amending RFC 821 to eliminate the line length restriction and the 7 bit restriction.

Greg Vaudreuil
    Determine the state of RFC 1154, and encourage the author to join in this effort.

Johnny Eriksson and Jan Michael Rynning
    Rewrite the TEX HEX encoding document as a specific instance of an RFC 1154 body part.

From "Ned Freed, Postmaster " Tue Jan 15 14:57:57 1991
Flags: 000000000001
Received: from CBROWN.CLAREMONT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA05163; Tue, 15 Jan 91 14:55:54 EST
Date: Tue, 15 Jan 1991 11:55 PST
From: "Ned Freed, Postmaster"
Subject: RFC1154
To: ietf-smtp@dimacs.rutgers.edu
Message-Id:
X-Envelope-To: ietf-smtp@dimacs.rutgers.edu
X-Vms-To: IN%"ietf-smtp@dimacs.rutgers.edu"

My big problem with RFC1154 is not in what headers it specifies or what encodings it uses. Stuff like this is of less concern than the fundamental approach that RFC1154 advocates to encapsulation; you can always fix the lack of a particular piece of information by adding it later (and RFC822 certainly is extensible enough to allow such additions).

I think the major problem with RFC1154 is that it employs line counting as a strategy for encapsulation. This is a terrible idea. The world is full of mailers that do all kinds of things to message text, including all sorts of different forms of line wrapping. The single most prevalent example of widespread line wrapping is BITNET, where inherent format limitations often force wrapping as the only means of converting messages to an acceptable format (we're talking 80 columns here).
There are countless other examples, of course, including all those word processors out there that reformat documents automatically and often without the user noticing.

Thus, I believe that if we start using line counting as a common strategy for encapsulation, we're buying into a nightmare situation of obscure bugs, processing problems, and trashed messages. We're also buying into the need to write code that hunts around for shifting boundaries in messages. I for one have better things to do with my development time!

The alternative to line counts to delimit bodyparts is the use of special delimiters and character stuffing, aka RFC934. Now, I'm not especially fond of the particular approach that RFC934 advocates. It also needs to have some details tightened up insofar as bodypart-specific header lines (mandatory or optional -- they seem to be mandatory, but the spec never comes out and says so) are concerned. However, it does have the advantage of current widespread use, several other RFCs depend on it (RFC987 is one), and I know of no real technical problems with it. RFC934 tolerates line wrapping quite nicely in most cases. (There are some cases where it will get confused, and these do need to be looked at.)

Anyway, if this community thinks that line counting is really the way to go, or has a real positive reason for wanting to use it, I'll follow along. But I do want to see some discussion of this point before we all buy into RFC1154's approach.
Ned

From "kessler@hacketorium.eng.sun.com (Tom Kessler)" Wed Jan 16 13:33:33 1991
Flags: 000000000001
Received: from Sun.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00617; Wed, 16 Jan 91 13:30:04 EST
Received: from Eng.Sun.COM (exodus-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) id AA09986; Wed, 16 Jan 91 10:29:55 PST
Received: from hacketorium.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AA00848; Wed, 16 Jan 91 10:29:51 PST
Received: by hacketorium.Eng.Sun.COM (4.1/SMI-4.1) id AA03023; Wed, 16 Jan 91 10:28:42 PST
Date: Wed, 16 Jan 91 10:28:42 PST
From: kessler@hacketorium.eng.sun.com (Tom Kessler)
Message-Id: <9101161828.AA03023@hacketorium.Eng.Sun.COM>
To: Greg Vaudreuil
In-Reply-To: <9101161148.aa07847@NRI.NRI.Reston.VA.US>
Subject: SMTP changes stuff
Cc: ietf-smtp@dimacs.rutgers.edu

I've held off doing the work for a bit in part to see what sort of discussion(s) cranked up. I wonder whether we might want to implement something like an EXTHELO or OPTHELO or HELOOPT command. This command could take as arguments options or extensions which you might wish to support. E.g. you might get something like

    OPTHELO BINARY BATCH FUNKYOPTION hostname.domain.name

A reply could contain information about which options you support (or unknown command if you don't support any). The tricky part is that this would cause problems for host names that are the same as any option we define, because it's ambiguous whether BATCH is your name or BATCH is an option.

The main advantage to loading the option(s) into the hello command is that you save an extra round trip when you set up the connection. Is it worth going to the trouble to do this, or would folks rather see something like a separate BINARY (or EBIT or whatever) command?

I am interested in what people think about this.
--Tom Kessler

From "Greg Vaudreuil " Wed Jan 16 14:49:26 1991
Flags: 000000000001
Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA03415; Wed, 16 Jan 91 14:45:55 EST
Received: from NRI by NRI.NRI.Reston.VA.US id aa12867; 16 Jan 91 14:36 EST
To: Tom Kessler
Cc: Greg Vaudreuil , ietf-smtp@dimacs.rutgers.edu
Subject: Re: SMTP changes stuff
In-Reply-To: Your message of Wed, 16 Jan 91 10:28:42 -0800. <9101161828.AA03023@hacketorium.Eng.Sun.COM>
Date: Wed, 16 Jan 91 14:36:01 -0500
From: Greg Vaudreuil
Message-Id: <9101161436.aa12867@NRI.NRI.Reston.VA.US>

>The main advantage to loading the option(s) into the hello command is that
>you save an extra round trip when you set up the connection. Is it worth
>going to the trouble to do this, or would folks rather see something like
>a separate BINARY (or EBIT or whatever) command?

I would prefer not to do option negotiation, and instead define a set of helo commands that accomplish the same ends.

    HELO        (7 bit mail)
    HELOBINARY  (8 bit mail, no line restriction)
    HELOBATCH   (Batched mail jobs, 8 bit, no line restriction)

All modified mailers doing batch mail should also be modified to accept binary mail at the same time. If a "command unrecognized" reply results from any of these commands, a fallback to HELO is reasonable.

This is in fact negotiation without the overhead of parsing a line of n options. By eliminating certain non-sensical or non-optimal combinations, the number of HELO commands can be kept sufficiently small.

Greg V.
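
[Editorial illustration, not part of the original thread.] The fallback scheme described in the message above can be sketched roughly as follows. This is only a sketch under stated assumptions: HELOBINARY and HELOBATCH are the verbs proposed on the list, not commands defined by RFC 821, and `negotiate_greeting`, `make_fake_server`, and the host name `client.example` are hypothetical names introduced here. A real client would talk to a socket; a stand-in server is used so the example runs on its own.

```python
# Greeting verbs from the message above, most capable first.
# These were proposals on the list, not commands defined by RFC 821.
GREETINGS = ["HELOBATCH", "HELOBINARY", "HELO"]


def negotiate_greeting(send_command, hostname, wanted=GREETINGS):
    """Try each greeting in turn and return the first one the server accepts.

    send_command is a hypothetical callable that sends one command line
    and returns the numeric SMTP reply code as an int.
    """
    for verb in wanted:
        code = send_command(f"{verb} {hostname}")
        if code == 250:          # 250: requested action okay
            return verb
        if code != 500:          # anything but "command unrecognized"
            raise RuntimeError(f"unexpected reply {code} to {verb}")
        # 500: fall through to the next, less capable greeting.
    raise RuntimeError("server rejected every greeting, including HELO")


def make_fake_server(supported):
    """Return a stand-in for a real connection that knows only `supported` verbs."""
    def send_command(line):
        verb = line.split(" ", 1)[0]
        return 250 if verb in supported else 500
    return send_command


old_mailer = make_fake_server({"HELO"})
new_mailer = make_fake_server({"HELO", "HELOBINARY"})
print(negotiate_greeting(old_mailer, "client.example"))   # HELO
print(negotiate_greeting(new_mailer, "client.example"))   # HELOBINARY
```

Note that each rejected greeting costs one round trip, which is the overhead the option-list approach in the earlier message was trying to avoid; the two designs trade parsing complexity against connection setup time.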
From "kessler@hacketorium.eng.sun.com (Tom Kessler)" Wed Jan 16 16:44:20 1991 Flags: 000000000001 Received: from Sun.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA07306; Wed, 16 Jan 91 16:41:07 EST Received: from Eng.Sun.COM (exodus-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) id AA06886; Wed, 16 Jan 91 13:40:59 PST Received: from hacketorium.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AB25652; Wed, 16 Jan 91 13:40:56 PST Received: by hacketorium.Eng.Sun.COM (4.1/SMI-4.1) id AA03479; Wed, 16 Jan 91 13:39:49 PST Date: Wed, 16 Jan 91 13:39:49 PST From: kessler@hacketorium.eng.sun.com (Tom Kessler) Message-Id: <9101162139.AA03479@hacketorium.Eng.Sun.COM> To: Greg Vaudreuil Cc: Tom Kessler , Greg Vaudreuil , ietf-smtp@dimacs.rutgers.edu In-Reply-To: <9101161436.aa12867@NRI.NRI.Reston.VA.US> Subject: Re: SMTP changes stuff That sounds reasonable to me. That will certainly keep things simple. From stef@nma.com Wed Jan 16 17:39:38 1991 Flags: 000000000001 Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA09143; Wed, 16 Jan 91 17:35:17 EST Received: from nrtc.northrop.com by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA16942; Wed, 16 Jan 91 17:35:05 EST Received: from nma.com by nrtc.nrtc.northrop.com id aa25645; 16 Jan 91 14:34 PST Received: from localhost by nma.com id aa10167; 16 Jan 91 13:38 PST To: ietf-smtp@dimacs.rutgers.edu Subject: Re: SMTP changes stuff In-Reply-To: Your message of Wed, 16 Jan 91 10:28:42 -0800. 
<9101161828.AA03023@hacketorium.Eng.Sun.COM> Reply-To: Stef@ics.uci.edu From: Einar Stefferud Date: Wed, 16 Jan 91 13:38:22 -0800 Message-Id: <10162.664061902@nma> Sender: stef@nma.com It should be trivial to avoid confusion with host names in the arg list by just using some characters in the args that are illegal in any host name.domain...\Stef From "Mark Crispin " Wed Jan 16 17:46:03 1991 Flags: 000000000001 Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA09358; Wed, 16 Jan 91 17:41:50 EST Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA02583; Wed, 16 Jan 91 14:41:40 -0800 Date: Wed, 16 Jan 1991 14:18:47 -0800 (PST) From: Mark Crispin Sender: Mark Crispin Subject: re: SMTP changes stuff To: kessler@hacketorium.eng.sun.com Cc: Greg Vaudreuil , ietf-smtp@dimacs.rutgers.edu In-Reply-To: <9101161828.AA03023@hacketorium.Eng.Sun.COM> Message-Id: I don't care for the idea of fooling around with the HELO command on the grounds that too many sites get it wrong as it is. I believe that SMTP commands should continue to be four characters, for the possible benefit of implementations that use this fact. I also feel that we should use the right tool for the right task. To wit, we should not overload a functionality unnecessarily. We should separate the function of specifying data characteristics from other functionalities which belong at the control level. I propose that we do something along these lines: 1. Control-level functions should be handled by new SMTP commands. I'm not really sure how many of these there are. I don't, for example, understand what the need is for an explicit BATCH control command. 2. Functionalities which identify the form of SMTP data should be options to a new XDAT command. XDAT is similar to DATA, except: a) it takes one or more subcommands identifying the form of the data (e.g. BINARY, 8BIT) b) the data follows such a form. 
If XDAT is rejected, then the DATA command is used and the message is transmitted as 7-bit data as before. We'll need some encoding scheme, and some magic cookie added to the header to identify such a transformation. A smart SMTP server or mailer, upon seeing the magic cookie, will transform it back and flush the cookie before passing it on. It is important to know that the structure of the data is beyond the scope of SMTP. All we're dealing with here is binary vs. text (with possible newline convention differences) and 7-bit vs. 8-bit (ISO). We're not saying what the data will look like. 3. The structure of data will be specified by a successor to RFC-1154. As an implementor of RFC-1154, I would like to see some extensions here, specifically in the areas of more defined types, more information for things like UUENCODE (is the file Unix compressed? tarred? What is the file name without looking at the data??), and a less ambiguous definition of the boundaries between segments. The problem with using "lines" is that a "line" means different things on different operating systems. There is no way on Unix, for example, to distinguish between a bare LF and a new line. I would like the definition of an RFC-1154 "line" to explicitly state that a CR *or* an LF constitutes a "line", with the exception that an LF immediately following a CR is not counted. Hopefully this will let texts with possible bare LF's or CR's get transferred without getting line counts messed up. I would also like to see some explicit cookie in the message body between segments, as opposed to a blank line. This is to allow an application to resynchronize in case it finds that a line count is messed up.
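The line-counting rule proposed above (a CR or an LF ends a line, except that an LF immediately following a CR is not counted) is small enough to write down. The function name `count_lines` is hypothetical, and the sketch counts terminators only, so trailing text without a terminator does not add a line; whether it should is not settled in the message.

```python
def count_lines(data: bytes) -> int:
    """Count line terminators under the rule proposed in the message above.

    A CR (0x0D) or an LF (0x0A) each end a line, but an LF that
    immediately follows a CR belongs to that CR and is not counted again.
    """
    count = 0
    previous = None
    for byte in data:
        if byte == 0x0D:
            count += 1
        elif byte == 0x0A and previous != 0x0D:
            count += 1
        previous = byte
    return count


# CRLF, bare LF and bare CR line endings mixed in one text.
sample = b"one\r\ntwo\nthree\rfour\r\n"
print(count_lines(sample))
```

Under this rule the same text yields the same count whether a gateway delivers it with CRLF, bare LF or bare CR line endings.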
Regards, -- Mark -- From stef@nma.com Wed Jan 16 23:59:06 1991 Flags: 000000000001 Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA18443; Wed, 16 Jan 91 23:55:11 EST Received: from nrtc.northrop.com by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA07222; Wed, 16 Jan 91 23:55:02 EST Received: from nma.com by nrtc.nrtc.northrop.com id aa27056; 16 Jan 91 20:54 PST Received: from localhost by nma.com id aa10635; 16 Jan 91 19:52 PST To: ietf-smtp@dimacs.rutgers.edu Subject: Re: SMTP changes stuff In-Reply-To: Mark Crispin's message of Wed, 16 Jan 91 14:18:47 -0800. Reply-To: Stef@ics.uci.edu From: Einar Stefferud Date: Wed, 16 Jan 91 19:52:03 -0800 Message-Id: <10630.664084323@nma> Sender: stef@nma.com I realize that this may receive an unfavorable greeting, but here it is anyway. I suggest that any SMTP extensions take full account of X.400 standards and allow for the carriage of X.400 P2 BodyParts (84 or 88) and allow for some form of faithful carriage of all the Service Elements from P1 in an appropriate way such that they can be reconstructed for reentry into an X.400 P1 transfer service at the receiving end of an SMTP transfer. I believe that this is simply done by making it easy to use, carry, and recognize X.409/ASN.1 encoded objects. The only requirements that I see are to allow 8bit transmission, and allow for X.400 X.409/ASN.1 objects to be labeled as such, without conflicting with the labeling and encoding of other objects. I see no reason to do anything at this point that would hinder carriage of ASN.1/X.409 objects, when no SMTP Extension encoding decisions have yet been made, and we agree that new encodings and labeling are both needed. It might, but is not a requirement of mine, be useful to consider using ASN.1 encoding for all the new 8bit objects, using the X.400(88) standards. 
I am not sure I see good reasons to develop and adopt yet another set of encoding rules, unless they are demonstrably better in really significant ways. Whatever is done, it will be a great tragedy if SMTP Extensions are deliberately "hostile" to interworking with X.400. Best...\Stef From "Mark Crispin " Thu Jan 17 00:24:35 1991 Flags: 000000000001 Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA18936; Thu, 17 Jan 91 00:21:08 EST Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA11869; Wed, 16 Jan 91 21:21:03 -0800 Date: Wed, 16 Jan 1991 21:14:51 -0800 (PST) From: Mark Crispin Sender: Mark Crispin Subject: Re: SMTP changes stuff To: Stef@ics.uci.edu Cc: ietf-smtp@dimacs.rutgers.edu In-Reply-To: <10630.664084323@nma> Message-Id: Stef - I'm sure that doing something like this is intended. Actually, in some ways RFC-1154 already does much of this. Please take a look at the RFC; it's a good first stab at the problem. I've written an RFC-1154 implementation. I think you might engender some controversy in forcing ASN.1 encoding for all 8-bit transmission though... -- Mark -- From stef@nma.com Thu Jan 17 04:59:02 1991 Flags: 000000000001 Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24397; Thu, 17 Jan 91 04:55:28 EST Received: from nrtc.northrop.com by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA25945; Thu, 17 Jan 91 04:55:17 EST Received: from nma.com by nrtc.nrtc.northrop.com id ac27765; 17 Jan 91 1:54 PST Received: from localhost by nma.com id aa11277; 17 Jan 91 1:38 PST To: Mark Crispin Cc: ietf-smtp@dimacs.rutgers.edu Subject: Re: SMTP changes stuff In-Reply-To: Your message of Wed, 16 Jan 91 21:14:51 -0800. 
Reply-To: Stef@ics.uci.edu From: Einar Stefferud Date: Thu, 17 Jan 91 01:38:15 -0800 Message-Id: <11272.664105095@nma> Sender: stef@nma.com Status: O Hi Mark -- I will be reviewing RFC1154 carefully, since it seems to be what some of you are pushing to upgrade RFC821/RFC822. If you reread my message, you will see that I am very carefully saying that I want to be able to carry ASN.1 encoded objects without complication, but not saying anything about exclusivity for ASN.1. I do suggest, however, that you give serious thought to ways to be sure that you have unique identifiers, and it will be important to allow use of the regularly defined and registered X.400 object OIDs. This should be done without requiring a rewrite of the new RFCsmtp for every new object that is to be allowed carriage. Best...\Stef From "Robert Ullmann " Thu Jan 17 11:25:56 1991 Flags: 000000000001 Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA02253; Thu, 17 Jan 91 11:21:33 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00719; Thu, 17 Jan 91 11:21:26 EST Message-Id: <9101171621.AA00719@rutgers.edu> Received: (from user ARIEL) by Relay.Prime.COM; 17 Jan 91 11:23:53 EST To: IETF SMTP list From: Robert Ullmann Subject: comments on the SMTP session protocol Date: 17 Jan 91 11:23:54 EST Hi, I think there are a number of aspects of SMTP that have contributed to its success, but are poorly understood by most users and implementors. It worries me to see proposals that add to or modify the protocol in ways that break some of the (IMHO) most important characteristics. In attempting to explain, I will use a few terms from OSI. (Of course I have studied OSI: I have to know it in order to point out its deficiencies with some credibility :-) Like "SPDU": Session Protocol Data Unit. In internet terms: the "packet" used at the session layer. SMTP and RFC822 are both defined in terms of a simple SPDU: a line of ASCII text.
SMTP uses a half-duplex exchange of these protocol units, while RFC822 defines an object consisting of an ordered set of the same protocol units. This definition has a number of very clear advantages over, say, ASN.1 or some other encoding:
* It is machine-independent: ASCII is the ANSI Standard Code for Information Interchange. "Interchange" being the operative word. Systems using other character sets define conversions to ASCII.
* Each "packet" is a bounded object, known to fit in a record or buffer that is 1000+n octets for some small, fixed, value of n. [need I repeat that the world is not Unix at this point? OK, I won't :-]
* The content and representation of each packet or SPDU is human-readable. Where appropriate (almost everywhere) the actual value of the content is readable. Remember, the machines are here to serve the humans, not the other way around!
* The SPDUs are self delimiting (if you count the CRLF as included in the end of the "packet"), both in the protocol and in the objects being moved.
The combination of these factors makes SMTP implementable in a real sense on any hardware and software architecture. They contribute directly to the simple, easy addition of user interfaces and other application interfaces. I would recommend NOT un-bounding line lengths. This will make implementation substantially more difficult in some (most?) environments, and is not upward compatible with current implementations. I don't think "text" with "lines" that are not bounded is real text anyway: it should probably be encoded somehow. (see later, as-yet-unwritten, messages to this discussion :-) RFC1154 takes the definition of a mail message as an ordered set of SPDUs, (lines, remember?) and specifies a rigorous method of identifying the parts that are each contained object. This does not necessitate changes to the MTAs (Mail Transfer Agents, another ISOmorphism :-).
This is crucial in my opinion: we aren't going to get any MTA change rolled out anytime in the 1990's to anything approaching universality; and the MTA should not care (read "MUST NOT interfere with") what is being sent anyway. The only useful change I can see making to the MTAs (and to the definition of the SPDUs) is to define the character code used as ISO8859/1 aka ASCII-8. This will not take as long to roll out: some implementations are "compliant" already, while others are trivial to fix. ("sendmail" can be fixed by the simple deletion of two lines of code.) After all: why clear the "high" bit if you don't have to? This has already been done by some vendors, and has been proven to NOT cause interoperability problems. Messages that go through a 7-bit mailer get the bit cleared, but get delivered. After a while, such mailers will be viewed as broken. Encodings that do a complete transform of the input object (uuencode, btoa, hex, FS, LZJU90 (see another future message ;-)) produce a 6 or 7 bit encoding known to survive various gates and conversions, and thus will survive such "broken" mailers. 
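What "get the bit cleared, but get delivered" means for ISO 8859-1 text can be shown in a few lines. This is a sketch of the effect only, not of any particular mailer's code; `clear_high_bit` is a hypothetical name and the sample word is chosen just because it contains 8-bit characters.

```python
def clear_high_bit(data: bytes) -> bytes:
    """Mimic a 7-bit relay: mask every octet down to its low seven bits."""
    return bytes(octet & 0x7F for octet in data)


# An ISO 8859-1 word with two 8-bit characters (0xE5 and 0xE6).
original = "blåbær".encode("latin-1")
damaged = clear_high_bit(original)

print(original)  # the octets as sent by an 8-bit clean mailer
print(damaged)   # what arrives after one 7-bit hop
```

The message arrives with its length and line structure intact but with each 8-bit character replaced by an unrelated ASCII one, which is why the full-transform encodings mentioned above remain useful on paths that may include such a hop.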
Best Regards, Rob Ullmann +1 508 620 2800 x1736 Ariel@Relay.Prime.COM From "Nathaniel Borenstein " Thu Jan 17 11:50:50 1991 Flags: 000000000001 Received: from thumper.bellcore.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA02917; Thu, 17 Jan 91 11:47:06 EST Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id for IETF-SMTP@dimacs.rutgers.edu; Thu, 17 Jan 91 11:47:02 EST Received: by greenbush.bellcore.com (4.12/4.7) id for IETF-SMTP@dimacs.rutgers.edu; Thu, 17 Jan 91 11:49:31 est Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Thu, 17 Jan 1991 11:49:28 -0500 (EST) Message-Id: Date: Thu, 17 Jan 1991 11:49:28 -0500 (EST) From: Nathaniel Borenstein To: IETF SMTP list Subject: Re: comments on the SMTP session protocol In-Reply-To: <9101171621.AA00719@rutgers.edu> References: <9101171621.AA00719@rutgers.edu> I think Robert is right -- unbounding line lengths will be a major amount of work, and it is really unnecessary. Why? Because one can define a special RFC 1049 content-type for multi-part messages, in which multiple parts are encapsulated in the body using some structure that is NOT sensitive to the line length problems. (Andrew is an existence proof that this can be done.) Obviously we can all live with RFC1154 if and when all the mailers are upgraded to deal with the line length problem, but it seems like a lot of unnecessary work when simpler solutions are available.
-- Nathaniel From "Ole Bj|rn Hessen " Thu Jan 17 11:51:13 1991 Flags: 000000000001 Received: from ifi.uio.no by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA02921; Thu, 17 Jan 91 11:47:40 EST Received: from roftaty.ifi.uio.no by ifi.uio.no with SMTP id ; Thu, 17 Jan 1991 17:47:28 +0100 From: Ole Bj|rn Hessen Received: by roftaty.ifi.uio.no ; Thu, 17 Jan 1991 17:47:25 +0100 Message-Id: <9101171647.AAroftaty03383@roftaty.ifi.uio.no> Subject: Re: comments on the SMTP session protocol To: Ariel@relay.prime.com (Robert Ullmann) Date: Thu, 17 Jan 1991 17:47:23 +0100 Cc: IETF-SMTP@dimacs.rutgers.edu In-Reply-To: <9101171621.AA00719@rutgers.edu>; from "Robert Ullmann" at Jan 17, 91 11:23 am I have debugged X.400 messages and transmission between MTAs. I find X.400 an order of magnitude more difficult to debug than RFC822/RFC821. It is hard to find out what's broken and where. I have no problem with transmission of binary *body* parts though, as long as the header and envelope fields are line based. Ole Bjorn. From Rudy.Nedved@rudy.fac.cs.cmu.edu Thu Jan 17 14:22:21 1991 Flags: 000000000001 Received: from RUDY.FAC.CS.CMU.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA07686; Thu, 17 Jan 91 14:13:30 EST Received: from rudy.fac.cs.cmu.edu by RUDY.FAC.CS.CMU.EDU id aa01308; 17 Jan 91 14:12:47 EST To: Greg Vaudreuil Cc: ietf-smtp@dimacs.rutgers.edu Subject: Re: SMTP changes stuff In-Reply-To: <9101161436.aa12867@NRI.NRI.Reston.VA.US> Date: Thu, 17 Jan 91 14:12:41 EST Message-Id: <1304.664139561@RUDY.FAC.CS.CMU.EDU> From: Rudy.Nedved@rudy.fac.cs.cmu.edu Greg, I suspect it would not be wise to use the HELO command, since most of the commands were designed using the first 4 characters and it is quite possible that implementations blindly assume they can ignore anything after the first 4 characters. A new command is less dangerous, since an implementation that does not understand it would give some 500-series reply or, worse yet, say nothing or die.
We have to recognize the diverse implementations from poorly configured sendmails to VMS to "old" mainframes like IBM 3083 to MMDF. -Rudy From "Neil W. Rickert " Thu Jan 17 14:31:11 1991 Flags: 000000000001 Received: from mp.cs.niu.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA08066; Thu, 17 Jan 91 14:27:03 EST Received: from localhost by mp.cs.niu.edu with SMTP id AA22930 (5.65a/IDA-1.4.2.5 for ietf-smtp@dimacs.rutgers.edu); Thu, 17 Jan 91 13:26:59 -0600 Message-Id: <9101171926.AA22930@mp.cs.niu.edu> To: ietf-smtp@dimacs.rutgers.edu Organization: Northern Illinois University, CS department. Subject: Extended SMTP. Date: Thu, 17 Jan 91 13:26:58 -0600 From: "Neil W. Rickert" Has anyone considered using the '.' character for these extensions? The '.' character as the first character of a line is already special. If followed only by white space, it is an EOF marker. If followed by another '.', then the '..' should be replaced by '.'. In principle .x at the beginning of a line is illegal for any other values of x. An extended SMTP could define meanings for other values of x. For example:
First byte - .
second byte - 8
third byte - a count
could be used as a code that 8 bit binary data is in use. The count would give the number of bytes of binary data. The CR LF at the end of the line is therefore not part of the 8 bit data. Note that this allows long binary streams to be sent as short records for mail software which requires this. Some escaping of internal CR or LF characters would be needed, but is easy to define.
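One possible reading of the ".8 count" record can be sketched as follows. Several details are assumptions that the message leaves open: the count is taken to be a single raw octet (so at most 255 bytes per record), the 64-byte default record size is arbitrary, and the escaping of CR and LF inside the data is not implemented because the message does not define it. `frame_binary` and `unframe_binary` are hypothetical names.

```python
def frame_binary(payload: bytes, chunk_size: int = 64) -> bytes:
    """Split payload into records of the form '.', '8', count, data, CRLF."""
    if not 1 <= chunk_size <= 255:
        raise ValueError("count must fit in one octet")
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        out += b".8" + bytes([len(chunk)]) + chunk + b"\r\n"
    return bytes(out)


def unframe_binary(stream: bytes) -> bytes:
    """Reverse frame_binary, trusting the count rather than line endings."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        if stream[pos:pos + 2] != b".8" or len(stream) < pos + 3:
            raise ValueError("expected a .8 record")
        count = stream[pos + 2]
        end = pos + 3 + count
        chunk = stream[pos + 3:end]
        # The CRLF after the counted data closes the record and is not data.
        if len(chunk) != count or stream[end:end + 2] != b"\r\n":
            raise ValueError("truncated or malformed record")
        out += chunk
        pos = end + 2
    return bytes(out)


payload = bytes(range(256))
assert unframe_binary(frame_binary(payload)) == payload
```

Because the decoder here relies on the count, it works on an untouched byte stream even without escaping; a relay that splits or rewrites lines at CR or LF would still damage the records, which is where the escaping mentioned in the message would come in.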
-Neil Rickert From "Timo Lehtinen " Thu Jan 17 15:27:22 1991 Flags: 000000000001 Received: from fuug.fi by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA09590; Thu, 17 Jan 91 15:24:07 EST Received: from sti.UUCP by fuug.fi with UUCP id AA10951 (5.65+/IDA-1.3.5 for ietf-smtp@dimacs.rutgers.edu); Thu, 17 Jan 91 22:21:38 +0200 Received: by sti.fi (5.57/smail2.2/03-05-90) id AA21234; Thu, 17 Jan 91 21:35:10 +0200 Message-Id: <9101171935.AA21234@sti.fi> To: ietf-smtp@dimacs.rutgers.edu Cc: ttl@fuug.fi Subject: A spec on TEX-HEX encoding Date: Thu, 17 Jan 91 21:35:09 O From: Timo Lehtinen > Mar 1991 Write a document for the sending of 8 bit character sets > through 7 bit mailers with the TEX-HEX encoding scheme. Could somebody send me a description of the TEX-HEX encoding algorithm? Preferably via mail. Thank you in advance! Timo Lehtinen From "brian@ucsd.edu (Brian Kantor)" Thu Jan 17 15:30:40 1991 Flags: 000000000001 Received: from ucsd.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA09675; Thu, 17 Jan 91 15:26:33 EST Received: by ucsd.edu; id AA00111 sendmail 5.64/UCSD-2.1-sun Thu, 17 Jan 91 12:26:20 -0800 for ietf-smtp@dimacs.rutgers.edu Date: Thu, 17 Jan 91 12:26:20 -0800 From: brian@ucsd.edu (Brian Kantor) Message-Id: <9101172026.AA00111@ucsd.edu> To: ietf-smtp@dimacs.rutgers.edu Subject: Re: SMTP changes One idea I've been stewing on for a while is to split out the header and the several parts of a message in this way:
HELO wombats.edu
MAIL FROM:
RCPT TO:
HEAD
multiple lines of header
.
PART 1
multiple lines of text
.
PART 2,1033
1033 bytes of text or image or whatever
PART 3
more text
.
QUIT
In the header there would be Part: lines that described the processing and presentation of the various parts of the message. The PART commands have a numeric parameter that counts the parts being sent, and optionally a bytecount. If the bytecount doesn't appear, the stuff following is assumed to be sent like current SMTP text, terminated by a '.'
on a line by itself, whereas if the bytecount is present, exactly that many bytes of data are sucked off the stream as that part of the message. Thus you could send mail that has text, stereo sound, images, spreadsheet data, and printer output (for example) all in one message. Presumably the header lines for each part would specify the intended presentation order and method. You could even send a narrated slide show this way. - Brian From "Mark Crispin " Thu Jan 17 15:41:59 1991 Flags: 000000000001 Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA10215; Thu, 17 Jan 91 15:38:41 EST Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA01465; Thu, 17 Jan 91 12:38:34 -0800 Date: Thu, 17 Jan 1991 12:32:59 -0800 (PST) From: Mark Crispin Subject: Re: SMTP changes stuff To: Stef@ics.uci.edu Cc: Mark Crispin , ietf-smtp@dimacs.rutgers.edu In-Reply-To: <11272.664105095@nma> Message-Id: Hi Stef -- this all sounds good to me. My concern on RFC1154 is strictly as an implementor; I would like to see the spec tightened up a little bit, and support some additional information and types. But I have no particular religious feelings about what form it should take other than it be readily implementable with reasonably-sized code. 
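Brian Kantor's PART proposal earlier in this thread distinguishes dot-terminated parts from counted parts, and a receiver's side of that distinction can be sketched briefly. This is a reading of the proposal, not a defined protocol: `parse_part_command` and `read_part` are hypothetical names, the `PART n[,bytecount]` syntax is inferred from the example transcript, and dot-unstuffing of text parts is left out.

```python
import io
from typing import BinaryIO, Optional, Tuple


def parse_part_command(line: str) -> Tuple[int, Optional[int]]:
    """Parse 'PART n' or 'PART n,bytecount' into (part number, bytecount)."""
    verb, _, argument = line.strip().partition(" ")
    if verb.upper() != "PART" or not argument:
        raise ValueError(f"not a PART command: {line!r}")
    number, _, count = argument.partition(",")
    return int(number), (int(count) if count else None)


def read_part(stream: BinaryIO, bytecount: Optional[int]) -> bytes:
    """Read one part body from stream.

    With a bytecount, exactly that many bytes are taken; without one,
    lines are collected until a '.' on a line by itself.
    """
    if bytecount is not None:
        data = stream.read(bytecount)
        if len(data) != bytecount:
            raise EOFError("stream ended inside a counted part")
        return data
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream ended before the terminating '.'")
        if line.rstrip(b"\r\n") == b".":
            return b"".join(lines)
        lines.append(line)


wire = io.BytesIO(b"some text\r\n.\r\n" + b"\x00\xff\x0d\x0a" + b"more text\r\n.\r\n")
print(read_part(wire, parse_part_command("PART 1")[1]))
print(read_part(wire, parse_part_command("PART 2,4")[1]))
print(read_part(wire, parse_part_command("PART 3")[1]))
```

The counted form needs no escaping because the receiver never inspects the bytes, which is what lets a part carry arbitrary binary data.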
From "Ned Freed, Postmaster " Thu Jan 17 15:56:27 1991 Flags: 000000000001 Received: from CBROWN.CLAREMONT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA10649; Thu, 17 Jan 91 15:53:32 EST Date: Thu, 17 Jan 1991 12:52 PST From: "Ned Freed, Postmaster" Subject: Re: comments on the SMTP session protocol To: Ariel@relay.prime.com Cc: ietf-smtp@dimacs.rutgers.edu Message-Id: <82AD12A19680063A@HMCVAX.CLAREMONT.EDU> X-Envelope-To: ietf-smtp@dimacs.rutgers.edu X-Vms-To: IN%"Ariel@relay.prime.com" X-Vms-Cc: IN%"ietf-smtp@dimacs.rutgers.edu" Robert Ullmann writes: > I think there are a number of aspects of SMTP that have contributed > to its success, but are poorly understood by most users and > implementors. > ... > This does not necessitate changes to the MTAs (Mail Transfer Agents, > another ISOmorphism :-). This is crucial in my opinion: we aren't > going to get any MTA change rolled out anytime in the 1990's to > anything approaching universality; and the MTA should not care (read > "MUST NOT interfere with") what is being sent anyway. Yes! Any change made in all this needs to be transparent to existing SMTP implementations. They may not know what to do with the new material they are given, but they should be able to pass it on to the next guy that *does* know without any trouble. Adhering to this rule means that line lengths cannot be extended directly, but some encoding can be applied that has the effect of extending the line length, for those sites that understand the encoding. BITNET already does this in some BSMTP encoding by placing a major cookie character in the last available column. I'm not advocating this exact approach, but it does work. If changes made here are not transparent to existing SMTP implementations, I agree that there's little chance of them being widely adopted. 
Thus, however tempting it is to extend the HELO command, or add new commands to the SMTP dialogue, or add new interpretations of the character following a single dot at the end of the DATA segment, these things should be avoided because they necessitate too many changes to existing implementations. In fact, what we're really talking about extending here should be RFC822, not RFC821. While I don't especially like RFC1154 because of its use of line counting (which doesn't survive line wrapping mailers very well -- I prefer RFC934 conventions), I'd much rather see it used than extend SMTP. > The only useful change I can see making to the MTAs (and to the > definition of the SPDUs) is to define the character code used as > ISO8859/1 aka ASCII-8. This will not take as long to roll out: some > implementations are "compliant" already, while others are trivial to > fix. ("sendmail" can be fixed by the simple deletion of two lines of > code.) After all: why clear the "high" bit if you don't have to? Agreed. This is the right thing to do. It is pretty easy to convert the various national and vendor-specific character sets to ISO8859, and standardizing on ISO8859 as a common-denominator on-wire character set makes a lot of sense. Note that ISO8859 includes 7-bit ASCII as a subset; it would not be appropriate to use it if it did not. > This has already been done by some vendors, and has been proven to > NOT cause interoperability problems. Messages that go through a 7-bit > mailer get the bit cleared, but get delivered. After a while, such > mailers will be viewed as broken. My experience indicates that this is certainly true of PMDF. Removing code to clear the high bit was requested by various European sites and has caused no problems that I know of. I also find that a lot of sites either have vendors that have implemented these changes, or they've hacked them in themselves.
> Encodings that do a complete transform of the input object (uuencode, > btoa, hex, FS, LZJU90 (see another future message ;-)) produce a 6 or > 7 bit encoding known to survive various gates and conversions, and > thus will survive such "broken" mailers. RFC1113 offers another encoding, but the better of these encodings all share the basic concept of stuffing three 8 bit characters of input into four 6 bit characters of output. The only minor differences are in the output character set and the handling of trailing material (because of the 3x chunking implied by the input processing). Since they are all pretty much equivalent, I don't really care which ones are eventually used. RFC1113 has the advantage of being nicely written up already, but that's about it. > Rob Ullmann Ned Freed From "Ole Bj|rn Hessen " Thu Jan 17 16:21:59 1991 Flags: 000000000001 Received: from ifi.uio.no by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA11274; Thu, 17 Jan 91 16:17:42 EST Received: from baaleyg.ifi.uio.no by ifi.uio.no with SMTP id ; Thu, 17 Jan 1991 22:17:29 +0100 From: Ole Bj|rn Hessen Received: by baaleyg.ifi.uio.no ; Thu, 17 Jan 1991 22:17:26 +0100 Message-Id: <9101172117.AAbaaleyg02909@baaleyg.ifi.uio.no> Subject: Extension to SMTP. To: ietf-smtp@dimacs.rutgers.edu Date: Thu, 17 Jan 1991 22:17:24 +0100 Dan Oscarsson (Dan@DNA.LTH.Se) is working on extensions to IDA sendmail to support ISO character sets. His idea is to extend the SMTP protocol with a new command ISOC [character-set] (e.g. ISOC 8859-1). If the responding access point does not answer this command with 250 ok, it is assumed to be an old 7 bit mailer. In Norway, we need support for ISO 8859-1 or something stronger. As soon as Dan is finished with his patches, we will try this version at our Universities. I don't know if Dan is listening here. Anyway I'll mail him and give him a hint. Ole Bjorn.
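The "three 8 bit characters into four 6 bit characters" idea from Ned Freed's message above can be written out directly. This is a sketch of the general technique, not of any one of the encodings named in the thread: the output alphabet and the "=" padding are those of Python's standard base64 module, used here as the reference; whether they match RFC1113's printable encoding in every detail should be checked against that RFC. `pack_3_to_4` is a hypothetical name.

```python
import base64

# 64 printable characters, one per 6-bit value (the base64 alphabet).
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def pack_3_to_4(data: bytes) -> bytes:
    """Encode each group of three input octets as four 6-bit characters."""
    out = bytearray()
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        pad = 3 - len(chunk)  # trailing material: 0, 1 or 2 octets short
        value = int.from_bytes(chunk + b"\x00" * pad, "big")
        quad = [ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            # Mark the characters that carry no input bits.
            quad[4 - pad:] = b"=" * pad
        out.extend(quad)
    return bytes(out)


for sample in (b"Man", b"Ma", b"M", bytes(range(256))):
    assert pack_3_to_4(sample) == base64.b64encode(sample)
print(pack_3_to_4(b"Man"))
```

The handling of the last, short group is the "trailing material" the message refers to; the schemes differ mainly there and in which 64 characters they pick.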
[Sorry if you got this letter twice] From "Greg Vaudreuil " Thu Jan 17 16:37:01 1991 Flags: 000000000001 Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA11873; Thu, 17 Jan 91 16:32:40 EST Received: from NRI by NRI.NRI.Reston.VA.US id aa13914; 17 Jan 91 16:30 EST To: "Ned Freed, Postmaster" Cc: Ariel@relay.prime.com, ietf-smtp@dimacs.rutgers.edu Subject: Re: comments on the SMTP session protocol In-Reply-To: Your message of Thu, 17 Jan 91 12:52:00 -0800. <82AD12A19680063A@HMCVAX.CLAREMONT.EDU> Date: Thu, 17 Jan 91 16:30:29 -0500 From: Greg Vaudreuil Message-Id: <9101171630.aa13914@NRI.NRI.Reston.VA.US> Folks, let me state some of the basic assumptions I have about this effort. Hopefully I can start to get a framework from which a solution may emerge. 1) SMTP should be modified to allow for 8 bit character sets. Not having an 8 bit path is a flaw in the original RFC 821 specification that I do not want to see remain, especially when it is so easy to correct. To make this work, some mechanism must be defined to allow mailers to detect other mailers capable of handling 8 bits. This can most simply be done by defining a synonym for HELO, recognized only by mailers that are 8 bit compliant. Other options are playing with the DATA command, but that wastes several exchanges before recognizing that the receiving mailer cannot accept the message as encoded. 2) Given that there are 7 bit systems that need to interoperate with 8 bit systems, I propose that a single standard encoding be adopted. If you are an 8 bit system, and you want to send to a 7 bit, line limited system, then uuencode, or somesuch, but allow the UA to retrieve the original 8 bit message without trying to figure out n different formats. If a standard format is adopted, I can, without updating my mailer or reader, put in a script to detect non-ascii text and convert it before my mail-reader looks at it.
If my mail-reader can't deal with the 8 bit data, I have not lost anything, I just fail to get additional functionality. 3) Provision needs to be made in 822 to allow for easy interoperation of multimedia mail. I don't care whether it is RFC1154 based or RFC934 based, or whatever. It would be useful to be able to use currently defined OSI body parts, including G3 FAX, and ISO8859/1. To the extent that this requires binary transmission capabilities, I would like to see the elimination of the line length limit in RFC 821. This concern is secondary to allowing 8 bit transmission... and I'm not advocating unlimited line lengths for 7 bit systems. I understand that line lengths are harder to modify than stripping bits. If this is seen as a difficult change, then a good alternative needs to be proposed. I do not like the idea of uuencoding, or hexifying (?) everything when an 8 bit path is available! Greg Vaudreuil From "Ittai Hershman " Thu Jan 17 16:55:55 1991 Flags: 000000000001 Received: from SHEMESH.GBA.NYU.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA12446; Thu, 17 Jan 91 16:52:55 EST Received: by shemesh.gba.nyu.edu (4.1/1.34) id AA14018; Thu, 17 Jan 91 16:49:43 EST Date: Thu, 17 Jan 91 16:49:42 EST From: Ittai Hershman To: "Ned Freed, Postmaster" Cc: ietf-smtp@dimacs.rutgers.edu Office-Phone: 212-285-6080 Subject: Re: comments on the SMTP session protocol In-Reply-To: Your message of Thu, 17 Jan 1991 12:52 PST Message-Id: If changes made here are not transparent to existing SMTP implementations, I agree that there's little chance of them being widely adopted. It seems to me that is not a big problem. One solution off the top of my head is that we could assign a new port number to the new SMTP. If a connection is established to that port, it could use a new set of semantics, and if not it could fall back to port 25 and use the old SMTP semantics (and decide what PART's (using Brian's terminology) to transmit).
Since only a new SMTP would try to connect to the new port number the semantics problem goes away... By the way, here are my initial contributions to the inevitable naming contest: MCMTP = More Complicated Mail Transfer Protocol or MMTP = Multimedia Mail Transfer Protocol -Ittai From Rudy.Nedved@rudy.fac.cs.cmu.edu Thu Jan 17 18:00:09 1991 Flags: 000000000001 Received: from RUDY.FAC.CS.CMU.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA14260; Thu, 17 Jan 91 17:57:06 EST Received: from rudy.fac.cs.cmu.edu by RUDY.FAC.CS.CMU.EDU id aa01691; 17 Jan 91 17:56:49 EST To: Ittai Hershman Cc: ietf-smtp@dimacs.rutgers.edu Subject: Re: comments on the SMTP session protocol In-Reply-To: Date: Thu, 17 Jan 91 17:56:43 EST Message-Id: <1686.664153003@RUDY.FAC.CS.CMU.EDU> From: Rudy.Nedved@rudy.fac.cs.cmu.edu >It seems to me that is not a big problem. One solution off the top of >my head is that we could assign a new port number to the new SMTP. If >a connection is established to that port, it could use a new set of >semantics, and if not it could fall back to port 25 and use the old >SMTP semantics (and decide what PART's (using Brian's terminology) to >transmit). Since only a new SMTP would try to connect to the new port >number the semantics problem goes away... I agree. Instead of overloading some service like SMTP and then having arguments about what this SMTP service should be doing or not doing, let's have a new service of which we can clearly say "hey, your XXX service does not conform to spec!" Also the 8-bit versus 7-bit issue is not a big thing; the line issue is the significant one. A lot of systems do line wrapping because of other systems that have small line buffers. I am continually amazed at the occasional pieces of mail I receive that cannot be returned because the outgoing line buffers are larger than the incoming line buffers. A few comments on a new specification if destined. 1) SMTP was easy to sell to other places.
During the intial deployment, implementations were created for DECNET that worked better then the current systems and it even was reincarnated as BSMTP for BITNET. 2) SMTP tended to keep to the exchange of mail. Improvements for performance were made by smarted SMTP receivers and senders not but increasing the complexity of the protocol. 3) The simple and direct manner of command line and response line, minimal states for each command, clear state interaction between the various commands and plain expectations of what would happen helped immensely. Now with the Internet be much much larger any significant protocol must be extremely easy to sell to the harassed system manager or programmer, must have a poor man's implementations "free" for those people who have PCs and the like and above all be mature enough for people to install it once and forget it. -Rudy From "Ned Freed, Postmaster " Thu Jan 17 22:46:53 1991 Flags: 000000000001 Received: from CBROWN.CLAREMONT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA22589; Thu, 17 Jan 91 22:43:01 EST Date: Thu, 17 Jan 1991 16:33 PST From: "Ned Freed, Postmaster" Subject: Re: comments on the SMTP session protocol To: ietf-smtp@dimacs.rutgers.edu Message-Id: X-Envelope-To: ietf-smtp@dimacs.rutgers.edu X-Vms-To: IN%"ietf-smtp@dimacs.rutgers.edu" Using a different TCP port makes the problem much, much worse! It will virtually guarantee that this extension is never used! For one thing, how do you determine which port to use when sending to a given system? Checking one and then another wastes a great deal of time. You're seeing the Internet as a tightly interconnected set of systems all accessible via TCP. I don't see it that way at all, especially when it comes to e-mail. The use of MX records in the DNS has made it possible for many, many systems to interoperate with the Internet at least as far as e-mail is concerned. 
If you fail to include these systems in these extensions, you've automatically eliminated much of their utility. By adding extensions at the SMTP level, you're forcing the MX gateways to upgrade their software before the benefit of the extensions can reach the machines the gateway serves. Many gateways simply will not do this in a timely fashion. If you add the requirement of an additional server on a different port, things get even more unlikely. If going to a second port is seriously considered, I'm afraid I have to say that then we might as well use X.400 on it. Ned From "Nathaniel Borenstein " Fri Jan 18 09:38:11 1991 Flags: 000000000001 Received: from thumper.bellcore.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA13370; Fri, 18 Jan 91 09:31:53 EST Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id for ietf-smtp@dimacs.rutgers.edu; Fri, 18 Jan 91 09:31:47 EST Received: by greenbush.bellcore.com (4.12/4.7) id for ietf-smtp@dimacs.rutgers.edu; Fri, 18 Jan 91 09:34:18 est Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Fri, 18 Jan 1991 09:34:15 -0500 (EST) Message-Id: Date: Fri, 18 Jan 1991 09:34:15 -0500 (EST) From: Nathaniel Borenstein To: ietf-smtp@dimacs.rutgers.edu Subject: Re: comments on the SMTP session protocol In-Reply-To: References: Status: O Excerpts from internet.ietf: 17-Jan-91 Re: comments on the SMTP se.. Ned Freed@hmcvax.claremo (1169) > If going to a second port is seriously considered, I'm afraid I have to > say that then we might as well use X.400 on it. Bingo. The standard excuses for not switching to X.400 (I've offered both of these many times) are that it is not yet mature and that it is too big and complex. How much sense does it make to design a new (=immature) protocol to extend (=make more complex) SMTP? The world HAS a standard, and it is X.400. 
The Internet can still make a pretty good case that it isn't worth switching to X.400, but the heart of that case is that we have SMTP and it works well enough to satisfy us. Making a case for an SMTP replacement, however, fatally undermines the arguments against X.400. My recommendation: minor tweaks to SMTP and the relevant RFC's where absolutely necessary to support multimedia mail or other badly needed functionality. Otherwise, if it ain't broke, don't fix it, and if it is broke, hold our collective noses and replace it with the fast-maturing international standard that was designed to replace it. From "Robert Ullmann " Fri Jan 18 16:10:54 1991 Flags: 000000000001 Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA08987; Fri, 18 Jan 91 16:06:06 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA16460; Fri, 18 Jan 91 16:05:58 EST Message-Id: <9101182105.AA16460@rutgers.edu> Received: (from user ARIEL) by Relay.Prime.COM; 18 Jan 91 16:09:48 EST To: IETF SMTP list From: Robert Ullmann Subject: notes on mailing binary objects Date: 18 Jan 91 16:09:49 EST Status: O Hi, There is a common misconception being repeated here: that 8 bit text is the same thing as binary, and now you can mail anything. This is simply not correct. Firstly, text still consists of lines, of some limited length, terminated by CRLF; the set of symbols available is not 256 in any sequence. Much more important is the reliability of the data transfer. Prime has several years of experience with mailing binary objects around our network. The following is not theory or speculation, but is based on hard experience. Consider the error rate on the net. Specifically, consider the rate at which /undetected/ errors occur. If we get one undetected error in 1 million bytes transferred, i.e. a BER of 10^-7 (using "BER" loosely), this is acceptable for text messages, but utterly unacceptable for a binary object. 
1 million bytes is maybe 500 text messages, our hypothetical 1 error will cause a typographical error in /one/ of those messages; it will likely go un-noticed. Binary messages tend to be larger. The favorite object in binary mail is an executable, with binary messages averaging closer to 1 MB per message. Our hypothetical 1 error in 1 MB is now an error in /every/ message. Suppose the message was an executable: the result is, at best, a core dump when run; more likely a subtle failure that might happen a long time later. Even if you postulate a different actual error rate, it is still true that binary is 5-6 orders of magnitude more sensitive to errors in transmission. The way to move a binary object is for the sender to compute a checksum (CRC), compress the data, and encode it for transmission. The receiver decodes, de-compresses, and verifies the size and checksum. The object transmitted is normally significantly smaller than the original binary object, instead of the typical 4/3 expansion from (e.g.) uuencode. No changes to MTA's are required: the data is the business of the MUA's and the users, not the MTA's. Which is as it should be. It also provides an end-to-end check on the data integrity, which is crucial to the transmission of binary data. As I said, Prime has a lot of operational experience with mailing binaries. The real problem is that it is much too popular; people mail any object at any time, often regardless of cost ... I will be sending out a draft of a good algorithm for this (the LZJU90 I referred to previously) sometime in the next week or so. 
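The sender/receiver procedure described above (checksum, compress, encode; then decode, de-compress, verify size and checksum) might be sketched as follows. This is only an illustration under stated assumptions: zlib's CRC-32 and DEFLATE and a base64 text encoding stand in for whatever CRC, compressor (LZJU90 is the one mentioned) and encoding would actually be used, and the names pack and unpack are hypothetical.

```python
import base64
import zlib


def pack(data: bytes) -> str:
    """Sender side: checksum the original, compress it, encode as 7-bit text."""
    crc = zlib.crc32(data)                      # assumption: CRC-32 as the checksum
    compressed = zlib.compress(data)            # assumption: DEFLATE, not LZJU90
    header = f"{len(data)} {crc:08x}"           # original size and checksum
    body = base64.encodebytes(compressed).decode("ascii")
    return header + "\n" + body


def unpack(text: str) -> bytes:
    """Receiver side: decode, de-compress, then verify size and checksum."""
    header, body = text.split("\n", 1)
    size_field, crc_field = header.split()
    data = zlib.decompress(base64.decodebytes(body.encode("ascii")))
    if len(data) != int(size_field):
        raise ValueError("size mismatch")
    if zlib.crc32(data) != int(crc_field, 16):
        raise ValueError("checksum mismatch")
    return data
```

The check is end-to-end: nothing in it depends on what the intermediate MTAs do, provided the text lines arrive intact.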
Best Regards, Robert Ullmann +1 508 620 2800 x1736 Ariel@Relay.Prime.COM From "Terry Crowley " Fri Jan 18 17:22:28 1991 Flags: 000000000001 Received: from DILITHIUM.BBN.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA11124; Fri, 18 Jan 91 17:16:31 EST Received: by dilithium.BBN.COM (AA14154); Fri, 18 Jan 91 17:16:08 EST Message-Id: <9101182216.AA14154@dilithium.BBN.COM> From: "Terry Crowley" Date: Fri, 18 Jan 91 17:16:06 EDT Subject: Re: notes on mailing binary objects To: ariel@relay.prime.com Cc: ietf-smtp@dimacs.rutgers.edu X-Translated: From BBN/Slate multimedia format (version 1.2). Status: O >> The way to move a binary object is for the sender to compute a >> checksum (CRC), compress the data, and encode it for transmission. >> The receiver decodes, de-compresses, and verifies the size and >> checksum. By the way, this is exactly the procedure BBN/Slate uses for transmitting multimedia documents through standard text mail paths (although the compression and checksum calculation are done in the same step). And the size of the transmitted document is almost always smaller than the uncompressed size of the original document. Terry From "kessler@hacketorium.eng.sun.com (Tom Kessler)" Fri Jan 18 18:22:50 1991 Flags: 000000000001 Received: from Sun.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA12870; Fri, 18 Jan 91 18:17:20 EST Received: from Eng.Sun.COM (exodus-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) id AA19419; Fri, 18 Jan 91 15:17:15 PST Received: from hacketorium.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AA12178; Fri, 18 Jan 91 15:17:13 PST Received: by hacketorium.Eng.Sun.COM (4.1/SMI-4.1) id AA07176; Fri, 18 Jan 91 15:16:07 PST Date: Fri, 18 Jan 91 15:16:07 PST From: kessler@hacketorium.eng.sun.com (Tom Kessler) Message-Id: <9101182316.AA07176@hacketorium.Eng.Sun.COM> To: Mark Crispin In-Reply-To: Subject: re: SMTP changes stuff Cc: ietf-smtp@dimacs.rutgers.edu Status: O There are couple of good points in Mark C's message. 
XDAT has some definite advantages here. A seperate BATC command (if we decide we want to do a batching command) would also be acceptable at least as far as I'm concerned. I also like the way XDAT clarifies that we're talking about data (not headers, or whatever else...). The question of what we do when we hit a machine that doesn't do XDAT is one I'd like to see some discussion on. UUencoding is certainly the obvious solution that occurs to us Unix weenie types (the uuencode program is available in a public domain C implementation among other things) although I've never seen a document describing it's format (Actually this would make a good topic for another RFC). How do you folks feel about this? Should gateways between 7bit and binary capable mailers be required to encode the body? Should the form of the encoding be specified (somebody would have to document uuencode)? Or would it be preferable to allow mailers to return/forward and error message when they hit such a gateway? Unfortunately (at least for now) the Unix compress is under a cloud due to the recent action on the patent of the LZW compress algorithm. I'd like to avoid requiring (or even suggesting) that folks use something they might have to pay royalties on. I 100% agree with the structure of data being beyond SMTP's realm. That stuff belongs in an RFC1154 type document (I think we talked about this at the Boulder meeting and agreed on that much at least). 
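On the question of what the uuencode format looks like: a rough sketch of how one body line is formed in the traditional encoding is given here. It assumes the classic scheme (a length character, then four printable characters per three input bytes, each 6-bit value offset by 32); the function name uu_line is hypothetical, and the "begin"/"end" framing lines are left out.

```python
import binascii


def uu_line(chunk: bytes) -> bytes:
    """Encode up to 45 bytes as a single uuencode body line (sketch)."""
    if len(chunk) > 45:
        raise ValueError("a uuencode line carries at most 45 bytes")
    out = bytearray([32 + len(chunk)])             # length character
    padded = chunk + b"\x00" * (-len(chunk) % 3)   # pad to a multiple of 3 bytes
    for i in range(0, len(padded), 3):
        a, b, c = padded[i:i + 3]
        sextets = (a >> 2,
                   ((a & 3) << 4) | (b >> 4),
                   ((b & 15) << 2) | (c >> 6),
                   c & 63)
        for value in sextets:
            out.append(32 + value)                 # shift into printable ASCII
    out += b"\n"
    return bytes(out)


if __name__ == "__main__":
    print(uu_line(b"Cat"))                            # b'#0V%T\n'
    print(uu_line(b"Cat") == binascii.b2a_uu(b"Cat")) # agrees with the stdlib encoder
```

Three input bytes become four output characters, which is where the 4/3 expansion comes from. Some implementations emit a backquote rather than a space for a zero value; this sketch uses the space.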
--Tom Kessler From "Roger Fajman " Fri Jan 18 19:51:38 1991 Flags: 000000000001 Received: from alw.nih.gov by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA15545; Fri, 18 Jan 91 19:47:09 EST Received: from cu.nih.gov by alw.nih.gov (5.61/alw-2.1) id AA12417; Fri, 18 Jan 91 19:47:05 -0500 Message-Id: <9101190047.AA12417@alw.nih.gov> To: Ariel@relay.prime.com Cc: IETF-SMTP@dimacs.rutgers.edu From: "Roger Fajman" Date: Fri, 18 Jan 91 19:46:18 EST Subject: Re: notes on mailing binary objects Status: O I think that a network that gets one undetected error per million bytes transferred is in severe need of fixing. BITNET has lots of experience mailing binary files around with good results. Many people also send UUencoded binary files via SMTP, also with good results. UUencode has no builtin checksum. This isn't to say that I think that checksums for objects is necessarily a bad idea. I just think your argument is overstated. From "David Herron " Sat Jan 19 01:32:01 1991 Flags: 000000000001 Received: from TWG.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA22642; Sat, 19 Jan 91 01:25:57 EST Received: from Obelix.twg.com by twg.com with SMTP ; Fri, 18 Jan 91 22:14:38 PST Received: from obelix.twg.com by Obelix.TWG.COM id aa03331; 18 Jan 91 22:14 PST To: ietf-smtp@dimacs.rutgers.edu Subject: Re: SMTP changes stuff In-Reply-To: Your message of Fri, 18 Jan 91 15:16:07 -0800. <9101182316.AA07176@hacketorium.Eng.Sun.COM> Date: Fri, 18 Jan 91 22:14:16 -0800 From: David Herron Message-Id: <9101182214.aa03331@Obelix.TWG.COM> Status: O Gents, Hi.. just joined up so I'm wondering what's going on & if there's an archive-of-messages somewhere that'd be groovy. I have a couple of comments regardless.. There was a message earlier today about using a different "port" for the extended SMTP. Hmm.. 
I don't like that tooo much as I feel it would cause more work on administrators -- at the simplest they would have to _know_ which of their neighbors are able to do the extended SMTP & then splitting their neighbors into two lists, one pointing at the extended-smtp channel, and the other pointing at the normal-smtp channel. The (capital I) Internet being what it is, there's a heck of a lot of neighbors so these lists are going to be large. Maybe we could work in a new thing in the nameservers to keep that indication. But this seems like a slightly odd thing to add to the nameservers. On the other hand extending SMTP doesn't seem to be very hard. Add in some "capability negotion" as part of the startup. How about a new command "CAPB" (for CAPaBilities") sent by the initiator somewhere in the startup -- just after the HELO/response portion probably. The data part in the CAPB might be a list of capabilities or just one at a time. If it were a list then the data in the reply should tell which capabilities the responder is willing to support. However this doesn't fit the SMTP/FTP model where the reply data is soley for human (and not machine) consumption. Therefore it might be necessary for the initiator to ask for capabilities one at a time & the yes/no status of the response indicate whether that capability is acceptable. If the SMTP implementation is "only" RFC-821, then it won't recognize CAPB at all, but all it will do is natter back some negative responses. I don't recall if it is free to drop the connection or not, but I've never seen an SMTP which did that. Ergo -- very probable interoperation with old servers. > XDAT has some definite advantages here. A seperate BATC command (if we > decide we want to do a batching command) would also be acceptable at least as > far as I'm concerned. I also like the way XDAT clarifies that we're > talking about data (not headers, or whatever else...). "batching"?? eh?? Are you meaning BSMTP? 
But BSMTP doesn't _need_ any extra commands. (except that, for some reason, Alan Crosswell felt the need to add a "ticket number" command (TICK) so that the initiator could keep the original message around until the reply file came back & use the ticket number to connect the two). > UUencoding is certainly the > obvious solution that occurs to us Unix weenie types (the uuencode program is > available in a public domain C implementation among other things) although > I've never seen a document describing it's format (Actually this would > make a good topic for another RFC). > How do you folks feel about this? Mebbe I'm not a real Unix weenie after all. But uuencode isn't the obvious solution to me -- it's a rather wasteful encoding. There happens to be approximately 90-96 usable characters while uuencode only uses about 63-64 of them. Another encoding -- atob/btoa -- is also in the public domain and pretty widely available (tho not as available as uuencode). It also makes full use of the usable characters. It also happens to work through BITNET links, important since some sets of BITNET links step on even the printable characters. On the other hand -- the wide availability of uuencode is important because it carries a lot of intertia. I won't strenuously object ... SunOS (at least) includes a man page in section 5 describing uuencode. > Should gateways between 7bit and binary capable mailers be required to > encode the body? Should the form of the encoding be specified (somebody > would have to document uuencode)? Or would it be preferable to allow > mailers to return/forward and error message when they hit such a gateway? Yes, of course. I just see the shouting out in comp.mail.misc what with all them header purists there who get upset at gateways rewriting headers on internet<->uucp transitions. The form of the encoding can easily be carried in the message .. 
either in the header a la RFC-1154, or in the body with body parts encoded similarly to a digested message. That is, blocks like --------- (seperator) Header: lines Including: a line count like so: Lines: 54 ... 54 lines of encoded stuff ... I don't quite like the RFC-1154 method because it's not so obvious to uneducated users. I think it might also be harder to construct by hand (supposing you're at a site which doesn't have an 1154 compliant user agent, but you want to send that sort of mail). But what should happen if some particular body part translation cannot be done at a particular gateway. What the PP gents suggest in their manual is a "filter channel" which replaces that body part with an IA5 message saying "There was an xxx body part here, but it was deleted at site yyy". This isn't very friendly ... ;-). Hmm.. this was gonna be a short message. Sigh David From "backman@ftp.com (Larry Backman)" Sat Jan 19 06:33:39 1991 Flags: 000000000001 Received: from ftp.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA02293; Sat, 19 Jan 91 06:30:13 EST Received: by ftp.com id AA17661; Sat, 19 Jan 91 06:30:22 -0500 Date: Sat, 19 Jan 91 06:30:22 -0500 From: backman@ftp.com (Larry Backman) Message-Id: <9101191130.AA17661@ftp.com> To: ariel@relay.prime.com, tcrowley@diamond.bbn.com Subject: Re: notes on mailing binary objects Cc: ietf-smtp@dimacs.rutgers.edu Status: O >> The receiver decodes, de-compresses, and verifies the size and >> checksum. By the way, this is exactly the procedure BBN/Slate uses for transmitting multimedia documents through standard text mail paths (although the compression and checksum calculation are done in the same step). And the size of the transmitted document is almost always smaller than the uncompressed size of the original document. Pardon my changing the subject, but has anyone considered encryption of mail messages as part of the new standard? 
Larry Backman backman@ftp.com From "David Ascher " Sat Jan 19 14:11:33 1991 Flags: 000000000001 Received: from brownvm.brown.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA09198; Sat, 19 Jan 91 14:05:48 EST Message-Id: <9101191905.AA09198@dimacs.rutgers.edu> Received: from BROWNVM.BROWN.EDU by brownvm.brown.edu (IBM VM SMTP R1.2.1MX) with BSMTP id 7622; Sat, 19 Jan 91 14:06:14 EST Received: by BROWNVM (Mailer R2.07) id 3835; Sat, 19 Jan 91 14:06:13 EST Date: Sat, 19 Jan 91 14:01:12 EST From: David Ascher Subject: Re: notes on mailing binary objects To: IETF-SMTP@dimacs.rutgers.edu Status: O Someone mentioned encryption as part of the SMTP extensions. A couple of thoughts: 1. It wouldn't be human readable, and therefore departs from the original thought of SMTP. (It wouldn't be Simple Anything afterwards). 2. What encryption? DES? (Can't send mail overseas). RSA? (expects that everyone have RSA keys -- won't happen for a long time, also costly). A new encryption scheme? I hope not. I'd say that encryption, along with signatures, timestamping, etc. are very necessary and should come soon, but I don't see them fitting into SMTP. We might as well start a brand new protocol (which If I'm correct is the subject of other RFC's, etc..) --david ascher brown university CIS dascher@brownvm.brown.edu From "Phillip G. Gross " Sat Jan 19 18:05:56 1991 Flags: 000000000001 Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA13212; Sat, 19 Jan 91 17:59:14 EST Date: Sat, 19 Jan 91 17:58:12 EST X-Mailer: Mail User's Shell (6.5 4/17/89) From: "Phillip G. Gross" To: David Ascher , IETF-SMTP@dimacs.rutgers.edu Subject: Re: notes on mailing binary objects Message-Id: <9101191758.aa07977@NRI.NRI.Reston.VA.US> Status: O > Someone mentioned encryption as part of the SMTP extensions. There is an effort called "Privacy Enhanced Mail" (PEM) in progress in the Privacy and Security Research Group (PSRG) of the IRTF. 
There are several implementations now being tested. It would be interesting to look into how these efforts might be able to interact. However, for the reasons mentioned in the second message, it is probably not practical to include encryption in this effort.

Phill

From stef@nma.com Sat Jan 19 18:22:03 1991
Flags: 000000000001
Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA13501; Sat, 19 Jan 91 18:16:25 EST
Received: from nrtc.northrop.com by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA29110; Sat, 19 Jan 91 18:16:18 EST
Received: from nma.com by nrtc.nrtc.northrop.com id ab09123; 19 Jan 91 15:15 PST
Received: from localhost by nma.com id aa14795; 19 Jan 91 13:46 PST
To: "Ned Freed, Postmaster"
Cc: ietf-smtp@dimacs.rutgers.edu
Subject: Re: comments on the SMTP session protocol
In-Reply-To: Your message of Thu, 17 Jan 91 16:33:00 -0800.
Reply-To: Stef@ics.uci.edu
From: Einar Stefferud
Date: Sat, 19 Jan 91 13:46:28 -0800
Message-Id: <14793.664321588@nma>
Sender: stef@nma.com
Status: O

Hello Ned -- I have to agree with you. This discussion is taking an interesting path, from "let's make a few extensions", to "let's make some pretty radical extensions", to "let's just get a new port number and define a whole new protocol for multimedia multibodypart mail". Well, let's not do this (but say we did) and use X.400 instead.
I am not the slightest bit amused by the idea of trying to create an entire new INTERNET standard to compete with SMTP and X.400, with the potential result of further messing up a big mess...\Stef From stef@nma.com Sat Jan 19 18:23:42 1991 Flags: 000000000001 Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA13506; Sat, 19 Jan 91 18:17:58 EST Received: from nrtc.northrop.com by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA29390; Sat, 19 Jan 91 18:17:50 EST Received: from nma.com by nrtc.nrtc.northrop.com id ae09123; 19 Jan 91 15:17 PST Received: from localhost by nma.com id aa14843; 19 Jan 91 13:53 PST To: David Ascher Cc: IETF-SMTP@dimacs.rutgers.edu Subject: Re: notes on mailing binary objects In-Reply-To: Your message of Sat, 19 Jan 91 14:01:12 -0500. <9101191905.AA09198@dimacs.rutgers.edu> Reply-To: Stef@ics.uci.edu From: Einar Stefferud Date: Sat, 19 Jan 91 13:52:58 -0800 Message-Id: <14841.664321978@nma> Sender: stef@nma.com Status: O I think we should quickly note several things. 1. PEM-DEV@tis.com (subscription requests to pem-dev-request@tis.com) is a list related to an internet task force of some kind working on Privacy Enhanced Mail in the RFC822 connect, to be carried in the current RFC821 envelope without any mods to RFC821 or RFC822. PEM is deliberately designed to be independent of its means of carriage. PEM provides for public key encryption of keys and certificates and DES encryption of the body of messages. I don't see any reason for RFC821 to be modified for the intended purposes, so I suggest that ietf-smtp should not take this task in its scope. 2. If you are seeking a "secure-mail-transfer-system" then you will be designing yet another one to compete with the X.400(88) standard which does provide for security measures inside the P1 (or MTA level envelope handling). I strongly suggest that trying to start now to design, implement, and deploy an entirely new start on such a system is a generous waste of time and effort. 
3. I note a considerable confusion in this list between extensions to RFC821 and extensions to RFC822. I think this group should come to grips with this issue soon, and resolve what its scope really is. For example, why should RCF821(ext) be concerned with the data types inside the body, as long as it has established that there is some 8bit data inside the body which must be protected and preserved in transfer. Is this task force concerned with RFC822 extensions, or is it limited to RFC821? I suggest that it be limited to RFC821. Any structural changes to be made to the internal structure of the body of a message should be dealt with as an extension of RFC822 (and its follow-on RFCs, of which there are several -- can someone identify all the related RFC822 RFCs, just for the record?). I know of RFC934 which defines an encapsulation format (with delimiter codes and bit-stuffing techniques) which is widely ( but not universally) used for forward messages and digests. RFC1154 provides a partially competing (with RFC934) scheme for placing multiple body parts inside a single "main" body part. I recall that there are some other minor extension RFCs, but I forget their numbers. Best...\Stef From "Ittai Hershman " Sat Jan 19 19:05:02 1991 Flags: 000000000001 Received: from SHEMESH.GBA.NYU.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA14131; Sat, 19 Jan 91 18:56:58 EST Received: by shemesh.gba.nyu.edu (4.1/1.34) id AA15675; Sat, 19 Jan 91 18:53:58 EST Date: Sat, 19 Jan 91 18:53:58 EST From: Ittai Hershman To: Stef@ics.uci.edu Cc: ietf-smtp@dimacs.rutgers.edu Office-Phone: 212-285-6080 Subject: Re: comments on the SMTP session protocol In-Reply-To: Your message of Sat, 19 Jan 91 13:46:28 -0800 Message-Id: Status: O Sigh. This is the second misrepresentation of my comment on selecting another port number. I was not advocating a third protocol (as Stef suggests). 
What I did say, was that there were ways by which changes could be made to SMTP to support MMM, and yet retain backward compatibility with existing SMTP implementations. This discussion is orthogonal to the X.400 issue. Two mechanisms have been mentioned to accomplish this: the first was to have the extended SMTP try a new port number first -- if successful, it would send the MMM message, if not, it would connect to the standard SMTP port and would deliver the message (perhaps in a manner similar to an AMS message which is delivered to a POS (plain old SMTP) implementation). The other suggestion to have a new transaction after the HELO to determine what extensions the server implementation will support, if the assigned keyword is not understood, the sender backs off to POS. Both ideas have problems, and no doubt there are better solutions, but they both demonstrate the key point that we can extend SMTP to have MMM capabilities without losing backward compatibility with those sites which are slow to make changes. We at NYU have been involved in the White Pages project since its exception, and I for one have always been interested in X.400/X.500. On the other hand, it would be nice if we could begin experimenting with MMM without having to deal with major network protocol suite changes and all the ramifications therein -- after all, moving to X.400 (and X.500) is far more complicated that extending SMTP. Why all this concern: if X.400 is so good, we'll end up using it anyway. There is no harm in exploring and implementing other options. You know full well that this has been how this community has done things for years. 
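The second mechanism (ask after HELO, and back off to plain old SMTP if the keyword is not understood) can be sketched roughly as below. Everything specific here is an assumption made for illustration: the CAPB verb is borrowed from David Herron's earlier suggestion, the capability names 8BIT and LONGLINES are invented, and send is a hypothetical callable that writes one command line and returns the server's reply code and text. Capabilities are requested one at a time so that only the numeric reply code has to be interpreted.

```python
from typing import Callable, Iterable, Set, Tuple

Reply = Tuple[int, str]  # (three-digit reply code, human-readable text)


def negotiate(send: Callable[[str], Reply], wanted: Iterable[str]) -> Set[str]:
    """Ask for each extension in turn; return the set the server accepted.

    An empty result means: behave as a plain RFC 821 sender.
    """
    accepted: Set[str] = set()
    for name in wanted:
        code, _text = send(f"CAPB {name}")   # hypothetical command
        if 200 <= code < 300:
            accepted.add(name)
        elif code in (500, 502):
            # "command unrecognized" / "command not implemented":
            # an old server, so stop asking and fall back entirely.
            return set()
        # any other negative reply: this one capability is refused
    return accepted


def old_server(line: str) -> Reply:
    """Stand-in for an RFC 821-only server."""
    return (500, "Command unrecognized")


def new_server(line: str) -> Reply:
    """Stand-in for an extended server that supports only 8BIT."""
    if line == "CAPB 8BIT":
        return (250, "OK")
    return (504, "Command parameter not implemented")


if __name__ == "__main__":
    print(negotiate(old_server, ["8BIT", "LONGLINES"]))  # set()
    print(negotiate(new_server, ["8BIT", "LONGLINES"]))  # {'8BIT'}
```

The cost against an old server is one extra round trip per connection, compared with a failed connection attempt in the separate-port scheme.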
-Ittai

From "Ittai Hershman " Sat Jan 19 19:10:54 1991
Flags: 000000000001
Received: from SHEMESH.GBA.NYU.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA14349; Sat, 19 Jan 91 19:02:31 EST
Received: by shemesh.gba.nyu.edu (4.1/1.34) id AA15685; Sat, 19 Jan 91 18:59:32 EST
Date: Sat, 19 Jan 91 18:59:32 EST
From: Ittai Hershman
To: Stef@ics.uci.edu
Cc: ietf-smtp@dimacs.rutgers.edu
Office-Phone: 212-285-6080
Subject: Re: comments on the SMTP session protocol
In-Reply-To: Your message of Sat, 19 Jan 91 13:46:28 -0800
Message-Id:
Status: O

In the third paragraph of my note, the word "exception" should read "inception".

I should also note that given the politics, I was rather surprised this IETF subgroup was formed at all -- I assumed we would all move to X.400 someday. Now that it is here, however, I think we should proceed and explore all our options.

-Ittai

From "Phillip G. Gross " Sun Jan 20 10:17:16 1991
Flags: 000000000001
Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA26436; Sun, 20 Jan 91 10:09:31 EST
Date: Sun, 20 Jan 91 10:08:29 EST
X-Mailer: Mail User's Shell (6.5 4/17/89)
From: "Phillip G. Gross"
To: Stef@ics.uci.edu, David Ascher
Subject: Re: notes on mailing binary objects
Cc: IETF-SMTP@dimacs.rutgers.edu
Message-Id: <9101201008.aa14640@NRI.NRI.Reston.VA.US>

> I think this group should come
> to grips with this issue soon, and resolve what its scope really is...
>
> Is this task force concerned with RFC822 extensions, or is it
> limited to RFC821? I suggest that it be limited to RFC821....

Stef,

The charter for this group is available online in the IETF directories at NIC.DDN.MIL and NIC.NORDU.NET. The scope is at least two-fold.

1) We wish to make extensions to SMTP (RFC821) to allow 8 bit character sets and probably to eliminate line length limits (to facilitate passage of binary data).

2) We also wish to examine RFC 1154 for passing "body parts" inside SMTP.
Finally, we may wish to look at the availability of packages that implement the new features and encourage their deployment.

This started as an effort to allow the passage of international character sets in SMTP mail, but once the 7 bit restriction is removed, it is very tempting to look at the broader issue of passing structured binary data.

I agree with your basic point that we need to settle on the scope of the effort fairly quickly, and then stay focused until we have completed that initial effort. It is clear that there has been much work in this area already, and anything this WG does should carefully attempt to harmonize with these other ongoing efforts, rather than strike out in a totally different direction.

I also agree that encryption and privacy should not be within the scope of this effort.

This group will meet at the next IETF in St. Louis (March 11-15).

Phill

From stef@nma.com Sun Jan 20 10:42:48 1991
Flags: 000000000001
Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA26673; Sun, 20 Jan 91 10:36:02 EST
Received: from [128.99.0.1] by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA11635; Sun, 20 Jan 91 10:35:55 EST
Received: from nma.com by nrtc.nrtc.northrop.com id ac11540; 20 Jan 91 7:35 PST
Received: from localhost by nma.com id aa15451; 19 Jan 91 23:29 PST
To: Ittai Hershman
Cc: ietf-smtp@dimacs.rutgers.edu
Subject: Re: comments on the SMTP session protocol
In-Reply-To: Your message of Sat, 19 Jan 91 18:53:58 -0500.
Reply-To: Stef@ics.uci.edu
From: Einar Stefferud
Date: Sat, 19 Jan 91 23:29:20 -0800
Message-Id: <15449.664356560@nma>
Sender: stef@nma.com

My possible misinterpretation is only a matter of degree. What I am trying to convey is that it is very easy to start out with the idea of making a few experimental extensions, and wind up doing a whole new protocol.

I liked the ideas of only extending SMTP(RFC821) to handle 8-bit data, and especially to handle the ISO8859-? character set.
Handling other well defined 8-bit objects should also be covered by the same extensions if I understand what we are talking about. I would like to leave the issue of how these objects are identified inside the body of extended RFC822 message formats to an extension of RFC822, and not get RFC821(ext) involved with the content of the envelopes, beyond the issue of providing an 8-bit path. I am not happy with the RFC1154 technique of using line counts for object delineation. I would not like to use any kind of count, including character counts. (SHADES of HERMES and TOPS-20 MAIL where my files used to get out of whack every now and then, and I could not touch a message without breaking the whole folder of mail.) Especially when we operate in an environment where these counts can be changed by various transport servers. I would much prefer a methood like RFC934 which is independent of intermediate mungers, which are rampant around the internet "mail" community. Internet Mail is not limited to the IP connected SMTP servers, as we all should remember. I also note that there is a lot of talk here about how well this new design is going to be accepted and deployed across the entire network, so I don't think we are just talking about an isolated experiment. WE seem at times to be talking about a global development and deployment plan for a new mail system design. So, I suggest that we settle down to the task of agreeing on the real scope of this project. 
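The preference for delimiter-based encapsulation over line counts can be illustrated with a small sketch in the style of RFC 934, simplified and under stated assumptions: a boundary is any line that begins with a dash not followed by a space, and body lines that begin with a dash are stuffed with a leading "- ". The BOUNDARY text and all function names are hypothetical, and real digests carry more structure than this.

```python
from typing import List

BOUNDARY = "--------"  # hypothetical; any line starting with "-" but not "- " would do


def stuff(lines: List[str]) -> List[str]:
    """Prefix '- ' to body lines that could be mistaken for a boundary."""
    return ["- " + line if line.startswith("-") else line for line in lines]


def unstuff(lines: List[str]) -> List[str]:
    """Undo stuff(): drop the leading '- ' wherever it is present."""
    return [line[2:] if line.startswith("- ") else line for line in lines]


def encapsulate(parts: List[List[str]]) -> List[str]:
    """Join body parts, separated and terminated by boundary lines."""
    out: List[str] = []
    for part in parts:
        out.append(BOUNDARY)
        out.extend(stuff(part))
    out.append(BOUNDARY)
    return out


def split_parts(lines: List[str]) -> List[List[str]]:
    """Recover the body parts without relying on any line or byte count."""
    parts: List[List[str]] = []
    current = None
    for line in lines:
        if line.startswith("-") and not line.startswith("- "):
            if current is not None:
                parts.append(unstuff(current))
            current = []
        elif current is not None:
            current.append(line)
    return parts
```

If a transport server adds or re-wraps a line inside one part, that part changes but the remaining parts are still found, whereas a stored count for the first part would then point at the wrong place for everything after it.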
Cheers...\Stef From "John C Klensin " Sun Jan 20 13:33:39 1991 Flags: 000000000001 Received: from INFOODS.MIT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28626; Sun, 20 Jan 91 13:22:47 EST Date: Sun 20 Jan 91 13:22:25-EST From: John C Klensin Subject: Re: notes on mailing binary objects To: pgross@nri.reston.va.us Cc: Stef@ics.uci.edu, DASCHER@brownvm.brown.edu, IETF-SMTP@dimacs.rutgers.edu Message-Id: <664395745.800937.KLENSIN@INFOODS.MIT.EDU> In-Reply-To: <9101201008.aa14640@NRI.NRI.Reston.VA.US> Mail-System-Version: >This started as an effort to allow the passage of international character >sets in SMTP mail, but once the 7 bit restriction is removed, it is very >tempting to look at the broader issue of passing structured binary date. Phil, From my perspective as someone who "gets to" spend a lot of time dealing with the consequences of using and gatewaying "internet mail" into and outside the Internet, I'm feeling a need to reinforce some of what I perceive Stef as saying, and to broaden it a bit. To a certain extent, I'm trying to provide a peek into a can of worms, either to discourage opening it or to make sure people (yourself and Greg especially) understand what is being opened before all the worms are let out. I think everyone should recall, as we go through this exercise, that the number of hosts that are not on the Internet that use "internet mail" (i.e., RFC822 formats and sometimes batch variations on RFC821 envelopes) is probably at least equal to the number of internet hosts that do so. If "users", rather than "hosts" are counted, the number is probably larger. Those systems are not isolated from the Internet: there are all sorts of applications-level gateways to and from them; many of those gateways, as already pointed out, are "hidden" by mechanisms we have explicitly and intentionally provided, notably the concept of MX DNS records. 
IMHO, we have an obligation to pay attention to the translation, header-munging, and similar needs of those hosts and networks. At a minimum, it will have an influence on the effectiveness with which a set of extensions will be broadly implemented and diffused. I think an important criterion for "success" should be that, as we try to make things better (e.g., to handle "eight-bit characters") we should avoid making them worse in terms of the ability to make invertible translations at the gateways. It is worth keeping in mind that some of those gateways perform character-set translations now, creating the concept of RFC822 headers that are not encoded in ASCII (7 bit or otherwise). As long as they know what is coming, these translations are easy and, with a few small idiosyncrasies, reversible and unambiguous. Right now, they know (or are permitted to believe) that they are getting ASCII. The seven bit variety. As specified in a specific and identifiable document. Period. In part because of those other networks and users, the basis for forming a consensus about this should be broader than might be appropriate for, e.g., TCP or IP clarifications. In particular, it is not clear to me that "This group will meet at the next IETF in St. Louis" is an appropriate answer to any of a range of questions because, if IETF decides to make some explicit invitations (which I would encourage) the broader community might participate usefully by email, but is unlikely to be appropriately represented in St. Louis. Let me suggest a few things from experience in watching experiments in this area (some of them unintentional): o It is only a small step between supporting mail that consists [only] of seven bit ASCII characters and supporting mail that consists of ISO8859-1 (Latin-1) characters.
While I have always been convinced that this requires envelope changes, rather than "hope the other guy doesn't discard any bits" approaches, the extensions are pretty straightforward one can easily think about translation gateways handling it without great difficulty, and there are lots of devices and systems on the market that can easily handle (including compose and display) ISO Latin-1 today. o It is a significantly larger step to make the jump from ASCII to ISO8859-n (for values of "n" potentially larger than 1). Use of nearly-arbitrary character sets will tend, in practice, to put pressure on systems and gateways to translate code positions between one that arrives from the network and one that is well-supported locally. This is not easy, since many of these character sets "nearly" translate and that leads to requirements to "stylize" graphics that don't quite appear locally. It also opens up the header/envelope problem considerably, because one can now move from something in the envelope with the semantics of "expect [eight-bit] ISO8859-1 instead of ASCII" to some combination of header and envelope that has the semantics of "expect ISO8859-n character sets, and I may change character sets mid-message". The latter might imply a requirement for some sort of encapsulation, with character-set-specification in the inner envelopes, but there are other ways to handle it that are more satisfactory for some purposes. This issue has been discussed at tedious lengths in other contexts; the lengthy discussions about extending the kermit protocol to deal with semi-arbitrary character sets that went on a year or so ago provide the best illustration I know of. At the same time, an extension to ISO8859-n still provides some nice simplifications relative to SMTP extensions. 
In particular, since all of those character sets preserve a close approximation to ASCII in columns 2 through 7 and prohibit graphic characters in columns 0 and 1 and in 8 and 9, it remains easy to recognize end-of-message (CRLF.CRLF) when it arrives, since it is the same sequence of bits in ASCII-with- leading-zero and, in practice, in all of the ISO8859 character sets. This simplification also makes it no more likely than it is today that mailing a well-formed message from one host will result in, e.g., terminal hangups on other hosts. As soon as, e.g., graphics are permitted in columns 8 and 9 (which some existing character sets permit) a new layer of robustness protection, probably in UAs, is required. o Another huge simplification occurs if one takes the position that message headers remain in ASCII, the variant character sets are acceptable only in message bodies. This position is usually not acceptable, under a principle that is sometimes enunciated as "I want to be able to spell my name correctly". However, it would be possible to go through RFC822 without too much effort and identify which fields and subfields must be confined to ASCII (e.g., anyone want to have ISO8859-7 names in the DNS?) and which ones really are free text. o The step to "mailing binary files" is another major one. It is major even relative to "mailing arbitrary one-octet character sets", since there are no obvious models --in the current RFC821 context-- for understanding where the message body ends. With arbitrary characters (but still characters), it is easy to think, e.g., of extending DATA >From DATA to DATA , where is some surrogate for "." in the extended character set. You have to do something like this, because "." might not, in principle, appear in all character sets. One could redefine "." as a particular sequence of bits (lots of CCITT stuff uses table positions, which amounts to the same thing). 
But, in binary files that are not organized into lines, while one can still clearly have character-doubling escape sequences they become a not-very-reliable pain in the neck. Personally, as someone else suggested, that set of arguments, as well as the already-posted argument for a higher standard of robustness, make me quite partial to bit or octet counts and checksums (or ECC) in preference to terminating characters. And, for better or worse, "internet mail" does get moved through and over networks that don't provide reliable transport. Identifying those networks as "broken" won't change the world very much, very quickly: it is nearly equivalent to expressing a preference that everyone simply connect to the Internet rather than using other arrangements. But, regardless of what one concludes about counts vs delimiters, the problems get more complex. o At a slightly different level, there are a large set of issues, on which "there has been much work done" that have been blocked, or where horrible kludges have been resorted to, because the Internet community has said for ten years "you don't get to mess with RFC821, you can make certain extensions to RFC822, but it is likely that no one else will interpret them the way you want". I'm not suggesting this has been an explicit policy, only that it has been the effect of a combination of policies and benign neglect. In many respects, that has been useful: having a standard that is both fairly comprehensible and that has been completely stable for over a decade has probably contributed as much to its wide acceptance and implementation as its "Simple"-ness.
But, if one of the targets of IETF and the working group is broad adoption and implementation --ideally, rapid replacement of SMTP by "extended SMTP"-- then there are, IMHO, extremely strong pragmatic arguments both for remaining as close to "simple" as possible and for an "open it once, then close it again for another ten years except for clearly upward-compatible extensions" philosophy. That may argue that an approach of defining "this effort" as narrowly as possible, getting on with it, and then coming back and doing other things may not be optimal. It may, instead, suggest that we should at least examine all of the possible pieces of this and figure out just where we are going, and how far in those directions we are willing to go and still consider it "SMTP" (as distinct from NSSMTP (Not-So-Simple...) or MCMPT (Moderately- Complex...), presumably as different services on different ports. o That leads to another argument, which I'd encourage people to think about before we jump off into "mailing binaries", which I've taken to mean transporting un-encoded arbitrary files as mail. I have no doubt that we can figure out a way to do that, there are lots of smart people participating in this. I do question whether, however obvious it might be as a thing to do, it is *desirable*. Don't think about it as "mail" for a moment, think about it as a transport mechanism for sender- initiated, password-free (and other potential kinds of sender validation mostly free), file transfer. The network environments in which mail-like mechanisms are most commonly used for transporting binaries (BITNET/EARN/ and the rest of the NJE family) treat them sufficiently differently from mail at the meta-envelope level that MUAs just don't see them. Many of the systems in those networks also handle such files, on receipt, in a way that is exempt from user disk quotas until read or flushed. 
To encourage the transfer of (typically larger, as was pointed out) binary files via mail means that we need to examine the robustness of potential receiving systems that impose disk quotas on users against the range of problems that often manifest themselves as "can't send a message to that user to tell him that he is out of space because he is out of space". [... I don't consider the NFS and DECNet models for writing files into someone else's file system to be mail-like. And they can provide a relatively high level of validation. ...] Speaking personally, if a file arrives at my system that I can't filter down to acceptable graphics (columns 1-7 and 9-15, less 7/15 and 15/15) in a UA and then type out and read, and I'm expected to execute that file or load it into a database system solely based on trusting the sender, I'd like to have a somewhat higher quality of sender validation available to me than we typically get out of SMTP systems. Granted, I'm a bit more paranoid than most, but I'd like to see someone from Dave Clark's committee comment on this issue before we change protocols to make the practice seem "easy" and "encouraged". At a minimum, I think this argues quite strongly that, if both "eight-bit characters" and "binary file transfers" are part of the program of work, they should be sufficiently distinct in the envelope that an SMTP server can accept the first and immediately reject the second and close the connection if local system integrity considerations argue for the latter. This should be able to be a system-wide decision, not something deferred to UA processing of headers. I'd even argue for language that an SMTP server "should" be configurable to disable acceptance of binary files if that feature were otherwise supported. We may well need a cleaner "push" mechanism for binary objects in the Internet than we have now, but it might be better to look at extensions or variations to FTP than using mail for this purpose. 
At least we should discuss that question, not blindly go off and extend mail. o Finally, let me suggest a possibly-different way of thinking about the X.400 issue wrt SMTP extensions. Let's take it for granted that after some period of time, X.400 will replace Internet mail as we now know it, or at least "should do so" and "will be ready to do so". Some of us apparently think "some period of time" is a few months at most, others of us apparently think "a few lifetimes" would be a better estimate. Whatever one thinks of the timeframe, it is pretty clear to me that an extension to SMTP that (i) is needed and (ii) is quite likely to have a significant life expectancy between when it is widely available and whenever X.400 "takes over" is worth doing. Something that is as feature-laden and complex as X.400(88) will presumably take longer to implement and widely disseminate than X.400(88) because those folks presumably have a head start, at least on reading the documentation analogous to what we haven't started writing yet. That, to me, is a pretty strong argument for keeping the changes to SMTP pretty minimal if possible: the smaller they are, the more they will be implemented and the more rapidly those implementations will propagate, improving the payback period. Conversely, it seems to me that it is also a strong argument for those who think that X.400 is right around the corner to define a WKS for X.400 mail seeing how rapidly X.400 implementations can propagate on the Internet, and then arguing against particular SMTP changes on the grounds that the facilities already exist, are implemented and demonstrated in the Internet-profile X.400 environment, and the particular feature is not worth tampering with SMTP in that environment. That becomes a sensible engineering tradeoff argument, not a religious discussion about how, since utopia (or, if you prefer, the end of the world) is right around the corner, one should do little or nothing in the interim.
john ------- From "Phillip G. Gross " Sun Jan 20 14:43:50 1991 Flags: 000000000001 Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA29725; Sun, 20 Jan 91 14:38:36 EST Date: Sun, 20 Jan 91 14:37:36 EST X-Mailer: Mail User's Shell (6.5 4/17/89) From: "Phillip G. Gross" To: John C Klensin , pgross@nri.reston.va.us Subject: Re: notes on mailing binary objects Cc: ietf-smtp@dimacs.rutgers.edu Message-Id: <9101201437.aa16736@NRI.NRI.Reston.VA.US> John, Thanks for laying out a wide set of issues in a very comprehensive way. There were many thoughts therein that will form the basis of much continuing discussion. I was very interested in your comments about previous "blocking" and/or "benign neglect". I'm not sure what you are referring to (and maybe don't need to get into details), but let me assure you that there will be no "blocking" at this time of any technical work that the community approves of. (If you were suggesting that the community *itself* blocked previous efforts by not being able to come to closure, then that is a different matter we might have to face.) Again, let me remind everyone that this effort started as an attempt to carry international character sets (probably ISO 8859-1 only) in SMTP mail. *Every* other proposal (e.g., ISO 8859-n, binary mail, structured binary mail, etc.) has been suggested after the original meeting. I agree that we have to carefully consider our program of work, and not bow to the temptation of creeping featurism. At the same time, I think you are quite correct in stating that once we close the hood on this effort, we don't want to re-open it again too soon. Therefore, if there *are* other issues we could consider while the hood is up, well, then I think we should at least consider them. This has been a hot mail discussion, and the contributions of the participants in the list will obviously have a strong impact on the final outcome.
However, as in all such efforts, at some point some decisions have to get made -- and these decisions are unlikely to have 100% agreement in the group, but hopefully will represent a consensus. Then some documents need to get written. I would very much like to see the general target date of the IETF meeting in March used as an implicit deadline for reaching some consensus. Thanks, Phill From "Jan Engvald LDC " Sun Jan 20 20:28:16 1991 Flags: 000000000001 Received: from Pollux.lu.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA06030; Sun, 20 Jan 91 20:21:55 EST Received: from Castor.ldc.lu.se by pollux.lu.se with SMTP (5.61-bind 1.4+ida/IDA-1.2.8) id AA28186; Mon, 21 Jan 91 02:21:47 +0100 Received: from gemini.ldc.lu.se by castor.ldc.lu.se; Mon, 21 Jan 91 01:43 +0100 Date: Mon, 21 Jan 91 01:42 +0100 From: Jan Engvald LDC Subject: Outlines for new functionality To: ietf-smtp@dimacs.rutgers.edu Message-Id: X-Vms-To: XJELDC,IN::"ietf-smtp@dimacs.rutgers.edu" Status: O >This is the way RFC1113-1115 handles encryption. The functionality is >a subset of X.506 and X.411 (?) and is consciously designed to inter- >operate with these ISO protocols. It is also designed not to require >any changes to the UA, although it is desired for ease of use. >I see no reason not to use these RFCs for encryption in the SMTP world >and our proposal should not conflict with it. Ooops, I forgot. There is one thing that I would like to see changed in RFC1113-15: The common character set that they convert to should be the same common set that we will use for nonencrypted mail. As a side effect bit 8 cannot be used to denote non-text (=padding), so some other algorithm should be used for this. A common approach to all the extended functionality is desired and feels natural.
Jan E LDC From "Jan Engvald LDC " Sun Jan 20 20:28:18 1991 Flags: 000000000001 Received: from Pollux.lu.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA06032; Sun, 20 Jan 91 20:21:58 EST Received: from Castor.ldc.lu.se by pollux.lu.se with SMTP (5.61-bind 1.4+ida/IDA-1.2.8) id AA28189; Mon, 21 Jan 91 02:21:55 +0100 Received: from gemini.ldc.lu.se by castor.ldc.lu.se; Mon, 21 Jan 91 01:24 +0100 Date: Mon, 21 Jan 91 01:23 +0100 From: Jan Engvald LDC Subject: Outlines for new functionality To: ietf-smtp@dimacs.rutgers.edu Message-Id: X-Vms-To: IN::"ietf-smtp@dimacs.rutgers.edu" X-Vms-Cc: XJELDC Status: O I think more attention should be paid to what functionality we want; then we can discuss means of implementation. Some wishes on mail from our users that affect this discussion:
- Want correct characters displayed even if the source mail was written in another character set.
- Want to be able to include arbitrary files (formatted documents, illustrations, programs,...) in a mail without hassle.
- Want to be able to encrypt mail.
- Still want to be able to exchange mail with anybody in the world.
Some additional wishes from system managers and program writers:
- Big files should be split into several mail items.
- It should not require AI for a user agent program to automatically do necessary conversion and packing/unpacking for the user.
- Conversion to/from X.400 should not be unnecessarily hard for the gateway.
I think we should try to extend mail functionality without REQUIRING changes to the mail transport network. This way the new functionality can be spread to the community much quicker. However, I also think we should make some change to the transport. The aim of this is not functionality, but efficiency and ease of debugging. It is much easier for a human to read the characters themselves compared to reading quoted representations of them. And of course conversion increases the load on a mail gateway.
The functionality extensions do not require any changes to RFC821/822. What is needed are new headers that inform the UA what conversion it should do and what the syntax and contents of the mail body are. This is the way RFC1113-1115 handles encryption. The functionality is a subset of X.506 and X.411 (?) and is consciously designed to interoperate with these ISO protocols. It is also designed not to require any changes to the UA, although it is desired for ease of use. I see no reason not to use these RFCs for encryption in the SMTP world and our proposal should not conflict with it. For including files I also think it is important that any user can unpack such files even if he does not have a new files-aware UA. This is best accomplished by using tools that already exist for this purpose. They usually also handle a CRC, so corrupt files can be discarded instead of causing damage. I would have liked the standard to pick one of these, but I see no hope of common agreement, so the headers must tell which ones have been used. A fancy UA can look at these and then call the tools to unpack the coded files. Thus all of atob, uuencode, xxencode, uncompress, arc, lhz, zoo and zip have to be available. The headers should also tell the names and types of files that are included in the mail. It is important that the definition takes into consideration that a UA should be able to automatically extract a file even if it is split into several mail units. For the character set issue I think we should decide on a common set that is a union of all the sets we want to support. All mail exchange should use this set between partners that are extended. If one of them is not an extended implementation, a bidirectionally unique and human readable quoting to 7-bit ASCII should be performed. This way extended end systems can communicate through old non-extended systems.
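The "bidirectionally unique" quoting to 7-bit ASCII mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the message does not define a quoting syntax, so the `=XX` hex form, the choice of `=` as the quote character, and the function names `quote_7bit` and `unquote_7bit` are all hypothetical. Hex codes are also not especially human readable; a real proposal might prefer mnemonic names for common characters.

```python
QUOTE = ord("=")  # assumption: "=" is the quote character, not taken from the message


def quote_7bit(data: bytes) -> bytes:
    """Replace every octet above 0x7F, and the quote character itself, by =XX."""
    out = bytearray()
    for b in data:
        if b > 0x7F or b == QUOTE:
            out += b"=%02X" % b  # two upper-case hex digits
        else:
            out.append(b)
    return bytes(out)


def unquote_7bit(data: bytes) -> bytes:
    """Invert quote_7bit; assumes the input was produced by quote_7bit."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == QUOTE:
            out.append(int(data[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


latin1_text = b"p\xe5 svenska"  # "på svenska" in ISO 8859-1
quoted = quote_7bit(latin1_text)
restored = unquote_7bit(quoted)
```

Because the quote character is itself quoted, every 7-bit output maps back to exactly one 8-bit input, which is the property an old non-extended relay in the middle of the path needs to preserve.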
The conversion between the common character set and a local one can be done common to all users on a machine, but preferably be specific to each user depending on his display possibilities. The source character set should be noted in a header as an aid for the best conversion. There are some implementations along these lines that are in test within NORDunet, Dan Oscarsson (Dan@dna.lth.se) and some colleagues in the different countries have done a good job in this matter. They use one new command in the SMTP dialog: ISOC , if the recipient does not understand that, the mail is converted to plain old 7-bit ASCII. Summary: We should aim for much better functionality by defining new headers and defining the structure of the mail body when these headers are used. New user agents should use this information to do things automatically for the user, users with old UA should be able to participate doing things manually. A few extensions to the transport for efficiency and ease of debugging should be done, but no changes to existing transport should be REQUIRED to support the new functionality.
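The fallback behaviour described for the NORDunet test implementations (try a new command, and convert to 7-bit ASCII if the receiver does not understand it) might look roughly like the sketch below. Everything here is an assumption for illustration: the exact form of the ISOC command and its argument are not given in the message, and `send_command` is a hypothetical callable standing in for a real SMTP session. The only fact relied on is that RFC 821 replies in the 2xx range mean success, while 500 and 502 mean the command was unrecognized or not implemented.

```python
def choose_transfer_mode(send_command, probe="ISOC"):
    """Return "8bit" if the peer accepts the probe command, otherwise "7bit".

    send_command is a hypothetical function that sends one SMTP command line
    and returns the numeric reply code from the receiving server.
    """
    code = send_command(probe)
    if 200 <= code < 300:
        return "8bit"  # peer is extended; send the common character set as-is
    return "7bit"      # e.g. 500/502: fall back to quoted 7-bit ASCII


def old_server(command):
    """Stand-in for a receiver that predates the extension."""
    return 500  # RFC 821: syntax error, command unrecognized


def new_server(command):
    """Stand-in for an extended receiver."""
    return 250  # RFC 821: requested mail action okay


mode_for_old_peer = choose_transfer_mode(old_server)
mode_for_new_peer = choose_transfer_mode(new_server)
```

The design point is that the sender decides per connection, so no change to existing transport is required for mail to keep flowing.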
Jan Engvald, Lund University Computing Center ________________________________________________________________________ Address: Box 783 E-mail: Jan.Engvald@ldc.lu.se S-220 07 LUND Earn/Bitnet: xjeldc@seldc52 SWEDEN (Span/Hepnet: Sweden::Gemini::xjeldc) Office: Soelvegatan 18 VAXPSI: psi%2403732202020::xjeldc Telephone: +46 46 107458 (X.400: C=se; A=TeDe; P=Sunet; O=lu; Telefax: +46 46 138225 OU=ldc; S=Engvald; G=Jan) Telex: 33533 LUNIVER S From "Mark Crispin " Sun Jan 20 22:32:49 1991 Flags: 000000000001 Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA08324; Sun, 20 Jan 91 22:28:08 EST Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA06323; Sun, 20 Jan 91 19:28:05 -0800 Date: Sun, 20 Jan 1991 19:04:48 -0800 (PST) From: Mark Crispin Sender: Mark Crispin Subject: Re: comments on the SMTP session protocol To: ietf-smtp@dimacs.rutgers.edu In-Reply-To: <15449.664356560@nma> Message-Id: Status: O In <15449.664356560@nma>, Einar Stefferud writes: >I would like to leave the issue of how these objects are identified >inside the body of extended RFC822 message formats to an extension of >RFC822, and not get RFC821(ext) involved with the content of the >envelopes, beyond the issue of providing an 8-bit path. > >I am not happy with the RFC1154 technique of using line counts for >object delineation. I would not like to use any kind of count, >including character counts. > >I would much prefer a method like RFC934 which is independent of >intermediate mungers, which are rampant around the internet "mail" >community. Internet Mail is not limited to the IP connected SMTP >servers, as we all should remember. I believe that these three points that Stef has made are very important and I am mostly in agreement with them. [Line Counts] As an RFC-1154 implementor, I had problems with its use of line counts specifically because I had problems in deciding "what is a line?" 
This was easy on TOPS-20 which used the same newline conventions as SMTP. Unfortunately, UNIX has a different newline convention and by the time my application gets to the data CR's have been stripped from the stream. Two very different text streams in the Internet newline convention are identical once converted to UNIX conventions. I believe that it is important, however, for some information about the structure of the message to be available via a parse of the message header, as is presently available from RFC-1154. In fact, I would like more information available. I am not as opposed to line or character counts as Stef is, but I believe that reliance upon them as the sole mechanism for finding a region is ill-advised. [7bit vs 8bit] Since I still have some connection to the TOPS-20 e-mail world -- a few of these beasts are still wandering around the swamp -- I also would like to lobby for a means which does not break mailers which are likely to use a 7bit representation of text forever, yet offer these mailers the possibility of being a full player in the ISO8859-? game through the means of some representation of 8bit characters in 7bit form. Has anyone performed a density analysis on "typical" 8bit text to determine how frequently the characters above 0x7F are used? If, as I suspect, they are used relatively infrequently compared to characters at or below 0x7F, it may be acceptable to snarf some less frequently used character -- one of \|`~ come to mind -- as a "apply 0x80 bit to next character" escape when the following two cases are both true: 1) the message uses ISO8859-? and this fact is indicated by some cookie in the header 2) the receiving mail agent declines to receive 8bit text. This places a burden upon an 8bit receiver to detect the header cookie and do the conversion from 7 to 8bit, removing the cookie and escapes in the process. 
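The density question can be answered for any given sample with a few lines of code. The sketch below is an assumption-laden illustration, not a measurement: the function name `escape_overhead`, the choice of `~` as the escape character, and the cost model (one extra octet per high character and one per literal escape character) are hypothetical, since the proposal above does not say how a literal escape character would be carried.

```python
def escape_overhead(data: bytes, esc: bytes = b"~") -> dict:
    """Count octets above 0x7F and estimate the size after escape encoding.

    Assumed cost model: each high octet becomes two octets (escape + low
    seven bits) and each literal escape character also costs one extra octet.
    """
    total = len(data)
    high = sum(1 for b in data if b > 0x7F)
    literal = data.count(esc)
    return {
        "octets": total,
        "high": high,
        "literal_escapes": literal,
        "encoded_octets": total + high + literal,
        "high_ratio": high / total if total else 0.0,
    }


# "Smörgåsbord på Lund" in ISO 8859-1: mostly ASCII with a few accented letters.
swedish = escape_overhead(b"Sm\xf6rg\xe5sbord p\xe5 Lund")

# A Greek word in ISO 8859-7: every letter lives above 0x7F.
greek = escape_overhead(b"\xca\xe1\xeb\xe7\xec\xdd\xf1\xe1")
```

On these two samples the Latin-1 text grows only slightly, while the ISO 8859-7 text doubles in size, so the acceptability of the escape depends heavily on which member of the ISO8859-? family is in use.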
-- Mark -- From "Steve Kille " Mon Jan 21 04:20:48 1991 Flags: 000000000001 Received: from bells.cs.ucl.ac.uk by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA14627; Mon, 21 Jan 91 04:10:31 EST Received: from glenlivet.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with SMTP inbound id <2505-0@bells.cs.ucl.ac.uk>; Mon, 21 Jan 1991 09:09:53 +0000 Mto: Robert Ullmann Cc: IETF SMTP list Subject: Re: notes on mailing binary objects Phone: +44-71-380-7294 In-Reply-To: Your message of 18 Jan 91 16:09:49 -0500. <9101182105.AA16460@rutgers.edu> Date: Mon, 21 Jan 91 09:09:52 +0000 Message-Id: <497.664448992@UK.AC.UCL.CS> From: Steve Kille There seem to be two issues wrt mailing pictures, files, etc. 1) The structure of the document 2) The encoding If we are not careful, they will get very tangled. I propose that they are decoupled. A neat way of doing this is to design the structure using ASN.1 notation, which also has the neat effect of making it easy to ensure X.400 compatibility. The encoding can be defined in a manner appropriate for SMTP and MMM objects. This could be done to counter problems of the baroqueness of the ASN.1 BER (comment by Crispin) and the problems of length calculations in RFC 1154 (comment by Freed). Here is a suggested ASN.1.

    Message ::= SEQUENCE {
        Heading,
        SEQUENCE OF BodyPart }

    Heading ::= SEQUENCE OF IA5String
        -- one 822 header per sequence element
        -- e.g., To: Robert Ullmann

    BodyPart ::= CHOICE {
        [0] Message,
        [1] ExternallyDefinedBodyPart }

    ExternallyDefinedBodyPart ::= SEQUENCE {
        parameters [0] EXTERNAL OPTIONAL,
        data EXTERNAL }

This should be clear(ish) even to those not familiar with ASN.1, except perhaps for EXTERNAL. This allows binding of an Object Identifier to Data. 822 headers are used, so that this spec can be used without 987 or 1148. The definition of ExternallyDefinedBodyPart is taken from X.400. Parameters are bound separately. This simplifies the case where there is simple data, and a number of parameters (e.g., Fax).
It also allows private parameterisation of a standard data format. It would probably be useful to take the EXTENDED-BODY-PART-TYPE macro from X.400 to facilitate definitions of new body parts. I think that lots of groups on the internet would want to define body parts, and so a registry could be started. For example: voice-body-part EXTENDED-BODY-PART-TYPE PARAMETERS VoiceParameters IDENTIFIED BY id-ep-voice DATA VoiceData ::= id-et-voice I'll restrict myself to the structure of the message. Others can argue about the encoding. Steve From stef@nma.com Mon Jan 21 05:32:36 1991 Flags: 000000000001 Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA15651; Mon, 21 Jan 91 05:26:34 EST Received: from [128.99.0.1] by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA09342; Mon, 21 Jan 91 05:26:24 EST Received: from nma.com by nrtc.nrtc.northrop.com id aa14106; 21 Jan 91 1:55 PST Received: from localhost by nma.com id aa16720; 21 Jan 91 0:45 PST To: "Phillip G. Gross" Cc: IETF-SMTP@dimacs.rutgers.edu Subject: Re: notes on mailing binary objects In-Reply-To: Your message of Sun, 20 Jan 91 10:08:29 -0500. <9101201008.aa14640@NRI.NRI.Reston.VA.US> Reply-To: Stef@ics.uci.edu From: Einar Stefferud Date: Mon, 21 Jan 91 00:45:15 -0800 Message-Id: <16717.664447515@nma> Sender: stef@nma.com Thanks Phill for providing the key parts of the text from the charter of this working group. For those of us who are not IP connected (many of us live behind MX records with dialup mail delivery), perhaps it would be a good idea to just mail out the full text of the referenced FTPable files. It really is a pain to dial into some host and go FTP groping for what is likely to be a rather smallish file in the end. (BTW, I will not be able to attend the IETF meeting in St Louis in March.) (I am already booked into 4 meetings in 2.5 solid weeks of travel in March.) 
Next (and I am glad to see Mark's comments about RFC1154 in a subsequent message) I would like to identify all the RFCs that are either relevant or subject to modification or obsolescence from this work. I would include RFC934 as relevant, at least. We know about RFC 821, 822, 1154. Also, I find that much of what is proposed is well outside the scope of RFC821(smtp) in that it really concerns adding RFC822 headers to signal facts about the content of the BODY, which is an RFC822 defined entity (Although RFC821 does reference it, I don't believe RFC821 defines it.) As I survey the territory, I see the main RFC821 issues as only extending to handle 8-bit data. Sub-elements of this issue deal with how 8-BITish we are going to handle -- FULL BINARY or just 8-BIT ASCII? This is because of the line length problem and the issue of keeping the from being confused with some binary bit-strings . To keep things simple, I strongly suggest limiting all RFC822 headers (which includes all RFC821 insertions among the RFC822 headers) to 7-bit ASCII. I hope we are not going to open the worm can that will make it mandatory for everyone in the world to be able to type accents and umlauts into internet (RFC822) addresses! As I see it, this might very well kill SMTP mail services. This is building into a big X.400 problem already. Let's not get on board that particular leaky boat. The X.400 world has to resolve this, but I hope the INTERNET community does not! I can see lots of logic to support Mark Crispin's desire for an RFC822 header to identify facts about the body. But, I hope these are not line or character counts for offsets into the body, since I also see various schemes floating around for the use of multiple character escape coding for mapping 8-bit to 7-bit and vice versa. I assume these schemes are totally incompatible with any kind of offset element counting pointers. Anyway, I see much more discussion here about RFC822 than about RFC821.
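The concern about counts versus delimiters can be made concrete with a small experiment. The sketch below is illustrative only: the boundary line `BOUNDARY` is a hypothetical marker in the general spirit of RFC934 encapsulation, not its actual syntax, and the "munging" is simply a CRLF-to-LF newline conversion of the kind a gateway or UNIX mailer might apply. Escape coding of 8-bit characters would shift offsets in the same way.

```python
BOUNDARY = "------- next part -------"  # hypothetical delimiter line, not RFC934 syntax


def split_by_counts(text, counts):
    """Cut text into pieces using character counts recorded by the sender."""
    parts, pos = [], 0
    for n in counts:
        parts.append(text[pos:pos + n])
        pos += n
    return parts


def split_by_boundary(text):
    """Cut text into lists of lines wherever the delimiter line appears."""
    parts, current = [], []
    for line in text.splitlines():  # accepts both CRLF and LF line endings
        if line == BOUNDARY:
            parts.append(current)
            current = []
        else:
            current.append(line)
    parts.append(current)
    return parts


part_one = "Hello\r\nworld\r\n"
part_two = "second part\r\n"
wire = part_one + BOUNDARY + "\r\n" + part_two

# Character counts as the sender would compute them on the CRLF form.
counts = [len(part_one), len(BOUNDARY) + 2, len(part_two)]

# What arrives after an intermediate system rewrites the newlines.
munged = wire.replace("\r\n", "\n")

counts_ok_on_wire = split_by_counts(wire, counts)[0] == part_one
counts_ok_after_munging = split_by_counts(munged, counts)[0] == part_one.replace("\r\n", "\n")
boundary_survives = split_by_boundary(wire) == split_by_boundary(munged)
```

The counts recover the first part only on the untouched stream, while the delimiter-based split gives the same result before and after the newline rewrite. A delimiter scheme does need a quoting rule for body lines that happen to look like the delimiter, which this sketch leaves out.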
And, just one more thought about X.400 relationships. I would hope that it might become possible for an X.400/RFC822&821 gateway to just stuff an ASN.1 Body Part into an RFC822 body, with whatever appropriate RFC822 header indicators, and let it fly. But, if this is too much like FULL BINARY, then I would expect that we would have to define some kind of binary-to-8-bit-ascii encoding to do it.

I also want to remind everyone that simple binary transfers are not a good "EXTERNAL" format for transferring structured information, in case some of us are not aware of this fact, or do not understand this issue. Binary transfers only work between identical (hardware and software and application) systems, or systems that are able to bit pick in a bit/byte reordered binary delivery to find what they are looking for in spite of the bit/byte scrambling. (e.g., IBM MVS programs to bit pick LOTUS-123 MS-DOS binary images that have been binary kermit transferred into MVS systems in order to incorporate subsidiary data into the corporate databases! And vice versa! Yuck!)

This is why ASN.1 was invented and is used, as an EXTERNAL transfer format that is independent of all INTERNAL storage conventions. If you don't want to use ASN.1 for this purpose, then you have to invent another syntax of the same kind for the same purpose. Just what value another new one might add is not clear to me.

Cheers...\Stef

From "John C Klensin " Mon Jan 21 06:09:56 1991
Flags: 000000000001
Received: from INFOODS.MIT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA16052; Mon, 21 Jan 91 06:02:28 EST
Date: Mon 21 Jan 91 06:02:22-EST
From: John C Klensin
Subject: Re: comments on the SMTP session protocol
To: MRC@cac.washington.edu
Cc: ietf-smtp@dimacs.rutgers.edu
Message-Id: <664455742.169937.KLENSIN@INFOODS.MIT.EDU>
In-Reply-To:
Mail-System-Version:

Mark Crispin asks:

>Has anyone performed a density analysis on "typical" 8bit text to
>determine how frequently the characters above 0x7F are used?
I haven't, but might be able to shed some light on the answer by a more conceptual analysis. - If the "data" are really "binary", e.g., pictures or executable, there is no reason to assume, a priori, anything but a random distribution of bit patterns and, consequently, about the same number of "characters" in cols 0 - 7 (0x00 through 0x7F) and those in cols 8-15 (0x80 through 0xFF). This is obvious, but bears repeating, given the number of people who want to see a near-equivalence between "eight bit text" and "arbitrary binary data". There is considerable value in considering them separately. - Think of the family of ISO8859 character sets as falling into two categories, those in which all of the text/alphabetical characters are essentially based on the Roman alphabet plus diacriticals or other markings, and those in which the "upper" columns really represent a different alphabet (e.g., Greek, Cyrillic, Arabic, Kana, Hebrew,... in no particular order). In the latter group, communication *using* the associated language is going to occur *mostly* in the upper character positions, nearly to the exclusion (except for symbols and special characters) of the lower positions. In the former group --think of it as the Latin-m subset of 8859-n-- the upper group will be used proportionately less, at least in ratio to the lower group. However, for many of the applicable languages, the population of the upper positions includes extended (relative to ISO646 BV) vowels as well as extended consonants and symbols. The phonetic ratio of vowel-symbols to consonant-symbols in most languages is such that it does not take a lot of use of those vowels to create a lot of accesses to selected "high" characters. 
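The two categories above can be checked on small samples. The following is only a rough sketch, not a real density analysis: the function name `high_bit_fraction` and both sample sentences are invented for illustration, and two short sentences say nothing reliable about "typical" text. It simply counts how many octets fall in 0x80 through 0xFF once the text is encoded in the named ISO 8859 part.

```python
def high_bit_fraction(text, charset):
    """Return the share of octets >= 0x80 when text is encoded in charset."""
    data = text.encode(charset)
    if not data:
        return 0.0
    high = sum(1 for octet in data if octet >= 0x80)
    return high / len(data)


# Hypothetical samples: a Latin-alphabet language with diacriticals,
# and a language whose whole alphabet lives in the upper columns.
french = "Le café est déjà prêt, à côté de la fenêtre."
greek = "Το ταχυδρομείο έφτασε σήμερα το πρωί."

latin1_share = high_bit_fraction(french, "iso8859_1")
greek_share = high_bit_fraction(greek, "iso8859_7")

print("ISO 8859-1 sample: %.2f of octets are above 0x7F" % latin1_share)
print("ISO 8859-7 sample: %.2f of octets are above 0x7F" % greek_share)
```

On samples like these the Greek text sits almost entirely in the upper half, while the French text uses it only for the accented vowels, which is the qualitative split described above.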
To put this differently, ISO8859 was not optimized--either intentionally or accidentally--for minimal use of the upper characters; it was optimized for upward compatibility with ISO646 and then its particular (Latin-n, at least) variants are optimized around collating sequence variations in the upper panel.

This also suggests that, if one were to decide to confine oneself to Latin-1 (ISO8859-1), one could probably arrange some optimizations and escapes that disappear if one intends to accept ISO8859-n for arbitrary standardized values of n.

john
-------

From Rudy.Nedved@rudy.fac.cs.cmu.edu Mon Jan 21 15:59:42 1991
Flags: 000000000001
Received: from RUDY.FAC.CS.CMU.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA27936; Mon, 21 Jan 91 15:52:40 EST
Received: from rudy.fac.cs.cmu.edu by RUDY.FAC.CS.CMU.EDU id aa04346; 21 Jan 91 15:51:09 EST
To: David Herron
Cc: ietf-smtp@dimacs.rutgers.edu
Subject: Re: SMTP changes stuff
In-Reply-To: <9101182214.aa03331@Obelix.TWG.COM>
Date: Mon, 21 Jan 91 15:50:56 EST
Message-Id: <4343.664491056@RUDY.FAC.CS.CMU.EDU>
From: Rudy.Nedved@rudy.fac.cs.cmu.edu

David,

I don't see any evidence of why it would be easier for a system administrator to deal with a new program using the existing SMTP port. My experience with SMTP when it first came out and with the crazy environment of 900+ computers I work in involving various groups indicates that adding a service which does not disrupt any existing services is much much easier to get done. The point of persuasion is that if it flies, you win and if it doesn't then you can try again a bit later. There is less pressure for all the parties involved.

Admittedly high pressure situations where you live or die if something flies have advantages but these tactics have long since lost value as the scale of computers has increased and the live or die pressure is multiplied by several factors of machines.
Overall the best tactic I have seen is one where the users and outside individuals push for a new service. The system administrator finally gets fed up with just sitting on the issue and then does it since there are no significant arguments against it other than some installation work. When it is a retrofit they can argue that they don't have time to reinstall local changes, or enough time and energy to install and monitor the installation, and raise other issues relating to disturbing an existing system. The argument that this is the future is pretty weak since the world is geared for believing that some day they will have to change to OSI but no one said anything about this step.

-Rudy

From dpz@action.rutgers.edu Mon Jan 21 21:29:37 1991
Received: from action.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA07336; Mon, 21 Jan 91 21:29:37 EST
Received: by action.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA14809; Mon, 21 Jan 91 21:29:07 EST
Date: Mon, 21 Jan 91 21:29:06 EST
From: David Paul Zimmerman
To: David Herron
Cc: ietf-smtp@dimacs.rutgers.edu
Subject: Re: SMTP changes stuff
In-Reply-To: Your message of Fri, 18 Jan 91 22:14:16 -0800
Message-Id:

Dave,

Good idea! I hadn't thought to set up archiving for the IETF SMTP list's discussions. Luckily, I haven't yet deleted any of the list's messages from my personal mailbox, so I've copied them into a file and made it publicly available. Any further messages to the list (including this one) will be automagically appended.

The IETF-SMTP list's archive file is available via anonymous FTP from dimacs.rutgers.edu. The filename is ~ftp/pub/ietf-smtp-archive.
David, IETF-SMTP list maintenance daemon From kankkune@cs.helsinki.fi Tue Jan 22 14:32:40 1991 Received: from hydra.Helsinki.FI by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00625; Tue, 22 Jan 91 14:32:40 EST Received: from paros.Helsinki.FI by hydra.Helsinki.FI (4.1/SMI-4.1/32) id AA22439; Tue, 22 Jan 91 21:32:52 +0200 Date: Tue, 22 Jan 91 21:32:52 +0200 From: kankkune@cs.helsinki.fi (Risto Kankkunen) Message-Id: <9101221932.AA22439@hydra.Helsinki.FI> X-Mailer: Mail User's Shell (7.2.0 10/31/90) To: ietf-smtp@dimacs.rutgers.edu Subject: A viewpoint There has been much discussion about how supporting multiple data formats (esp. multimedia) affects mail protocols. I think that the format of the data should not be tied with mail protocols of any level. The body of the mail should be handled just as a stream of bytes, whether it contains ASCII, ISO Latin-1 or some multimedia format. After all, mail is just one form of file transfer. You can get a multimedia document by mail, by ftp from some host, or by direct copy from some directory. In all the cases you have the problem of what program to use to view the file. The solution is to have a program which looks the header of the file (like the unix file program) and starts a suitable viewer based on that. To handle multimedia mail you just would have to set PAGER or some other environment variable to that program. What I'd like to see done is extending 821 to allow arbitrary binary data in the message. This would allow later enhancements in the other mail protocols (822 etc.) without modifying 821 again. An encoding method (btw. ABE encoding has the advantages of preserving readable characters the same in the encoded form and it uses a character set which gets through the various BITNET gateways) should be used to transfer the data to 7-bit installations. Btw. what are the characteristics that can be expected from the network transport layer over which SMTP is run? 
I suppose we cannot assume a reliable stream of octets to accommodate all the non-Internet sites... It would help to design the extensions if the transport layer capabilities over which SMTP must work were explicitly written down.

--
Risto Kankkunen                   kankkune@cs.Helsinki.FI (Internet)
Department of Computer Science    kankkunen@finuh (Bitnet)
University of Helsinki, Finland   ..!mcsun!uhecs!kankkune (UUCP)

From nsb@thumper.bellcore.com Tue Jan 22 16:07:45 1991
Received: from thumper.bellcore.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA03736; Tue, 22 Jan 91 16:07:45 EST
Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id for ietf-smtp@dimacs.rutgers.edu; Tue, 22 Jan 91 16:07:39 EST
Received: by greenbush.bellcore.com (4.12/4.7) id for ietf-smtp@dimacs.rutgers.edu; Tue, 22 Jan 91 16:10:13 est
Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Tue, 22 Jan 1991 16:10:10 -0500 (EST)
Message-Id: <4bb_cm60M2YtQ1NNcQ@thumper.bellcore.com>
Date: Tue, 22 Jan 1991 16:10:10 -0500 (EST)
From: Nathaniel Borenstein
To: ietf-smtp@dimacs.rutgers.edu
Subject: Re: A viewpoint
In-Reply-To: <9101221932.AA22439@hydra.Helsinki.FI>
References: <9101221932.AA22439@hydra.Helsinki.FI>

Excerpts from internet.ietf-smtp: 22-Jan-91 A viewpoint Risto Kankkunen@cs.helsi (1834)

> The solution is to have a program which looks the header of the file
> (like the unix file program) and starts a suitable viewer based on that.
> To handle multimedia mail you just would have to set PAGER or some other
> environment variable to that program.

This is more or less the approach I described in an earlier message, but it isn't quite as simple as you make it sound. I have built a prototype program called "metamail" and modified about fifteen mail reading interfaces to use this new program to display various mail formats.
(The formats themselves and the programs called by metamail to display them are configured by one or more "mailcap" files.) The two places where it is clearly more complex than a PAGER variable are: 1. You need to use different viewers for different formats. That's precisely what I use the mailcap files to specify. 2. You need to encapsulate some information about the viewing environment. For instance, some viewers might require that they be attached to a real terminal (e.g. to use curses) while some graphical mail readers may be calling metamail (& through it the viewing program) without actually being attached to a terminal window. Paging through "more" or a similar program creates similar problems -- sometimes you want it and sometimes you don't. In my prototype, there are a few options in the mailcap file format and the metamail command syntax that allow the right thing to happen in every case I've managed to think of. (To know what to do, you need a cross product of information about the viewer, which comes from the mailcap file,and information about the mail reading program, which comes from the metamail command line options.) From thchen@hpindlm.cup.hp.com Thu Jan 24 19:27:52 1991 Received: from relay.hp.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA19924; Thu, 24 Jan 91 19:27:52 EST Received: from hpindlm.cup.hp.com by relay.hp.com with SMTP (16.5/15.5+IOS 3.13) id AA02838; Thu, 24 Jan 91 16:27:49 -0800 Received: by hpindlm.cup.hp.com (15.11/15.5+IOS 3.20+cup+OMrelay) id AA00360; Thu, 24 Jan 91 16:32:39 pst From: Theresa Chen Message-Id: <9101250032.AA00360@hpindlm.cup.hp.com> Subject: subscription to the mailing list To: ietf-smtp@dimacs.rutgers.edu Date: Thu, 24 Jan 91 16:32:38 PST Mailer: Elm [revision: 64.9] Hello, My name is Theresa Chen. I am the new owner of the sendmail source in HP. Please add me to your mailing list. Thanks. 
Theresa
thchen@hpindlm.cup.hp.com

From Dan.Oscarsson@dna.lth.se Sat Jan 26 06:40:36 1991
Received: from lth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA06978; Sat, 26 Jan 91 06:40:36 EST
Received: from Valhall.dna.lth.se by lth.se (5.61-bind 1.5+ida/LTH-4-NS); Sat, 26 Jan 91 12:40:27 +0100 (MET)
Received: by DNA.LTH.Se (5.65+bind 1.7+ida 1.4.2/DNA 4-Dan); Sat, 26 Jan 91 12:40:24 +0100 (MET)
Date: Sat, 26 Jan 91 12:40:24 +0100
From: Dan Oscarsson
Message-Id: <9101261140.AA06035@DNA.LTH.Se>
To: ietf-smtp@dimacs.rutgers.edu
Subject: Some of my thoughts and implementation on international mail

Having looked through all the old messages in this list it is about time I made a comment.

I have worked on an international sendmail for over a year. The source for my sendmail has been available since the beginning of 1990. The international support in my current version is:

The internal character set of sendmail is ISO 8859-1. It uses an extension to SMTP to query the receiver if it can take ISO 8859-1. This extension is fully compatible with other SMTP sites and I and several other sites have been using this for some time. For letters not sent through SMTP you define if that transfer can handle ISO characters or not. When a letter is sent to a 7-bit site, my sendmail converts the letter into 7-bit ASCII trying to make the letter as readable as possible. That is, it does not retain all information. I thought it to be more important that the letter was readable than to allow the letter to be sent through 7-bit sites to a destination that understood 8 bits.

The SMTP extension is the query: ISOC
ISOC = ISO Character set.
This extension should be changed slightly.

--

Over to the things we have to decide. I can see three important areas:

1) 8-bit mail going into 7-bit mail.

Here are two possibilities: encode 8 bits into 7 bits so that it can be decoded back into 8 bits without information loss. Or convert 8 bits into 7 bits with information loss.
The negative side of encoding is that an encoded letter will not be as readable as a converted one if the destination site cannot understand it. On the other hand getting letters with encoded characters may make the users force that site to change more quickly to the new 8 bit standard.

--

Some other things about 8 bits. 8 bits does not mean binary transfer. If we allow binary transfer we must remove the line fixation that we have today. Also if we encode 8 bit letters into 7 bits, the encoding must both encode 8 bits into 7 bits and must break the encoded text into lines separated with CRLF in a way that can be decoded.

2) Standard character set to use.

My original idea that is used in my sendmail described above defines that a letter can use one 8-bit character set at one time. This unfortunately does not allow mixing of for example Greek and Latin at the same time. X.400 fans may note that X.400 only allows T.61 in many places and T.61 is a very limited character set.

It would be best if we used a single character set for transfer of letters between sites that contains all characters in the world. This would make a special header or negotiation about character set unnecessary. It would also allow international characters to be used in headers and simplify conversion between character sets.

There are today two character set standards under development that contain most of the used characters in the world: ISO 10646 and Unicode.

ISO 10646 is an ISO character set with a length of 32 bits per character. It allows dynamic change in the length of a character so that a letter using the letters in ASCII or ISO 8859-1 can be sent using 8 bits per character with no overhead. Inside a letter the character length can be changed so the letters can be sent compactly. Both ASCII and ISO 8859-1 are true subsets so letters from a site following the old standard ASCII or those using ISO 8859-1 can be received by a site following the new standard without any difficulty.
Unicode is a character set developed by several companies and always uses 16 bits per character. It also has ASCII and ISO 8859-1 in the first 256 codes, but as they are 16 bits an ASCII letter must be expanded into 16 bits before being used as Unicode. If we use Unicode I see problems receiving letters from sites that follow the old standard. Using Unicode also doubles the size of every letter using ASCII or ISO 8859-1.

Unless somebody can give a good reason to use Unicode I propose we define ISO 10646 as the only character set to be used for transmission of electronic mail. It allows us to be easily compatible with "old" sites, to use every character in the world in one letter, and to have compact transmission and storage, and it allows the headers to use other characters than ASCII.

3) Inclusion of other things than text in a letter (multimedia)

As a user I want to be able to include, in the middle of my text, an image, sound, file or something else that is not pure text. RFC 1154 does this by adding new headers and talking about lines. I think we should leave this header mess and line counting and instead add a code to be used in text meaning "included object" which is followed by the object itself and is terminated by a code meaning "end included object". As we are going to use 8 bits I suggest we use the control code APC (Application program command) as start code and by the same standard the termination code for that control code. Within this object there must be a header followed by the contents of the object itself.

We could allow binary transfer here, but is it worth having? If we want to work together with 7 bit sites it is better that the data is encoded and split into lines from the beginning. I suggest btoa as default and uuencode otherwise.

There should be a few simple standard formats defined from the beginning for things like image, audio etc. By simple I mean we should not take a CCITT fax standard for image transfer if there is a more common simple format in use.
We could easily include X.400 body parts with this mechanism inside a letter. It is also easy for a UA to detect an included object and to show it as an icon or something in its place in a letter if it cannot show it in a viewable form.

--------------------------

As soon as we can decide on a standard I will change my sendmail to include it so the new standard can be taken into use quickly.

Some other thoughts:

The standard we define should be used for transmission of articles in netnews and NNTP.

The internet standard we define could be defined as an X.400 OID so our letters could easily be sent into X.400.

--------------------------

Hope this gives some new thoughts to you.

Dan
--
Dan Oscarsson
Department of Computer Science
Lund Institute of Technology
e-mail: Dan.Oscarsson@dna.lth.se
Box 118
S-221 00 Lund, Sweden

From KLENSIN@infoods.mit.edu Sun Jan 27 18:17:33 1991
Received: from INFOODS.MIT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA08472; Sun, 27 Jan 91 18:17:33 EST
Date: Sun 27 Jan 91 18:16:27-EST
From: John C Klensin
Subject: Re: Some of my thoughts and implementation on international mail
To: Dan.Oscarsson@dna.lth.se
Cc: ietf-smtp@dimacs.rutgers.edu
Message-Id: <665018187.443937.KLENSIN@INFOODS.MIT.EDU>
In-Reply-To: <9101261140.AA06035@DNA.LTH.Se>
Mail-System-Version:

Two small observations on Dan Oscarsson's (IMHO, very helpful) comments.

(1) While using a "universal" character set has considerable appeal, going to a two-octet (e.g., UNICODE) or possibly-variable-length (e.g., ISO DIS 10646) set, and having these things in a normally-eight (or normally-seven) bit environment, causes further complications in a number of areas. Nothing insurmountable, but problems. And I continue to contend that the more different problems someone has to deal with in order to implement, the longer it will take to establish a critical mass of implementations. So the tradeoffs should be considered.

(2) Neither of these is a finished "Standard".
While they are both at late stages, each is, at present, in the status of "working draft for review". For Unicode, the current working draft probably represents the last review version, and, while major changes would be quite surprising, some final tuning would not be. ISO DIS 10646 (it should not be referred to as "ISO 10646" until it is an International Standard) is in voting that, if unanimous, would essentially constitute final adoption. But it is known that there will be some negative votes, and the ISO process does not handle these quickly and rarely without additional changes to the text. I think it would be a poor idea--in the "looking for trouble" category--for IETF to recommend mail transport that used, nearly exclusively, reliance on a working draft and/or not-yet-finished Standard(s). I think this is likely to imply that, if Unicode were chosen, we should not anticipate being able to make Internet standards for at least six to nine months; if completion of 10646 were required, much longer (I think an optimist would suggest 12 to 18 months, pessimists might guess at several years). john Klensin@INFOODS.MIT.EDU ------- From kankkune@cs.helsinki.fi Sun Jan 27 19:13:16 1991 Received: from hydra.Helsinki.FI by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA09199; Sun, 27 Jan 91 19:13:16 EST Received: by hydra.Helsinki.FI (4.1/SMI-4.1/32) id AA06181; Mon, 28 Jan 91 02:13:28 +0200 Date: Mon, 28 Jan 91 02:13:28 +0200 From: kankkune@cs.helsinki.fi (Risto Kankkunen) Message-Id: <9101280013.AA06181@hydra.Helsinki.FI> X-Mailer: Mail User's Shell (7.2.0 10/31/90) To: ietf-smtp@dimacs.rutgers.edu Subject: Will extended-SMTP require 8-bit data path? If we are going extend SMTP to transfer 8-bit text or binary, can we assume that the transport service provides an 8-bit byte transmission channel? Is SMTP used in networks which support only 7 bits (or less) wide data path? 
If an 8-bit wide channel cannot be assumed, the 8-bit text/binary would have to be en/decoded by SMTP. I hope this isn't needed...

Risto
--
Risto Kankkunen                   kankkune@cs.Helsinki.FI (Internet)
Department of Computer Science    kankkunen@finuh (Bitnet)
University of Helsinki, Finland   ..!mcsun!uhecs!kankkune (UUCP)

From gvaudre@nri.reston.va.us Mon Jan 28 15:39:43 1991
Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA02961; Mon, 28 Jan 91 15:39:43 EST
Received: from NRI by NRI.NRI.Reston.VA.US id aa05753; 28 Jan 91 15:37 EST
Org: Corp. for National Research Initiatives
Phone: (703) 620-8990 ; Fax: (703) 620-0913
To: ietf-smtp@dimacs.rutgers.edu
Cc: gvaudre@nri.reston.va.us
Subject: Mail Working Group Directions.
Date: Mon, 28 Jan 91 15:37:35 -0500
From: Greg Vaudreuil
Message-Id: <9101281537.aa05753@NRI.NRI.Reston.VA.US>

Dear folks,

I would like to take advantage of this pause in the fast and furious list discussions to outline a proposed course of action.

This group was formed to solve a limited problem in the mail system, but it turns out that there is #significant# if not #overwhelming# need for a good bit of new work. The problem was simply to get international character sets across SMTP. This seemingly simple problem opened up a can of worms. If one character set, then why not many? If many character sets require header identification, why not define a general body part mechanism? If a generic mechanism is to be developed, let's do multi-media mail. And so the issue has exploded.

To get a handle on this project, and make some headway on the original problem, I propose the following course of action.

1) Solve the immediate problems of our international colleagues, while leaving the door open to further work. Simple, well, not really.

a) We need to define or choose a standard encoding of international character sets. Proposals have been made for ISO8859, Unicode, and ISO DIS 10646.
b) There is a lot of pressure to allow the transmission of 8 bits via smtp for many of these character sets. This can be done either by mapping 8 bits to 7 bits, or allowing 8 bit transmission. Given the wide deployment of 7 bit mailers, either a 7 bit encoding standard needs to be defined or a conversion method needs defining. Proposals so far include atob and uuencode.

Assuming we choose an ascii compatible character set (ISO DIS 10646 or ISO 8859) and allow 8 bit transmission, both of these points deal with SMTP. SMTP defines the use of US ASCII, and 7 bit transmission.

A small step forward conceptually is to additionally modify the SMTP specification to make it possible to easily add other body parts by allowing the sending of binary message bodies.

2) The second phase of the mail extensions will concentrate on updating RFC 822 to choose a standard mechanism for encoding multi-media mail. This looks to be a much larger project than I originally envisioned and I do not want to hold up the SMTP extensions.

Certain aspects of multi-media and binary transmission need to be considered in the short term to make an educated decision on the necessity of binary transmission capabilities in the SMTP protocol.

3) In the longer term, work on a standard for list-service and information-retrieval for internet mail sites is needed. As the maintainer of the IETF directories, I'm painfully aware of the difficulty in servicing requests of many varied formats. X.400 <-> 822 gateways need to be explored, and an 822 message format for use with the gateways needs standardizing.

With the large interest and wide participation, this mail extensions effort may be split into several working groups. I welcome comments and suggestions to get this project out of the free-form discussion and into the working stage.

Greg V.
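The difference between the two options in point 1 b), a reversible 7 bit encoding versus a one-way conversion, can be sketched as follows. This is an illustration only and not any proposal made on the list: the function names are invented, the encoding uses the uuencode line format as provided by Python's `binascii` module, and the lossy conversion (dropping diacriticals via Unicode decomposition) is just one assumed way of "making the letter as readable as possible" for ISO 8859-1 text.

```python
import binascii
import unicodedata


def encode_uu(data):
    """Reversibly encode 8 bit data as a list of 7 bit uuencoded lines."""
    lines = []
    # uuencode carries at most 45 octets of payload per line
    for start in range(0, len(data), 45):
        lines.append(binascii.b2a_uu(data[start:start + 45]))
    return lines


def decode_uu(lines):
    """Recover the original octets from uuencoded lines."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


def convert_lossy(data):
    """One-way conversion of ISO 8859-1 text to readable 7 bit ASCII."""
    text = data.decode("iso8859_1")
    decomposed = unicodedata.normalize("NFKD", text)
    # Accents become separate combining marks, which "ignore" drops;
    # characters with no ASCII base letter are dropped as well.
    return decomposed.encode("ascii", "ignore")


# Hypothetical sample letter body in ISO 8859-1
body = "Blåbärssoppa från Malmö".encode("iso8859_1")

wire_lines = encode_uu(body)
restored = decode_uu(wire_lines)
readable = convert_lossy(body)
```

The encoded lines survive a 7 bit path and decode back to the original octets but are unreadable to a recipient without a decoder; the converted text stays readable but the original cannot be recovered from it.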
From paf@nada.kth.se Mon Jan 28 16:41:24 1991
Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA05500; Mon, 28 Jan 91 16:41:24 EST
Received: from cyklop.nada.kth.se by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA16615; Mon, 28 Jan 91 22:41:03 +0100
Message-Id: <9101282141.AA16615@nada.kth.se>
To: Greg Vaudreuil
Cc: ietf-smtp@dimacs.rutgers.edu
Subject: Re: Mail Working Group Directions.
In-Reply-To: Your message of Mon, 28 Jan 91 15:37:35 -0500. <9101281537.aa05753@NRI.NRI.Reston.VA.US>
Date: Mon, 28 Jan 91 22:41:02 +0100
From: Patrik F{ltstr|m

Greg Vaudreuil :

At the NETF meeting in Gothenburg a couple of months ago we started a working group to handle some of the problems Greg was talking about that popped up in the massive discussion that has been held on this list over the last couple of weeks. There was also a discussion at the last IETF meeting in December and we will have a proposal ready for the IETF meeting in March (was it March?).

To clarify what we are working on, and have been working on for some months, I have some comments on his proposal. You might also read the letter from Dan Oscarsson, which also gives a good perspective on the work people (=Dan) have done on the NORDUNET.

>To get a handle on this project, and make some headway on the
>original problem, I proposed the following course of action.

>1) Solve the immediate problems of our international colleagues, while
>leaving the door open to further work. Simple, well, not really.

> a) We need to define or choose a standard encoding of
> international character sets. Proposals have been made for
> ISO8859, Unicode, and ISO DIS 10646.

This is not an easy thing to do. Our proposal is to only handle ISO DIS 10646 for now, because ISO 8859-1 is a subset of it and ISO 8859-1 is already implemented by some vendors.
We will, though, keep a door open to other character sets, because we distinguish between the problem of sending 8-bit octets through 7-bit mailers and the problem of describing _which_ >7 bit set we are using.

> b) There is a lot of pressure to allow the transmission of 8 bits
> via smtp for many of these character sets. This can be done
> either by mapping 8 bits to 7 bits, or allowing 8 bit
> transmission. Given the wide deployment of 7 bit mailers,
> either a 7 bit encoding standard needs to be defined or a
> conversion method need defining. Proposals so far include
> atob and uuencode.

We have been thinking about both _converting_ characters to suitable ones in the sending machine, and _encoding_ the octets in 7-bit ASCII. If a sender wants to convert its outgoing mail to 7-bit ASCII, no one else is concerned, but if it encodes the characters, then we *must* have a standard for how to do that. We will have a proposal for how to encode those difficult octets in a couple of weeks.

The rest of the mail from Greg is about how to run other sorts of multimedia or binaries through mailers in a standardized way. That is surely a completely different problem, especially when SMTP runs on that many different machines. Someone else has to grab that sort of discussion (a different working group?).

We, the working group in NETF, are only concerned with the >7 bit mailing problems. Probably because of the local problems we have now that vendors are starting to use 8859-1 with our Swedish characters in the >127 area. My end-users already use >127 in news and mail... ;-) We're in a hurry!

>Greg V.

. . . . P a t r i k F a l t s t r o m
NADA, KTH
Stockholm, Sweden

From KLENSIN@infoods.mit.edu Mon Jan 28 17:04:17 1991
Received: from INFOODS.MIT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA06158; Mon, 28 Jan 91 17:04:17 EST
Date: Mon 28 Jan 91 17:04:05-EST
From: John C Klensin
Subject: Re: Mail Working Group Directions.
To: gvaudre@nri.reston.va.us
Cc: ietf-smtp@dimacs.rutgers.edu
Message-Id: <665100245.831937.KLENSIN@INFOODS.MIT.EDU>
In-Reply-To: <9101281537.aa05753@NRI.NRI.Reston.VA.US>
Mail-System-Version:

Let me propose a further narrowing of Greg's list, to try to get comments on what real/immediate/identifiable problems it does *not* solve. I think, unlike Greg's formulation of (1), below and in his message, this *is* simple.

> The problem was simply to get international
>character sets across SMTP.

Is this true? At least at one time, the problem was "simply to get Latin-1 (ISO 8859-1) across SMTP". That is easier, lots easier. It may even be the one worm that can be let out of the can without letting out all of the others. Note also that the various gateways to which we pass mail for other networks are all (or mostly) already engaged in figuring out how to handle Latin-1, or how to translate Latin-1 to the native character sets of their networks. As soon as one moves beyond Latin-1, their activities become much more difficult.

There are several reasons why this is important. Perhaps the most important has to do with working out the pattern of error messages (which we have to do in any event). Given a host that has local users, and local extended character set capability but that has a gateway to "some other" network and appropriate character translation operations for it. Consider

   > HELO I'm-a-sender-domain
   < 250 OK
   > HERE comes-the-8bit-ISO8859-7    (unlikely verb example chosen)
   < 250 OK, I can handle that locally
   > RCPT-TO:
   < 250 Recipient OK
   > RCPT-TO:    (an MX ?)
   < 5xx Recipient ok, but not in *that* character set

Now what are you going to do? Send ASCII to both? Send a reset and start over? Send the message in 8859-1 to 'fred', then send the reset and start over on 'george' (not a good plan if george is in the middle of a long list)?
Alter VRFY so that it is sensitive to availability of addresses and the character sets in which those addresses can accept mail and require its support? I think this needs to be dealt with in any event, but sticking with one very common character set is likely to minimize the frequency with which the problem actually occurs in practice. >1) Solve the immediate problems of our international colleagues, while >leaving the door open to further work. Simple, well, not really. > > a) We need to define or choose a standard encoding of > international character sets. Proposals have been made for > ISO8859, Unicode, and ISO DIS 10646. As soon as one says "international character sets" (as above) and "standard encoding" (as above), one is driven either to a standard encoding that will encompass "all international characters", or to models for using several alternate international character sets in different messages. At this instant, as distinct from some possible time in the future, the all-encompassing standard encoding is a non-starter. ISO DIS 10646 and Unicode are moving targets (albeit not moving very quickly), some technical problems remain with both, there are a lot of characters in Unicode not in the 10646 proposal, and there is a cultural/technical/political controversy called "Han unification" that separates the two that, probably, IETF should not try to resolve. ISO8859 does not solve the problem because it is a family of character set standards, not one, and, if one tries to set up mechanisms to switch dynamically within the family, a lot of "interesting" problems develop quickly. Now, if one defines this as "handle ISO8859-1 within SMTP", a reasonable selection of Latin-alphabet-based languages are accommodated and the problem is fairly easily solved. 
I think it can be done in a way that can be extended, without violence to user agents or servers, at some point in the future if we are reasonably smart about it; several suggestions have already been made that would be consistent with that goal and that are not very different from each other. To support "handle ISO8859-1 within SMTP" one needs: (i) An envelope verb or header field that implies "here comes ISO8859-1". I prefer the former, but that is a separate discussion. (ii) An envelope verb that implies "8 bit data", if the above does not do so. What we don't need for this purpose is to solve the line length problem, worry about lengths instead of CRLF.CRLF, worry about how that string is encoded, deal with embeddings, multimedia, variable-length characters, character encodings that cannot be displayed by any known device other than workstation hardware that can parse and draw them, etc. Wrt this last, I have a terminal on my desk that is mass-produced by a major vendor, displays ISO8859-1, sells for circa $500, and has clones that sell for about $300. This is not the case with any of the other options we have discussed. > b) There is a lot of pressure to allow the transmission of 8 bits > via smtp for many of these character sets. This can be done > either by mapping 8 bits to 7 bits, or allowing 8 bit > transmission. Given the wide deployment of 7 bit mailers, > either a 7 bit encoding standard needs to be defined or a > conversion method need defining. Proposals so far include > atob and uuencode. I think that there is an assumption here that needs examination. To the best of my limited knowledge, the only machines out there that have mailer implementations that have a reason (other than excessive care or perversity) to zero or discard the 8th bit are on DEC-10 or -20 hardware, with its five seven-bit characters per word encoding. Almost everything else is perfectly capable of handling an 8-bit transmission, or could be easily modified to do so. 
Many other 36 bit machines went the "four 9-bit characters" route, which can presumably be modified to handle 8-bit characters without great pain. Given the declining number of 36-bit computers being put into service each year ( :-( ), I'd be hard-pressed to make a cost-benefit argument for spending a lot of time and trouble on seven-bit transport for the Latin-1 alphabet. Perhaps the argument can be made, but I'd like to hear someone make it before we increase the complexity of the protocol that much. Of course, "binary data" and "multimedia", etc., change this equation a lot, but that is precisely why I'm proposing narrowing Greg's proposed narrowing of focus. > A >small step forward conceptually is to additionally modify the SMTP >specification to make it possible to easily add other body parts by >allowing the sending of binary message bodies. I don't believe this is true. I think that "sending binary message bodies" opens up such complexity that it may be the right point to carefully examine the "this is the point to stop and switch over to X.400" hypothesis. I don't know how we get an answer to whether this is a "small step forward" or a "major can of worms", and, if we want something out rapidly that we can agree on, I think it would be very desirable to avoid having to answer that question. Again, I am concerned that the future extension issues, especially "binary", "multimedia", and "lists" may interfere with deployment of an eight-bit Latin-1 extension or vice versa. People may be inclined to wait until we stop changing SMTP before they do anything, or may make one change and decide they have enough. I'd hope to be wrong on this one, but don't know how to answer the question. If I'm right, it makes a strong argument for spending the time working out everything, at least in outline, before announcing/mandating any one change. Summary suggestion, Greg's phase 1: - support for Latin-1 (defined as ISO8859-1) only. 
- verb in SMTP that specifies ISO8859-1 over an 8 bit transmission path, or one that specifies an 8 bit tranmission path *for characters*, with "ISO8859-1" specified in a new, standardized, header field. - no support for non-character "binary", or embedded or structured mail other than the header/message distinction now made in RFC822. - no support at this stage for 8bit to 7bit translation or encoding. If the receiving SMTP cannot accept 8bit transmission, the message goes in ASCII or not at all. I.e., if the receiving SMTP rejects the "8 bit" verb referred to above, the sender must either send ASCII or give up and close the connection. - no use of extended (i.e., 8 bit) characters in the envelope or headers, only in the message body. I think it might be desirable and practical to relax this restriction somewhat for the headers, but it means consideration on a field-by-field basis, which might not be worth the trouble. Now, I think that is pretty straightforward and is The Right Thing if my hypothesis that it will solve the overwhelming fraction of the real and present problem is correct. john Klensin@INFOODS.MIT.EDU ------- From ARIEL@relay.prime.com Mon Jan 28 22:21:16 1991 Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA15734; Mon, 28 Jan 91 22:21:16 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA01783; Mon, 28 Jan 91 22:21:03 EST Message-Id: <9101290321.AA01783@rutgers.edu> Received: (from user ARIEL) by Relay.Prime.COM; 28 Jan 91 22:23:31 EST To: IETF SMTP list From: Robert Ullmann Subject: on negotiating ISO8859-1 in SMTP Date: 28 Jan 91 22:23:31 EST Hi, re 8 bit ISO8859-1 mail ... It seems what we want to do is to pry open the can of worms just far enough to wedge in one more, and force the lid down again ... ... which won't work. Fortunately, it isn't needed. If clearing the 8th bit is simply regarded as a _bug_, to be _fixed_, it isn't difficult at all. 
We've already done this with TELNET: the TELNET spec spoke of 7 bit ASCII, but vendors have gone (entirely?) to 8 bit at this time. I think it might be perfectly reasonable, as a vendor, to feel: (read in the spirit intended :-) "My implementation has been doing 8 bit for years, and so have others; any implementation that clears the 8th bit is broken, and you should get it fixed." Several SMTP-negotiation commands have been suggested; "8BIT", "ISOC", "CHAR" (in private mail I exchanged with John Klensin); I will use CHAR in the following, meaning "8 bit okay?" response is "250 OK" or (most likely) "500 No Such Command". Consider: what would I do anyway? If I am receiving, I will get whatever the sending MTA has anyway. If I am sending, I am going to send all 8 bits anyway. Since I am not going to behave differently, why should I try to find out which state I am in? That last point is the essential part: suppose I have a mailer which implements CHAR. And suppose I have a message that I accepted in 8 bit "mode". I call your mailer: I say "CHAR". You say "250 OK". I send the message. Now suppose the same message, I say "CHAR". You say "500 Unknown Command" (or "501 Not Implemented"). I clear the high bit of each character and send .... STOP! No, I do no such thing: I don't need to; you will do it for me. I might as well just send it through in 8 bit. You _might_ handle it properly! Conclusion: the sending MTA's behavior is _not_ modified by the negotiation. Now look at the receiver. I listen to the "CHAR" command; say either 500/501 all the time (because I predate knowledge or implementation of the CHAR command), or "250 OK" all the time, because I know the command, and pass 8 bit always. (There is no reason to add knowledge of the CHAR command to a mailer without also taking out the code that clears the 8th bit, is there? :-) Either way, I accept what I know, 8 bit or 7 bit. Therefore: the receiving MTA's behavior is _not_ modified by the negotiation. 
Ergo nothing is modified by the CHAR command, therefore it is unnecessary: we might as well assume the status quo ante!

Best Regards,
Rob Ullmann
+1 508 620 2800 x1736

From psv@nada.kth.se Tue Jan 29 00:06:40 1991
Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08)
        id AA18364; Tue, 29 Jan 91 00:06:40 EST
Received: from cyklop.nada.kth.se by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0)
        id AA29276; Tue, 29 Jan 91 06:05:59 +0100
Message-Id: <9101290505.AA29276@nada.kth.se>
To: John C Klensin
Cc: ietf-smtp@dimacs.rutgers.edu
Subject: Re: Mail Working Group Directions.
In-Reply-To: Your message of Mon, 28 Jan 91 17:04:05 EST.
        <665100245.831937.KLENSIN@INFOODS.MIT.EDU>
Date: Tue, 29 Jan 91 06:05:59 +0100
From: Peter Svanberg

> John C Klensin writes:
>
> - no support at this stage for 8bit to 7bit translation or encoding.
> If the receiving SMTP cannot accept 8bit transmission, the message goes
> in ASCII or not at all.

*What* ASCII? Just bit-stripped 8bit characters? But then you haven't solved our "immediate problems": distortion of mail contents. That's what we have already (from some senders who are using Latin-1 locally and whose letters are not changed by mail software before leaving the host). It's terrible. Imagine, for example, that all characters 'o' in this letter were changed to 'v', all 'e' to 'd' and all 'i' to 'y'. Wvuldn't thys bd rathdr hard tv rdad?

> - no use of extended (i.e., 8 bit) characters in the envelope or
> headers, only in the message body. I think it might be desirable and
> practical to relax this restriction somewhat for the headers, but it
> means consideration on a field-by-field basis, which might not be worth
> the trouble.

At least "Subject" and the comment parts of "To" and "From" headers (containing personal name) must be available for non-ASCII characters.
--- Peter Svanberg psv@nada.kth.se Dept of Num An & CS, Royal Institute of Tech Stockholm, SWEDEN From kankkune@cs.helsinki.fi Tue Jan 29 09:36:32 1991 Received: from hydra.Helsinki.FI by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00632; Tue, 29 Jan 91 09:36:32 EST Received: from poros.Helsinki.FI by hydra.Helsinki.FI (4.1/SMI-4.1/32) id AA24191; Tue, 29 Jan 91 14:36:10 +0200 Date: Tue, 29 Jan 91 14:36:10 +0200 From: kankkune@cs.helsinki.fi (Risto Kankkunen) Message-Id: <9101291236.AA24191@hydra.Helsinki.FI> In-Reply-To: Robert Ullmann's message as of Jan 28, 22:23 X-Mailer: Mail User's Shell (7.2.0 10/31/90) To: ietf-smtp@dimacs.rutgers.edu Subject: Re: on negotiating ISO8859-1 in SMTP > If clearing the 8th bit is simply regarded as a _bug_, to be _fixed_, > it isn't difficult at all. We've already done this with TELNET: the > TELNET spec spoke of 7 bit ASCII, but vendors have gone (entirely?) > to 8 bit at this time. > > [some good arguments deleted] > Conclusion: the sending MTA's behavior is _not_ modified by the > negotiation. > > [some good arguments deleted] > Therefore: the receiving MTA's behavior is _not_ modified by the > negotiation. I agree with this entirely. It would be nice, if the 8-bit sender could translate the letter to a suitable character set for 7-bit receiver, but that would make things too complicated. Different character sets and multimedia formats should be specified in the RFC 822 headers, not at SMTP level. If we are going to remove the 7-bit restriction I would like to see the line length restriction to be removed at the same time. This would allow to send full binary to capable mailers. Sites that don't allow unlimited line length would either trucate the lines or discard the mail. This is similar to what happens in Roberts suggestion with 7 and 8 bits. 
I don't know how big the modifications to current mailers would be (it cannot be a big deal to read the data, say, 80 characters at a time instead of one line at a time), but that is not an issue. It is important to add these requirements or suggestions to the extended 821 standard in order to guide future versions and implementations. These changes require so little modification to the code that I think they (at least eliminating the 7-bit restriction) would be made quite widely in a short time.

The other issues raised on this list should be dealt with in the extended 822, IMHO.

Risto

--
Risto Kankkunen                   kankkune@cs.Helsinki.FI (Internet)
Department of Computer Science    kankkunen@finuh (Bitnet)
University of Helsinki, Finland   ..!mcsun!uhecs!kankkune (UUCP)

From KLENSIN@infoods.mit.edu Tue Jan 29 09:38:02 1991
Received: from INFOODS.MIT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08)
        id AA00762; Tue, 29 Jan 91 09:38:02 EST
Date: Tue 29 Jan 91 07:32:17-EST
From: John C Klensin
Subject: Re: Mail Working Group Directions.
To: psv@nada.kth.se
Cc: ietf-smtp@dimacs.rutgers.edu
Message-Id: <665152337.724937.KLENSIN@INFOODS.MIT.EDU>
In-Reply-To: <9101290505.AA29276@nada.kth.se>
Mail-System-Version:

Peter Svanberg takes exception to my straw-man suggestions....

>> If the receiving SMTP cannot accept 8bit transmission, the message goes
>> in ASCII or not at all.
>
>*What* ASCII? Just bit-stripped 8bit characters? But then you haven't
>solved our "immediate problems": distortion of mail contents.

I think this is very helpful, because I'm trying to figure out what the problem is.

But, independent of whether what I proposed is reasonable, this is *not* what I proposed. It does, however, point out the possible problem with Robert Ullmann's "just declare the 7bit folks broken" proposal. Actually, I think bit-stripping is a *terrible* idea.
I think the very essence of any data interchange operation--and I consider email an example of that, however trivial--is that the sender announces what is going to be sent, and the receiver agrees to accept it and, by implication, to do something reasonable with it. The sender does not then go off and send something else, and the receiver is at least morally obligated to not distort whatever is received beyond useful recognition. If the receiver rejects the form the sender wants to send, the sender needs to either give up and abort transmission or, in this case, send *real* ASCII, not bit-stripped Latin-1, or, for that matter, packed EBCDIC. "Real ASCII", in your case, implies, e.g., all of those doubled-vowel conventions: hard to read, not really "right", but with minimal loss of meaning.

If I'm operating a receiving-SMTP, I might want to decline to accept Latin-1, not because I can't handle an 8bit data path, but because I know that--because of internal storage or device constraints--there is absolutely no way that any user of my host is going to be able to display the stuff. One does not get handling and display of non-ASCII characters on a host by making declarations about the transport of electronic mail.

So, what I tried to suggest is that:

(1) A sending host that wants to send Latin-1 announces that fact.

(2) If the receiver can accept Latin-1, and do something acceptable (by the definitions of its users) with it, and can accept it over an 8bit path, then it accepts the option and continues in that mode.

(3) If it rejects the option, the sender has two choices:

    (i) It, with knowledge of what the text means and the coding conventions (e.g., German a-with-umlaut to "ae", Swedish a-with-circle-above to "aa") that are applicable, transforms the message into plain ASCII and sends it.

    (ii) It decides that the receiver isn't a sufficiently sophisticated member of the international community to communicate with, and gives up.
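[Illustration] Choice (3)(i) above can be sketched in C. This is only a rough sketch under stated assumptions: the function name `latin1_to_ascii` is hypothetical, and the table holds just the two digraph conventions mentioned in the text (ISO 8859-1 octets 0xE4 and 0xE5); a usable table would be larger and language-dependent. Any other 8-bit octet makes the function fail, which corresponds to choice (3)(ii), giving up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fallback for a sender whose Latin-1 request was refused:
 * rewrite Latin-1 text as plain ASCII using digraph conventions.
 * Only the two examples from the text are mapped (an assumption made
 * for brevity). Returns the output length, or -1 if an octet has no
 * mapping or the output buffer is too small (the "give up" case). */
long latin1_to_ascii(const unsigned char *in, size_t inlen,
                     char *out, size_t outcap)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);
    if (outcap == 0)
        return -1;
    for (i = 0; i < inlen; i++) {
        char single[2];
        const char *rep;

        if (in[i] < 0x80) {             /* already ASCII: copy as is */
            single[0] = (char)in[i];
            single[1] = '\0';
            rep = single;
        } else if (in[i] == 0xE4) {     /* a with diaeresis/umlaut */
            rep = "ae";
        } else if (in[i] == 0xE5) {     /* a with ring above */
            rep = "aa";
        } else {
            return -1;                  /* no convention known */
        }
        while (*rep != '\0') {
            if (o + 1 >= outcap)
                return -1;              /* keep room for the NUL */
            out[o++] = *rep++;
        }
    }
    out[o] = '\0';
    return (long)o;
}
```

A caller would try this only after the receiver has refused the 8-bit option, and would abort the transfer when it returns -1 rather than fall back to bit-stripping.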
There is a dissemination advantage to this sort of strategy, too. No matter how often one points to a system and says "broken" (apologies to Robert's strategy), nothing is going to focus the attention of a vendor or system manager who doesn't feel these extensions are worth the trouble as a lot of communication from customers and users that "comment" on the fact that people are refusing to talk to them because their systems are, from an "internationalization" standpoint, seriously retarded. Now, much of this is predicated on another set of assumptions: there really are very few machines around any more that can't now support 8bit data transmission. Of those few, even fewer support Latin-1, and the likelihood of the supporting vendors modifying the operating systems for multinational characters is, shall we say, pretty low. So, if those assumptions are correct, we probably won't see Latin-1 acceptably handled on those machines regardless of what we do to SMTP. The most we can do for/about them is to rework the protocols sufficiently that sender can detect that they just are not going to handle Latin-1 so it does not get sent to them, avoiding precisely the type of nonsense that appears in your examples. Those assumptions might be wrong. For all I know, Digital is just waiting to get a note that says "we just extended SMTP to handle multinational characters" so that they can announce a new version of TOPS-20 with VT320 support, four characters per word, and a lot of new conversion routines. But somehow I rather doubt it. If, however, there are important counterexamples, I'll immediate convert to the position that the transport should support some encoding strategy to reliably carry 8bit stuff over a 7bit path so that the two UAs see nothing but 8bit characters. The other point is that nothing in the above prevents a sender from pushing most anything through uuencode or btoa and then transmitting it as mail. 
Aesthetically and for security reasons, I'd rather that they didn't, but even I don't care very much what I think on that subject. But this ceases to be a transport problem--the transport is either, in this model, "real ASCII" or "real Latin-1", the latter transported over an 8bit path. I'm also not convinced that it needs to be standardized, even in the message format--this could easily be handled, as it has been for years, under the "agreement between interchanging parties" principle. But a header field, or an RFC suggesting that the first line of the message body should contain some information about the encoding, might not be a bad idea. >At least "Subject" and the comment parts of "To" and "From" headers >(containing personal name) must be avaliable for non-ASCII characters. Here, Peter, we actually completely agree. I was trying to state a position from which one could build back up, rather than the "handle ISO8859-n, everywhere" position that I've read into a few other suggestions. I'd be inclined to add all of the other comment components and the contents of any field whose name starts "X-" to this list. But I'd prefer to keep the field names themselves in ASCII (or columns 2-7 of ISO8859-1), and, while I'm prepared to be convinced that it is necessary, I think all of our lives would be easier if we kept the extended characters out of actual addresses (local-parts and domains). 
--john Klensin@INFOODS.MIT.EDU ------- From kankkune@cs.helsinki.fi Tue Jan 29 10:41:20 1991 Received: from hydra.Helsinki.FI by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA05938; Tue, 29 Jan 91 10:41:20 EST Received: from poros.Helsinki.FI by hydra.Helsinki.FI (4.1/SMI-4.1/32) id AA00615; Tue, 29 Jan 91 17:41:26 +0200 Date: Tue, 29 Jan 91 17:41:26 +0200 From: kankkune@cs.helsinki.fi (Risto Kankkunen) Message-Id: <9101291541.AA00615@hydra.Helsinki.FI> In-Reply-To: John C Klensin's message as of Jan 29, 7:32 X-Mailer: Mail User's Shell (7.2.0 10/31/90) To: ietf-smtp@dimacs.rutgers.edu Subject: Re: Mail Working Group Directions. John C Klensin: "Re: Mail Working Group Directions." (Jan 29, 7:32): > Peter Svanberg takes exception to my straw-man suggestions.... > > >> If the receiving SMTP cannot accept 8bit transmission, the message goes > >> in ASCII or not at all. > > > >*What* ASCII? Just bit-stripped 8bit characters? But then you havn't > >solved our "immediate problems": distortion of mail contents. > I think this is very helpful, because I'm trying to figure out what > the problem is. > But, independent of whether what I proposed is reasonable, this is > *not* what I proposed. It does, however, point out the possible problme > with Robert Ullman's "just declare the 7bit folks broken" proposal. > Actually, I think bit-stripping is a *terrible* idea. But bit-stripping is what happens now. So, Robert's suggestion wouldn't make things any worse for users of 7-bit mailers. > If I'm operating a receiving-SMTP, I might want to decline to accept > Latin-1, not because I can't handle an 8bit data path, but because I > know that--because of internal storage or device constraints--there is > absolutely no way that any user of my host is going to be able to > display the stuff. Wouldn't it be more reasonable to pass that letter bit-stripped (or converted as you have proposed) by your receiving-SMTP? Maybe the receiving user wouldn't mind the bit-stripping. 
At least she would see that something is wrong and tell the sender.

> (i) It, with knowledge of what the text means and the coding
> conventions (e.g., German a-with-umlaut to "ae", Swedish a-with-circle-
> above to "aa") that are applicable, transforms the message into plain
> ASCII and sends it.

I don't think such transformations work very well. In Finnish, a and o with diaeresis are quite common. In our national character set they are mapped to { and |. Bit-stripping the corresponding Latin-1 characters gives d and v. Many people here find text with { and | (when the terminal hasn't got the Finnish character set) or with d and v easier to read than bare a and o (which get mixed up with normal a and o). Sequences like ae for a-diaeresis are also quite hard to read.

> So, if those
> assumptions are correct, we probably won't see Latin-1 acceptably
> handled on those machines regardless of what we do to SMTP. The most we
> can do for/about them is to rework the protocols sufficiently that
> sender can detect that they just are not going to handle Latin-1 so it
> does not get sent to them, ...

Wouldn't it be more reasonable to make local character conversions on those sites that aren't going to support Latin-1 rather than to complicate the SMTP protocol unnecessarily? All those character conversions at the SMTP level would make it complex and might introduce some restrictions on non-text (multimedia) mail. The representation conversions should really be done at the 822 level.
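[Illustration] The bit-stripping effect described above is easy to reproduce. The sketch below is in C, and the helper name `strip_high_bit` is made up for this example. It relies only on the ISO 8859-1 code points 0xE4 (a with diaeresis) and 0xF6 (o with diaeresis), whose low seven bits are 0x64 ('d') and 0x76 ('v').

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: emulate a mailer that clears the 8th bit of
 * every octet, in place. Returns the number of octets it altered. */
size_t strip_high_bit(unsigned char *buf, size_t len)
{
    size_t changed = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] & 0x80) {
            buf[i] &= 0x7F;     /* 0xE4 -> 0x64 'd', 0xF6 -> 0x76 'v' */
            changed++;
        }
    }
    return changed;
}
```

The return value gives a receiver a cheap way to notice that a message was damaged, though not to undo the damage: after stripping, a real 'd' and a former 0xE4 are indistinguishable.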
Risto -- Risto Kankkunen kankkune@cs.Helsinki.FI (Internet) Department of Computer Science kankkunen@finuh (Bitnet) University of Helsinki, Finland ..!mcsun!uhecs!kankkune (UUCP) From psv@nada.kth.se Tue Jan 29 11:14:48 1991 Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA08665; Tue, 29 Jan 91 11:14:48 EST Received: from cyklop.nada.kth.se by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA01375; Tue, 29 Jan 91 17:14:09 +0100 Message-Id: <9101291614.AA01375@nada.kth.se> To: John C Klensin Cc: ietf-smtp@dimacs.rutgers.edu Subject: Re: Mail Working Group Directions. In-Reply-To: Your message of Tue, 29 Jan 91 07:32:17 EST. <665152337.724937.KLENSIN@INFOODS.MIT.EDU> Date: Tue, 29 Jan 91 17:14:08 +0100 From: Peter Svanberg > John C Klensin writes: > > --- > Now, much of this is predicated on another set of assumptions: there > really are very few machines around any more that can't now support 8bit > data transmission. --- > --- If, however, there > are important counterexamples, I'll immediate convert to the position > that the transport should support some encoding strategy to reliably > carry 8bit stuff over a 7bit path so that the two UAs see nothing but > 8bit characters. One problem is if an intermediate computer along *some* of the possible routes between the two UAs hasn't been changed. Then - if conversion is used - the receiving UA gets 8bit characters sometimes and converted 8bit characters at other times. With encoding all letters can be (silently) decoded when necessary. Other advantages with encoding: *) The interested and experienced user on a machine with a not-interested system manager can easily change her UA-environment to get 8bit characters. *) The interested but unexperienced user on a similar machine can often still read most of the letters and will certainly tell her system manager about the problem - the "dissemination advantage" remains. 
From ARIEL@relay.prime.com Tue Jan 29 15:17:26 1991 Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA16711; Tue, 29 Jan 91 15:17:26 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA04050; Tue, 29 Jan 91 15:16:17 EST Message-Id: <9101292016.AA04050@rutgers.edu> Received: (from user ARIEL) by Relay.Prime.COM; 29 Jan 91 15:18:38 EST To: IETF SMTP list From: Robert Ullmann Subject: on mailing binaries Encoding: 25 text, 1140 text Date: 29 Jan 91 15:18:39 EST Hi, As promised, what follows is the specification, including working code, for sending "binary" objects through the mail. /refrain/: Eight Bit Text is Not Binary, Binary is Not Eight Bit Text! It defines an encoded form that is usually much smaller than the binary object. The only (normal) cases in which the encoded form is larger is when the object is already compressed, or (worse :-) encrypted. (Compression has to be done _before_ encryption to be useful!) The mailable object will survive all known (to me ...) forms of mailer and gateway munging. The only requirement is that the mailers handle the requisite size message. (If not, you can resort to manual splitting and recombination with any editor that comes to hand ... :-) Note that this is based on LZ77, not LZ78; it does not infringe on the Unisys patent _alleged_ to cover LZW. The code included is in the public domain. Best Regards, Robert Ullmann Network Working Group R. Jung, R. Ullmann Request for Comments: DRAFT Prime Computer, Inc. January 1991 LZJU90: Compressed Encoding for Binary Mail 1. Status of this Memo This memo describes an encoding [1] for a binary object to be sent in an Internet mail message. The encoding provides both compression and representation in a text format that will successfully survive transmission through the many different mailers and gateways that comprise the Internet and connected mail networks. Distribution of this memo is unlimited. 2. 
Introduction

The encoding first compresses the binary object, using a modified LZ77 algorithm, called LZJU90. It then encodes each 6 bits of the output of the compression as a text character, using a character set chosen to survive any translations between codes, such as ASCII to EBCDIC. The 64 six-bit strings 000000 through 111111 are represented by the characters "+", "-", "0" to "9", "A" to "Z", and "a" to "z".

The output text begins with a line identifying the encoding. This is for visual reference only; the Encoding: field in the header identifies the section to the user program. It also names the object that was encoded, usually by a file name.

The format of this line is:

   * LZJU90 <name>

where <name> is optional. For example:

   * LZJU90 vmunix

This is followed by the compressed and encoded data, broken into lines where convenient. It is recommended that lines be broken every 78 characters, to survive mailers that restrict line length. The decoder must accept lines with 1 to 1000 characters on each line.

After this, there is one final line that gives the number of bytes in the original data and a CRC of the original data. This should match the byte count and CRC found during decompression.

This line has the format:

   * <count> <CRC>

Jung, Ullmann                                                   [Page 1]

RFC DRAFT     LZJU90: Compressed Encoding for Binary Mail   January 1991

where <count> is a decimal number, and CRC is 8 hexadecimal digits. For example:

   * 4128076 5AC2D50E

The count used in the Encoding: field in the message header is the total number of lines, including the start and end lines that begin with *. A complete example is given in section 6.

3. Specification of the LZJU90 compression

This data compression specification uses the Lempel-Ziv-Storer-Szymanski model of mixing pointers and literal characters. The data compression is defined by the decoding algorithm. Any encoder that emits symbols which cause the decoder to produce the original input is defined to be valid.
There are many possible strategies for the maximal-string matching that the encoder does; section 5 gives the code for one such algorithm. Regardless of which algorithm is used, and what tradeoffs are made between compression ratio and execution speed or space, the result can always be decoded by the simple decoder.

The compressed data consists of a mixture of unencoded literal characters and copy pointers which point to an earlier occurrence of the string to be encoded.

Compressed data contains two types of codewords:

   LITERAL  pass the literal directly to the uncompressed output

   COPY     length, offset
            go back offset characters in the output and copy length
            characters forward to the current position.

To distinguish between codewords, the copy length is used. A copy length of zero indicates that the following codeword is a literal codeword. A copy length greater than zero indicates that the following codeword is a copy codeword.

To improve copy length encoding, a threshold value of 2 has been subtracted from the original copy length for copy codewords, because the minimum copy length is 3 in this compression scheme.

The maximum offset value is set at 32255. Larger offsets offer extremely low improvements in compression (less than 1 percent, typically).

No special encoding is done on the LITERAL characters. However, unary encoding is used for the copy length and copy offset values to improve compression. A start-step-stop unary code is used.

A (start, step, stop) unary code of the integers is defined as follows: The Nth codeword has N ones followed by a zero followed by a field of size START + (N * STEP). If the field width is equal to STOP then the preceding zero can be omitted. The integers are laid out sequentially through these codewords.
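[Illustration] The definition above can be turned into a small encoder. The sketch below is not part of the memo's code: the function name `sss_encode` and its text output of '0' and '1' characters are assumptions chosen to make the codewords easy to inspect. It assumes the field is written most significant bit first.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative encoder for a (start, step, stop) unary code.
 * Writes the codeword for n as a string of '0'/'1' characters into
 * out (capacity outcap, including the terminating NUL).
 * Returns the codeword length in bits, or -1 if n is out of range
 * or the buffer is too small. */
int sss_encode(int start, int step, int stop, unsigned long n,
               char *out, size_t outcap)
{
    unsigned long base = 0;     /* first integer of the current class */
    int width = start;          /* field width of the current class */
    int ones = 0;               /* leading one bits for this class */
    size_t o = 0;
    int i;

    assert(out != NULL);
    assert(start >= 0 && step > 0 && stop >= start && stop < 31);
    while (width <= stop) {
        unsigned long count = 1UL << width;

        if (n - base < count) {
            size_t need = (size_t)ones + (width != stop ? 1 : 0)
                          + (size_t)width;
            if (need + 1 > outcap)
                return -1;
            for (i = 0; i < ones; i++)
                out[o++] = '1';
            if (width != stop)          /* zero omitted at STOP width */
                out[o++] = '0';
            for (i = width - 1; i >= 0; i--)
                out[o++] = (((n - base) >> i) & 1UL) ? '1' : '0';
            out[o] = '\0';
            return (int)o;
        }
        base += count;                  /* move on to the next class */
        width += step;
        ones++;
    }
    return -1;                          /* n is beyond the last class */
}
```

With a (9, 1, 14) code, the value 32255 encodes to 19 bits, which agrees with the 19-bit worst case the memo cites for the largest offset.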
For example, (0, 1, 4) would look like:

      Codeword        Range

      0               0
      10x             1-2
      110xx           3-6
      1110xxx         7-14
      1111xxxx        15-30

Below are the actual values used for copy length and copy offset:

The copy length is encoded with a (0, 1, 7) code leading to a maximum copy length of 256 by including the THRESHOLD value of 2.

      Codeword        Range

      0               0
      10x             3-4
      110xx           5-8
      1110xxx         9-16
      11110xxxx       17-32
      111110xxxxx     33-64
      1111110xxxxxx   65-128
      1111111xxxxxxx  129-256

The copy offset is encoded with a (9, 1, 14) code leading to a maximum copy offset of 32255. Offset 0 is reserved as an end of compressed data flag.

      Codeword             Range

      0xxxxxxxxx           0-511
      10xxxxxxxxxx         512-1535
      110xxxxxxxxxxx       1536-3583
      1110xxxxxxxxxxxx     3584-7679
      11110xxxxxxxxxxxxx   7680-15871
      11111xxxxxxxxxxxxxx  15872-32255

The 0 has been chosen to signal the start of the field for ease of encoding. The stop values are useful in the encoding to prevent out of range values for the lengths and offsets, as well as shortening some codes by one bit.

The worst case compression using this scheme is a 1/8 increase in size of the encoded data. (One zero bit followed by 8 character bits). After the character encoding, the worst case ratio is 3/2 to the original data.

The minimum copy length of 3 has been chosen because the worst case copy length and offset is 3 bits (3) and 19 bits (32255) for a total of 22 bits to encode a 3 character string (24 bits).

4. The Decoder

As mentioned previously, the compression is defined by the decoder. Any encoder that produces output that is correctly decoded is by definition correct.

The following is an implementation of the decoder, written more for clarity and as much portability as possible, rather than for maximum speed. When optimized for a specific environment, it will run significantly faster.
/* LZJU 90 Decoding program */

#include <stdio.h>
#include <stdlib.h>     /* exit */
#include <string.h>     /* strncmp, strlen */

typedef unsigned char uchar;
typedef unsigned int uint;

#define N 32255
#define THRESHOLD 3

#define STRTP 9
#define STEPP 1
#define STOPP 14

#define STRTL 0
#define STEPL 1
#define STOPL 7

static FILE *in;
static FILE *out;

static int getbuf;
static int getlen;
static long in_count;
static long out_count;

static long crc;
static long crctable[256];
static uchar xxcodes[] =
"+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ\
abcdefghijklmnopqrstuvwxyz";
static uchar ddcodes[256];

static uchar text[N];

#define CRCPOLY 0xEDB88320
#define CRC_MASK 0xFFFFFFFF
#define UPDATE_CRC(crc, c) \
    crc = crctable[((uchar)(crc) ^ (uchar)(c)) & 0xFF] \
          ^ (crc >> 8)
#define START_RECD "* LZJU90"

void MakeCrctable()     /* Initialize CRC-32 table */
{
    uint i, j;
    long r;

    for (i = 0; i <= 255; i++) {
        r = i;
        for (j = 8; j > 0; j--) {
            if (r & 1)
                r = (r >> 1) ^ CRCPOLY;
            else
                r >>= 1;
        }
        crctable[i] = r;
    }
}

int GetXX()             /* Get xxcode and translate */
{
    int c;

    do {
        if ((c = fgetc(in)) == EOF)
            c = 0;
    } while (c == '\n');
    in_count++;
    return ddcodes[c];
}

int GetBit()            /* Get one bit from input buffer */
{
    int c;

    while (getlen <= 0) {
        c = GetXX();
        getbuf |= c << (10-getlen);
        getlen += 6;
    }
    c = (getbuf & 0x8000) != 0;
    getbuf <<= 1;
    getbuf &= 0xFFFF;
    getlen--;
    return(c);
}

int GetBits(len)        /* Get len bits */
int len;
{
    int c;

    while (getlen <= 10) {
        c = GetXX();
        getbuf |= c << (10-getlen);
        getlen += 6;
    }
    if (getlen < len) {
        c = (uint)getbuf >> (16-len);
        getbuf = GetXX();
        c |= getbuf >> (6+getlen-len);
        getbuf <<= (10+len-getlen);
        getbuf &= 0xFFFF;
        getlen -= len - 6;
    }
    else {
        c = (uint)getbuf >> (16-len);
        getbuf <<= len;
        getbuf &= 0xFFFF;
        getlen -= len;
    }
    return(c);
}

int DecodePosition()    /* Decode offset position pointer */
{
    int c;
    int width;
    int plus;
    int pwr;

    plus = 0;
    pwr = 1 << STRTP;
    for (width = STRTP; width < STOPP; width += STEPP) {
        c = GetBit();
        if (c == 0)
            break;
        plus += pwr;
        pwr <<= 1;
    }
    if (width != 0)
        c = GetBits(width);
    c += plus;
    return(c);
}

int DecodeLength()      /* Decode code length */
{
    int c;
    int width;
    int plus;
    int pwr;

    plus = 0;
    pwr = 1 << STRTL;
    for (width = STRTL; width < STOPL; width += STEPL) {
        c = GetBit();
        if (c == 0)
            break;
        plus += pwr;
        pwr <<= 1;
    }
    if (width != 0)
        c = GetBits(width);
    c += plus;
    return(c);
}

void InitCodes()        /* Initialize decode table */
{
    int i;

    for (i = 0; i < 256; i++)
        ddcodes[i] = 0;
    for (i = 0; i < 64; i++)
        ddcodes[xxcodes[i]] = i;
    return;
}

int main(ac, av)        /* main program */
int ac;
char **av;
{
    int r;
    int j, k;
    int c;
    int pos;
    char buf[80];
    char name[3];
    long num, bytes;

    if (ac < 3) {
        fprintf(stderr, "usage: judecode in out\n");
        exit(1);
    }
    in = fopen(av[1], "r");
    if (!in){
        fprintf(stderr, "Can't open %s\n", av[1]);
        exit(1);
    }
    out = fopen(av[2], "w");
    if (!out) {
        fprintf(stderr, "Can't open %s\n", av[2]);
        fclose(in);
        exit(1);
    }

    /* Skip everything up to the "* LZJU90" start record */
    while (1) {
        if (fgets(buf, sizeof(buf), in) == NULL) {
            fprintf(stderr, "Unexpected EOF\n");
            exit(1);
        }
        if (strncmp(buf, START_RECD, strlen(START_RECD)) == 0)
            break;
    }

    in_count = 0;
    out_count = 0;
    getbuf = 0;
    getlen = 0;

    InitCodes();
    MakeCrctable();

    crc = CRC_MASK;
    r = 0;

    while (feof(in) == 0) {
        c = DecodeLength();
        if (c == 0) {           /* literal codeword */
            c = GetBits(8);
            UPDATE_CRC(crc, c);
            out_count++;
            text[r] = c;
            fputc(c, out);
            if (++r >= N)
                r = 0;
        }
        else {                  /* copy codeword */
            pos = DecodePosition();
            if (pos == 0)       /* offset 0: end of compressed data */
                break;
            pos--;
            j = c + THRESHOLD - 1;
            pos = r - pos - 1;
            if (pos < 0)
                pos += N;
            for (k = 0; k < j; k++) {
                c = text[pos];
                text[r] = c;
                UPDATE_CRC(crc, c);
                out_count++;
                fputc(c, out);
                if (++r >= N)
                    r = 0;
                if (++pos >= N)
                    pos = 0;
            }
        }
    }

    fgetc(in); /* skip newline */

    if (fscanf(in, "* %ld %lX", &bytes, &num) != 2) {
        fprintf(stderr, "CRC record not found\n");
        exit(1);
    }
    else if (crc != num) {
        fprintf(stderr, "CRC error, expected %lX, found %lX\n",
                crc, num);
        exit(1);
    }
    else if (bytes != out_count) {
        fprintf(stderr, "File size error, expected %lu, found %lu\n",
                bytes, out_count);
        exit(1);
    }
    else
        fprintf(stderr, "File decoded to %lu bytes correctly\n",
                out_count);

    fclose(in);
    fclose(out);
    return 0;
}

5. An example of an Encoder

Many algorithms are possible for the encoder, with different tradeoffs
between speed, size, and complexity. The following is a simple example
program which is fairly efficient; more sophisticated implementations
will run much faster, and in some cases produce somewhat better
compression.

This example also shows that the encoder need not use the entire window
available. Not using the full window costs a small amount of
compression, but can greatly increase the speed of some algorithms.
/* LZJU 90 Encoding program */

#include <stdio.h>
#include <stdlib.h>     /* exit, abort */

typedef unsigned char uchar;
typedef unsigned int uint;

#define N 8192          /* Size of window buffer */
#define F 256           /* Size of look-ahead buffer */
#define THRESHOLD 3
#define NIL N           /* End of tree's node */

#define STRTP 9
#define STEPP 1
#define STOPP 14

#define STRTL 0
#define STEPL 1
#define STOPL 7

#define CHARSLINE 78

static FILE *in;
static FILE *out;

static int putlen;
static int putbuf;
static int char_ct;
static long in_count;
static long out_count;
static long crc;
static long crctable[256];
static uchar xxcodes[] =
"+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ\
abcdefghijklmnopqrstuvwxyz";
static uchar text[N + F - 1];
static int match_position;
static int match_length;
static int lson[N + 1];
static int rson[N + 257];
static int dad[N + 1];

#define CRCPOLY 0xEDB88320
#define CRC_MASK 0xFFFFFFFF
#define UPDATE_CRC(crc, c) \
    crc = crctable[((uchar)(crc) ^ (uchar)(c)) & 0xFF] \
          ^ (crc >> 8)

void MakeCrctable()     /* Initialize CRC-32 table */
{
    uint i, j;
    long r;

    for (i = 0; i <= 255; i++) {
        r = i;
        for (j = 8; j > 0; j--) {
            if (r & 1)
                r = (r >> 1) ^ CRCPOLY;
            else
                r >>= 1;
        }
        crctable[i] = r;
    }
}

void PutXX(c)           /* Translate and put xxcode */
int c;
{
    c = xxcodes[c & 0x3F];
    if (++char_ct > CHARSLINE) {
        char_ct = 1;
        fputc('\n', out);
    }
    fputc(c, out);
    out_count++;
}

void PutBits(c, len)    /* Put rightmost "len" bits of "c" */
int c, len;
{
    c <<= 16 - len;
    c &= 0xFFFF;
    putbuf |= (uint) c >> putlen;
    c <<= 16 - putlen;
    c &= 0xFFFF;
    putlen += len;
    while (putlen >= 6) {
        PutXX(putbuf >> 10);
        putlen -= 6;
        putbuf <<= 6;
        putbuf &= 0xFFFF;
        putbuf |= (uint) c >> 10;
        c = 0;
    }
}

void EncodePosition(ch) /* Encode offset position pointer */
int ch;
{
    int width;
    int prefix;
    int pwr;

    pwr = 1 << STRTP;
    for (width = STRTP; ch >= pwr; width += STEPP, pwr <<= 1)
        ch -= pwr;
    if ((prefix = width - STRTP) != 0)
        PutBits(0xffff, prefix);
    if (width < STOPP)
        width++;
    else if (width > STOPP)
        abort();
    PutBits(ch, width);
}

void EncodeLength(ch)   /* Encode code length */
int ch;
{
    int width;
    int prefix;
    int pwr;

    pwr = 1 << STRTL;
    for (width = STRTL; ch >= pwr; width += STEPL, pwr <<= 1)
        ch -= pwr;
    if ((prefix = width - STRTL) != 0)
        PutBits(0xffff, prefix);
    if (width < STOPL)
        width++;
    else if (width > STOPL)
        abort();
    PutBits(ch, width);
}

void InitTree()
{
    int i;

    for (i = N + 1; i <= N + 256; i++)
        rson[i] = NIL;
    for (i = 0; i < N; i++)
        dad[i] = NIL;
}

/* Insert string of length F, text[r..r+F-1], into one of the trees
   (text[r]'th tree) and return the longest-match position and length
   via the global variables match_position and match_length.
   If match_length = F, then remove the old node in favor of the new
   one, because the old one will be deleted sooner.
   Note r plays double role, as tree node and position in buffer. */

void InsertNode(r)
int r;
{
    int i, cmp, p, c;
    uchar *key, *keyp, *txtp;

    cmp = 1;
    key = &text[r];
    p = N + 1 + key[0];
    rson[r] = lson[r] = NIL;
    match_length = 0;
    for ( ; ; ) {
        if (cmp >= 0) {
            if (rson[p] != NIL)
                p = rson[p];
            else {
                rson[p] = r;
                dad[r] = p;
                return;
            }
        }
        else {
            if (lson[p] != NIL)
                p = lson[p];
            else {
                lson[p] = r;
                dad[r] = p;
                return;
            }
        }
        txtp = &text[p];
        keyp = key;
        if ((cmp = *++keyp - *++txtp) != 0)
            continue;
        for (i = 2; i < F; i++)
            if ((cmp = *++keyp - *++txtp) != 0)
                break;
        if (i > match_length) {
            match_position = ((r - p) & (N - 1));
            if ((match_length = i) >= F)
                break;
        }
        else if (i == match_length) {
            if ((c = ((r - p) & (N - 1))) < match_position)
                match_position = c;
        }
    }
    dad[r] = dad[p];
    lson[r] = lson[p];
    rson[r] = rson[p];
    dad[lson[p]] = r;
    dad[rson[p]] = r;
    if (rson[dad[p]] == p)
        rson[dad[p]] = r;
    else
        lson[dad[p]] = r;
    dad[p] = NIL;
}

void DeleteNode(p)      /* Delete node p from tree */
int p;
{
    int q;

    if (dad[p] == NIL)
        return;
    if (rson[p] == NIL)
        q = lson[p];
    else if (lson[p] == NIL)
        q = rson[p];
    else {
        q = lson[p];
        if (rson[q] != NIL) {
            do {
                q = rson[q];
            } while (rson[q] != NIL);
            rson[dad[q]] = lson[q];
            dad[lson[q]] = dad[q];
            lson[q] = lson[p];
            dad[lson[p]] = q;
        }
        rson[q] = rson[p];
        dad[rson[p]] = q;
    }
    dad[q] = dad[p];
    if (rson[dad[p]] == p)
        rson[dad[p]] = q;
    else
        lson[dad[p]] = q;
    dad[p] = NIL;
}

int main(ac, av)        /* main program */
int ac;
char **av;
{
    int r, s, i, c;
    int last_match_length;
    int len;

    if (ac < 3) {
        fprintf(stderr, "usage: juencode in out\n");
        exit(1);
    }
    in = fopen(av[1], "r");
    if (!in) {
        fprintf(stderr, "Can't open %s\n", av[1]);
        exit(1);
    }
    out = fopen(av[2], "w");
    if (!out) {
        fprintf(stderr, "Can't open %s\n", av[2]);
        fclose(in);
        exit(1);
    }

    char_ct = 0;
    in_count = 0;
    out_count = 0;
    putbuf = 0;
    putlen = 0;

    MakeCrctable();
    crc = CRC_MASK;

    fprintf(out, "* LZJU90 %s\n", av[1]);

    InitTree();
    r = 0;
    s = 0;

    /* Fill lookahead buffer */
    for (len = 0; len < F && (c = fgetc(in)) != EOF; len++) {
        UPDATE_CRC(crc, c);
        in_count++;
        text[s++] = c;
    }

    while (len > 0) {
        InsertNode(r);
        if (match_length > len)
            match_length = len;
        if (match_length < THRESHOLD) {
            EncodeLength(0);            /* literal codeword */
            PutBits(text[r], 8);
            match_length = 1;
        }
        else {                          /* copy codeword */
            EncodeLength(match_length - THRESHOLD + 1);
            EncodePosition(match_position);
        }
        last_match_length = match_length;
        for (i = 0; i < last_match_length &&
                (c = fgetc(in)) != EOF; i++) {
            UPDATE_CRC(crc, c);
            in_count++;
            DeleteNode(s);
            text[s] = c;
            if (s < F - 1)
                text[s + N] = c;
            s = (s + 1) & (N - 1);
        }
        while (i++ < last_match_length) {
            DeleteNode(s);
            s = (s + 1) & (N - 1);
            len--;
        }
        r = (r + last_match_length) & (N - 1);
    }

    /* end compression indicator */
    EncodeLength(1);
    EncodePosition(0);
    PutBits(0, 7);

    fprintf(out, "\n* %lu %08lX\n", in_count, crc);
    fprintf(stderr, "Encoded %lu bytes to %lu symbols\n",
            in_count, out_count);

    fclose(in);
    fclose(out);
    return 0;
}

6. LZJU90 example

The following is an example of an LZJU90 compressed object. Using this
as source for the program in section 4 will reveal what it is.

* LZJU90 example
8-mBtWA7WBVZ3dEBtnCNdU2WkE4owW+l4kkaApW+o4Ir0k33Ao4IE4kk
bYtk1XY618NnCQl+OHQ61d+J8FZBVVCVdClZ2-LUI0v+I4EraItasHbG
VVg7c8tdk2lCBtr3U86FZANVCdnAcUCNcAcbCMUCdicx0+u4wEETHcRM
7tZ2-6Btr268-Eh3cUAlmBth2-IUo3As42laIE2Ao4Yq4G-cHHT-wCEU
6tjBtnAci-I++
* 190 081E2601

References

[1] David Robinson, Robert L. Ullmann. Encoding Header Field for
    Internet Messages. RFC 1154, Prime Computer, April, 1990.
Author's Address

Robert Jung
2606 Village Road West
Norwood, MA 02062
USA

Phone: +1 617 769 5999
Email: robjung@world.std.com

Robert Ullmann 10-30
Prime Computer, Inc.
500 Old Connecticut Path
Framingham, MA 01701
USA

Phone: +1 508 620 2800 x1736
Email: Ariel@Relay.Prime.COM

From MRC@cac.washington.edu Tue Jan 29 15:39:58 1991
Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17705; Tue, 29 Jan 91 15:39:58 EST
Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA01398; Tue, 29 Jan 91 12:39:54 -0800
Date: Tue, 29 Jan 1991 11:43:36 -0800 (PST)
From: Mark Crispin
Sender: Mark Crispin
Subject: Re: Mail Working Group Directions.
To: IETF Internet Mail Extensions WG
In-Reply-To: <9101291614.AA01375@nada.kth.se>
Message-Id:

Hello. My interest in this working group was for two reasons: my current research and work in distributed electronic mail/IMAP, and my authorship of the TOPS-20 SMTP server. In the latter, I am the closest thing to "support" for TOPS-20 email that exists, albeit mostly limited to free-time activity.

I would like to put on this second hat and comment on the 7bit/8bit issue a bit.

Since a few important resources that run TOPS-20 still exist, are likely to exist for a few years to come, and are somewhat important players in the Internet email game, they shouldn't be arbitrarily shut out.

As a minority critter in the swamp, TOPS-20 should play the game the way the young whippersnappers choose to play, at least as much as it is reasonable to. It is also a feasible task to attempt to upgrade all the TOPS-20 SMTP servers in the world.

What is not feasible is to migrate from 7bit local ASCII text files to 8bit. As was pointed out, the more natural migration would be to 9bit (might as well not waste those four bits).
However, nobody wants to undertake the conversion of all the necessary tools from 7bit to 9bit at this late stage. What may be feasible is acceptance of 8bit incoming mail, and possibly transmittal of 8bit outgoing mail. It would have to be converted to a 7bit form internally. Frankly, I think that fixing the TOPS-20's will be a lot easier than fixing all the zillions of Unix and VMS implementations out there. I would like to see an official blessed form of 7bit encoding of ISO-Latin mail. This is distinct from 8bits as a general case or other character sets. Japanese JIS and the mainland Chinese GB system already have fully functional ways of transmittal over 7bit links. As for Taiwan's BIG5 system, we can worry about that when the ROC takes over the mainland (probably about the same time everyone discards Unix and goes back to TOPS-20...). Other international character sets, such as Arabic, Sanskrit, etc. can wait until X.400 takes over. After all, X.400 *is* coming, isn't it? The whole reason for this is that all the proposals involve some fall-back into 7bits. Even if the TOPS-20's are fixed (along with a massive adoption in the Unix world), they will end up talking to some 7bit only loser sooner or later. It would be better if TOPS-20's internal 7bit representation was the same as that of everyone else, and that this representation have an unambiguous conversion back into 8bits, so that eventually when a smart guy gets it it can be converted back. It is probably alright to have a very simple-minded transformation that trades off space for speed of implementation. It is probably also alright for the transformation to be ugly to human eyes. People who can't read Swedish aren't going to be concerned how ugly Swedish looks on a box which can't represent Swedish. People who can read Swedish are going to care, but they will also want to get a box which can represent Swedish. 
All that is necessary is that if the message goes from A to B to C (where A and C are Swedish-capable and B is not) that C gets the message intact.

-- Mark --

From Laytenl@peo-mis-emh1.army.mil Tue Jan 29 19:43:23 1991
Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA25171; Tue, 29 Jan 91 19:43:23 EST
Received: from PEO-MIS-EMH1.ARMY.MIL by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA20068; Tue, 29 Jan 91 19:43:12 EST
Message-Id: <9101300043.AA20068@rutgers.edu>
Date: Tue, 29 Jan 91 15:39:31 EST
From: Larry Layten 703-644-2307
To: Internet Mail Extensions Working Group
Subject: Re: Mail Working Group Directions.

I may have missed it somewhere, but I would like to know how large the domain of 7 bit (only) capabilities actually is. It seems to me that several limitations of existing mail handlers affect the discussions that I have seen. The first is the capability to actually pass 8 bit data, and the second is the capability to handle other than text. The two are really not related, as I know many implementations that look at message bodies and headers as strings of a finite length, even though they are (or would be with little work) capable of handling (passing?) 8 bit characters within both a message header and a message body. (I am really only talking about ASCII vice Extended ASCII - I guess).

I believe that the passing of data files is best left to an X.400 implementation, with appropriate encoding (uuencode etc) to solve specific encapsulation of binary data within a body in bridges to specific environments, because solving the string length problem would require rewriting many obsolete mail handling systems. If, however, the size of the 7 bit environments is small, then I believe in looking for ways to address the problem within each specific (7 bit) environment, looking first at the feasibility of moving it into an 8 bit world.

Comments? -- Have I missed something?
Larry

From @rutvm1.rutgers.edu:CHRISTOPHER.J.TANNER@NVE.CRL.AECL.CA Tue Jan 29 19:50:15 1991
Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA25367; Tue, 29 Jan 91 19:50:15 EST
Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 0363; Tue, 29 Jan 91 19:52:22 EST
Received: from NOS.CRNL.AECL.CA by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 0362; Tue, 29 Jan 91 19:52:21 EST
Received: From NVE.CRL.AECL.CA by NOS.AECL.CA via RCNET; Tue, 29 Jan 1991 19:51 EST
Received: from CA*ENVOY100*AECL by NVE.CRL.AECL.CA via QTFS with X.400; Tue, 29 Jan 91 19:51:21 -0500
X400-Trace: CA*ENVOY100*AECL; arrival Tue, 29 Jan 91 19:51:20 -0500 action Relayed
Date: Tue, 29 Jan 91 19:51:20 -0500
Message-Id: <5B011D13290202C2-MTANVE*Christopher.J.Tanner@NVE.CRL.AECL.CA>
P1-Message-Id: CA*ENVOY100*AECL; 5B011D13290202C2-MTANVE
Ua-Content-Id: 5B011D13290202C2
From: Christopher.J.Tanner@nve.crl.aecl.ca
Subject: Character code sets and all that
To: ietf-smtp@dimacs.rutgers.edu

Hi

Having read a number of messages in this group, I felt it time to insert my $.02 worth.

First of all, there is a standard way of processing character sets such as ISO8859 in seven bits. ISO 2022 (?) describes a mechanism to designate what character sets are in use, and the standard code set (C0) does contain shift characters to state that the next character is from the G1 set. Sorry for the clumsiness of this description, but it has been a while since I read these documents.

Secondly, allowing SMTP to process such character sets is just part of the problem. Most operating systems do not know what to do with G1 characters. Programming languages have little idea on how to compare strings containing characters from both G0 and G1 sets, and sorting is another problem.
Some of this is being addressed by the Application Portability study group in ISO-IEC/JTC1 and the internationalization group with SC22 - languages, but work in this area is just starting. Revising SMTP as described is just small step, but a necessary step. I would go along with any reasonable extension that states what character sets are being used in a message. It would be nice if all systems followed the character set designation schemes described in the various standards, but I do not expect that to happen soon. Chris Tanner AECL Research From KLENSIN@infoods.mit.edu Wed Jan 30 09:00:14 1991 Received: from INFOODS.MIT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA11866; Wed, 30 Jan 91 09:00:14 EST Date: Wed 30 Jan 91 08:59:14-EST From: John C Klensin Subject: Re: Character code sets and all that To: Christopher.J.Tanner@nve.crl.aecl.ca Cc: ietf-smtp@dimacs.rutgers.edu Message-Id: <665243954.981937.KLENSIN@INFOODS.MIT.EDU> In-Reply-To: <5B011D13290202C2-MTANVE*Christopher.J.Tanner@NVE.CRL.AECL.CA> Mail-System-Version: Chris Tanner writes, in part... >First of all, there is a standard way of processing character sets such as >ISO8859 in seven bits. ISO 2022 (?) describes a mechanism to designate >what character sets are in use, and the standard code set (C0) does contain >shift characters to state that the next character is from the G1 set. Well, I wondered how long it would take for this to come up. :-) / :-( Thanks, Chris, I didn't want to have to do it. Two small technical clarifications: (1) ISO2022 is correct. (2) However, ISO2022 and its G0/G1 language don't, in my reading and understanding, really directly apply to ISO8859 character sets, but to the registered character sets (typically of no more than six columns each) from which the ISO8859 character sets are mostly built. 
[- Use of "typically" and "mostly" is not just flame-prevention, there are exceptions and irregularities -] The ISO8859 character sets (note plural, this is about -1, -2, ..., -6, -7, ..., not just -1 (Latin-1)) are integrated 8bit sets, not objects that one swaps in and out ("designates") on a panel by panel basis. Now, that is not to dismiss that approach. If the conclusion was (i) "Latin-1" (in this context, a combination of two registered 7 bit character sets, with G0 being what amounts to ASCII, and the second being identical to columns 10 through 15 of ISO8859-1), and (ii) that we wanted to handle Latin-1 in a 7bit context (if I under Mark's TOPS-20-based arguments, I find them very persuasive) Then the use of the single-shift and locking shift modes of ISO2022 is a very attractive alternative to treating the characters as a special case of "binary" and encoding them. I assume here that we stick to Latin-1 here, and bind these two registered character sets to G0 and G1 in the RFC. Permitting mid-message switching between character sets is an interesting and useful idea, but it opens up all the problems associated with ISO8859-?, and then some (it does not open up the same problems as "binary"; as Robert points out, binary is not characters and characters are not binary). >Secondly, allowing SMTP to process such character sets is just part of the >problem. Most operating systems do not know what to do with G1 characters. >Programming languages have little idea on how to compare strings containing >characters from both G0 and G1 sets, and sorting is another problem. This is correct, but also easily misunderstood. One of the reasons for things like ISO8859 is to avoid the problem here, which is that "programming languages" handle strings in which the name and pedigree of every character is the same *lots* better than they handle strings that may represent different character sets, or that switch back and forth. 
So, while some of them have required some modification, in general, ISO8859-n is easy to handle in programming languages, as long as "n" is designated at compile time, or at least constant during a program invocation. They don't, in general, handle G0/G1 switching, or designation of character sets onto G0 or G1, at all. There has, incidentally been a requirement agreed to by ISO/IEC JTC1/SC22 for the former but, as far as I know, there has not even been serious discussion at the SC22 level of requiring support for G0/G1 switching. >Some >of this is being addressed by the Application Portability study group in >ISO-IEC/JTC1 and the internationalization group with SC22 - languages, but >work in this area is just starting. Can we make that a little more specific, Chris? The progress rate of these two efforts, IMHO, is such that they are likely to yield useful and widely-adopted standards circa three years after we are all using X.400. A strong suggestion to people who are interested in pursuing, or even understanding, this approach and its implications: There was a design effort a year or so back to figure out how to extend the Kermit protocol to deal with multinational character sets. It uses this switching approach, and assumes that, within file transfers, considerable redesignation of character sets might occur. The solution may not be ideal, or even correct, but it illustrates all (or almost all) of the issues involved and took a long time and several lengthy drafts to develop. Please go read that history, let's not reinvent it. The relevant document is available by anonymous FTP from watsun.cc.columbia.edu as /kermit/e/isok5.txt. The discussions that led up to it, for those who are interested, are in /kermit/e/iso8859.txt (warning, the latter is a circa half-megabyte file). 
--john Klensin@INFOODS.MIT.EDU ------- From @rutvm1.rutgers.edu:af@sei.ucl.ac.be Wed Jan 30 09:43:49 1991 Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA12558; Wed, 30 Jan 91 09:43:49 EST Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 2938; Wed, 30 Jan 91 09:45:58 EST Received: from VM1.calc.ucl.ac.be by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 2937; Wed, 30 Jan 91 09:45:57 EST Received: by BUCLLN11 (Mailer R2.07) id 2606; Wed, 30 Jan 91 15:41:48 +0100 Date: Wed, 30 Jan 91 15:33:20 +0100 From: "Alain FONTAINE (Postmaster - NAD)" Message-Id: <910130.153320.+0100.af@sei.ucl.ac.be> Subject: Re: Character code sets and all that To: John C Klensin , Christopher.J.Tanner@nve.crl.aecl.ca Cc: ietf-smtp@dimacs.rutgers.edu In-Reply-To: Message of Wed 30 Jan 91 08:59:14-EST from On Wed 30 Jan 91 08:59:14-EST you said: >Two small technical clarifications: > (1) ISO2022 is correct. > (2) However, ISO2022 and its G0/G1 language don't, in my reading and >understanding, really directly apply to ISO8859 character sets, but to >the registered character sets (typically of no more than six columns >each) from which the ISO8859 character sets are mostly built. > The ISO8859 character sets (note plural, this is about -1, -2, ..., -6, >-7, ..., not just -1 (Latin-1)) are integrated 8bit sets, not objects >that one swaps in and out ("designates") on a panel by panel basis. My copy of ISO 8859-1 says (chapter 8 - translated back from a french language edition - the english language edition wording is surely different): When this character set is used in the context of other encoding standards like ISO 2022 or ISO 4873, one should consider that it is made of the following elements: - the space character 02/00 - one G0 94-character set (02/01 to 07/14) - one G1 96-character set (10/00 to 15/15). 
The text then gives the escape sequences to be used to designate those G0 and G1 sets (i suppose it means they are registered). So it seems that what we are talking about is explicitly permitted... Alain FONTAINE +--------------------------------+ Universite Catholique de Louvain | If your mail software barks at | Service d'Etudes Informatiques | my address, you may try : | Batiment Pythagore | | Place des Sciences, 4 | FNTA80@BUCLLN11.BITNET | B-1348 Louvain-la-Neuve, BELGIUM +--------------------------------+ phone +32 (10) 47-2625 From psv@nada.kth.se Wed Jan 30 11:44:00 1991 Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA16133; Wed, 30 Jan 91 11:44:00 EST Received: from cyklop.nada.kth.se by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA28967; Wed, 30 Jan 91 17:43:39 +0100 Message-Id: <9101301643.AA28967@nada.kth.se> To: ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that In-Reply-To: Your message of Wed, 30 Jan 91 15:33:20 N. <910130.153320.+0100.af@sei.ucl.ac.be> Date: Wed, 30 Jan 91 17:43:39 +0100 From: Peter Svanberg So, to use ISO 2022: 1) Initiate with "ESC ( B ESC - A". 2) To use an 8bit character, send Ctrl-O, the character bitstripped and Ctrl-N. Is it backwards compatible with old mailers and UAs? Is it easy to implement? Note that ISO 10646 *forbids* the use of ISO 2022 code extension techniques. I try this here below, writing "Godel" with dots above the o. How does *your* mailer and UA handle it? It normally just becomes "Gvdel", perhaps with an "A" before it, but tell me if anything else happens to someone. (Sorry if it causes any trouble!) 
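The two steps listed above can be sketched in C as follows. This is only an illustration under stated assumptions, not anything proposed on the list: the function name `latin1_to_7bit` is hypothetical; the announcer is the sequence given in the message (ESC ( B designates ASCII as G0, ESC - A designates the right-hand part of Latin-1 as G1); shifts are emitted once per run of 8bit characters rather than around every character; and the sketch follows the ISO 2022 / ASCII control assignments, in which SO (Ctrl-N, 0x0E) shifts to G1 and SI (Ctrl-O, 0x0F) shifts back to G0, which is the reverse of the order in which the message names the two controls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ESC 0x1B
#define SO  0x0E    /* Ctrl-N: shift out, G1 becomes active */
#define SI  0x0F    /* Ctrl-O: shift in, G0 becomes active */

/*
 * Encode n bytes of ISO 8859-1 text from src as 7-bit bytes in dst.
 * Returns the number of bytes written, or -1 if dst (capacity cap)
 * is too small or src holds a byte this sketch does not handle.
 */
long latin1_to_7bit(const unsigned char *src, size_t n,
                    unsigned char *dst, size_t cap)
{
    static const unsigned char announce[] =
        { ESC, '(', 'B', ESC, '-', 'A' };
    size_t o, i;
    int in_g1 = 0;              /* 0: G0 active, 1: G1 active */

    assert(src != NULL && dst != NULL);
    if (cap < sizeof announce)
        return -1;
    memcpy(dst, announce, sizeof announce);
    o = sizeof announce;

    for (i = 0; i < n; i++) {
        unsigned char c = src[i];
        int want_g1 = (c >= 0xA0);

        /* These would be ambiguous once the high bit is stripped. */
        if (c == ESC || c == SO || c == SI || (c >= 0x80 && c < 0xA0))
            return -1;
        if (want_g1 != in_g1) {
            if (o >= cap)
                return -1;
            dst[o++] = want_g1 ? SO : SI;
            in_g1 = want_g1;
        }
        if (o >= cap)
            return -1;
        dst[o++] = c & 0x7F;    /* the character, bit-stripped */
    }
    if (in_g1) {                /* leave the stream in the G0 state */
        if (o >= cap)
            return -1;
        dst[o++] = SI;
    }
    return (long)o;
}
```

Applied to the five Latin-1 bytes 'G', 0xF6, 'd', 'e', 'l', the sketch writes the six-byte announcer followed by G, SO, v, SI, d, e, l. A display that ignores the control characters shows "Gvdel", since 0xF6 with the high bit stripped is 0x76, the letter v.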
(B-AGvdel --- Peter Svanberg email: psv@nada.kth.se Dept of Num Analysis and Comp Science, Royal Institute of Technology, Stockholm, SWEDEN From philipp@inf.enst.fr Wed Jan 30 11:47:08 1991 Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA16195; Wed, 30 Jan 91 11:47:08 EST Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA21421; Wed, 30 Jan 91 17:46:57 +0100 (MET) Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA05604; Wed, 30 Jan 91 17:46:37 +0100 Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA01460; Wed, 30 Jan 91 17:46:36 +0100 Date: Wed, 30 Jan 91 17:46:36 +0100 From: philipp@inf.enst.fr (Philippe-Andre Prindeville) Message-Id: <9101301646.AA01460@ulysse.enst.fr> X-Network: Fnet-Eunet X-Organization: Telecom Paris (Ecole Nationale Superieure des Telecoms) X-Address: 46, rue Barrault - 75634 Paris cedex 13 - FRANCE X-Content-Type: ISO-FONT; 8859.1 (This message may contain ISO chars) To: ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that (i) "Latin-1" (in this context, a combination of two registered 7 bit character sets, with G0 being what amounts to ASCII, and the second being identical to columns 10 through 15 of ISO8859-1), and (ii) that we wanted to handle Latin-1 in a 7bit context (if I under Mark's TOPS-20-based arguments, I find them very persuasive) Then the use of the single-shift and locking shift modes of ISO2022 is a very attractive alternative to treating the characters as a special case of "binary" and encoding them. I'm not very excited about this approach. A single shift is fine (besides, selldom does one see more than 2 accented characters in a row), and doesn't require retaining state. 
However (there's always a catch), I propose that (a) Latin-1 is not complete for Western European languages (it lacks certain Dutch and Danish characters), and (b) adopting Latin-1 would just be extending the "chauvinism" implicit in the TCP/IP suite to include Western Europe (or most of it, anyway). Let's have no piecemeal solutions, but rather extend the encoding to include all conceivable alphabetic scripts (including Arabic, Hebrew, and Vietnamese).

I was at the CEC last week talking to a guy from Bratislava. He normally corresponds in English, Czech, and Russian. Therefore, for him, at least 3 alphabets are needed. Possibly the math/logic symbol set as well.

I assume here that we stick to Latin-1 here, and bind these two registered character sets to G0 and G1 in the RFC. Permitting mid-message switching between character sets is an interesting and useful idea, but it opens up all the problems associated with ISO8859-?, and then some (it does not open up the same problems as "binary"; as Robert points out, binary is not characters and characters are not binary).

Latin-1 is insufficient.

This is correct, but also easily misunderstood. One of the reasons for things like ISO8859 is to avoid the problem here, which is that "programming languages" handle strings in which the name and pedigree of every character is the same *lots* better than they handle strings that may represent different character sets, or that switch back and forth. So, while some of them have required some modification, in general, ISO8859-n is easy to handle in programming languages, as long as "n" is designated at compile time, or at least constant during a program invocation. They don't, in general, handle G0/G1 switching, or designation of character sets onto G0 or G1, at all. There has, incidentally been a requirement agreed to by ISO/IEC JTC1/SC22 for the former but, as far as I know, there has not even been serious discussion at the SC22 level of requiring support for G0/G1 switching.
You can map all the characters into Unicode or some unique number space, and treat the characters as 16 bits. To save space, you can reserve the high order bit to signify a character being packed into 8 bits. The shifting (or mapping) need only be done when the message is on-the-wire. Can we make that a little more specific, Chris? The progress rate of these two efforts, IMHO, is such that they are likely to yield useful and widely-adopted standards circa three years after we are all using X.400. What's worse, X.400 has an even more restrictive character set, being meant to interoperate with Telex (and Pony Express as well, I suppose, and my kitchen sink, and ...). A strong suggestion to people who are interested in pursuing, or even understanding, this approach and its implications: There was a design effort a year or so back to figure out how to extend the Kermit protocol to deal with multinational character sets. It uses this switching approach, and assumes that, within file transfers, considerable redesignation of character sets might occur. The solution may not be This is similar to compression techniques where the alphabet (or symbol set) is frequently recoded (i.e. statistical Hamming codes). -Philip P.S. I take it that "composite-characters" (i.e. dead-keying) is not on the table?
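[Annotation] The message above prefers a single shift to a locking shift because the single shift needs no retained state. The following is a minimal sketch of that trade-off, not anything proposed on the list. It assumes the right half of Latin-1 has already been designated (to G1 for the locking case, to G2 for the single-shift case) and leaves the designation escapes out; the function names are hypothetical. SO and SI are the locking shifts, and ESC N is the 7-bit form of the single shift SS2.

```python
ESC, SO, SI = b"\x1b", b"\x0e", b"\x0f"
SS2 = ESC + b"N"  # 7-bit representation of single-shift two


def _check(cp: int) -> None:
    # Only ASCII and the right half of Latin-1 (0xA0-0xFF) are handled here.
    if cp > 0xFF or 0x80 <= cp < 0xA0:
        raise ValueError(f"code point {cp:#x} is outside ASCII/Latin-1")


def encode_locking(text: str) -> bytes:
    """Locking shifts: SO switches to G1 until an SI switches back."""
    out = bytearray()
    in_g1 = False  # state the decoder must also track
    for ch in text:
        cp = ord(ch)
        _check(cp)
        high = cp >= 0xA0
        if high != in_g1:
            out += SO if high else SI
            in_g1 = high
        out.append(cp & 0x7F)
    if in_g1:
        out += SI  # leave the stream in the ASCII state
    return bytes(out)


def encode_single(text: str) -> bytes:
    """Single shift: every non-ASCII character carries its own prefix."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        _check(cp)
        if cp >= 0xA0:
            out += SS2
        out.append(cp & 0x7F)
    return bytes(out)
```

For an isolated accented letter the two forms cost the same number of octets; for a run of accented letters the locking form is shorter, at the price of state that survives across characters.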
From owner-ietf-smtp@dimacs.rutgers.edu Wed Jan 30 12:30:24 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17200; Wed, 30 Jan 91 12:21:58 EST Received: from cunyvm.cuny.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17192; Wed, 30 Jan 91 12:21:54 EST Received: from YKTVMV by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with BSMTP id 4508; Wed, 30 Jan 91 12:22:31 EST Date: 30 Jan 1991 12:19:47 EST From: dan@ibm.com (Walt Daniels) Phone: 914-784/863-6736 To: ietf-smtp@dimacs.rutgers.edu Message-Id: <013091.121947.dan@ibm.com> Subject: Godel On my EBCDIC IBM BITNET based mail I got: (B-AGvdel Which in hex was: 274DC227 60C1C70F A50E8485 93 |.(B.-AG.v.del | From owner-ietf-smtp@dimacs.rutgers.edu Wed Jan 30 13:30:26 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA19478; Wed, 30 Jan 91 13:25:14 EST Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA19474; Wed, 30 Jan 91 13:25:12 EST Date: Wed, 30 Jan 91 13:24:10 EST X-Mailer: Mail User's Shell (6.5 4/17/89) From: "Phillip G. Gross" To: Christopher.J.Tanner@nve.crl.aecl.ca, ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that Message-Id: <9101301324.aa07851@NRI.NRI.Reston.VA.US> Chris, Do you have any other info on the ISO spec you mentioned (ISO2022)? I am not familiar with it. 
Thanks, Phill From owner-ietf-smtp@dimacs.rutgers.edu Wed Jan 30 15:06:25 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21294; Wed, 30 Jan 91 14:35:17 EST Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21289; Wed, 30 Jan 91 14:35:15 EST Message-Id: <9101301935.AA21289@dimacs.rutgers.edu> Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 4428; Wed, 30 Jan 91 14:37:18 EST Received: from BLIULG11 by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 4420; Wed, 30 Jan 91 14:37:07 EST Received: from vm1.ulg.ac.be by BLIULG11 (Mailer R2.07) with BSMTP id 0588; Wed, 30 Jan 91 20:31:08 +0100 Date: Wed, 30 Jan 91 19:08:48 +0100 From: Andr'e PIRARD Subject: Re: Character code sets and all that To: Christopher.J.Tanner@nve.crl.aecl.ca, ietf-smtp@dimacs.rutgers.edu In-Reply-To: Your message of Tue, 29 Jan 91 19:51:20 -0500 Well there, I have welcomed an invitation to this list with much surprise. And I hope indeed that this discussion is the start to fit the fourth wheel that the great TCP/IP car lost during freight across the Atlantic :-) ISO 2022 is a nice standard to drive on a 7- or 8-bit path a smart device which is able to switch character sets. It's a dreadful one to store data for many many reasons, the least not being overlaps of the ISO 8859 versions for identical characters. A consensus is that this problem can only be solved with a multibyte code. Which category SMTP falls in, stored or transmitted data, is disputable, but envisioning the "smart transmit" aspect would suppose we have something smart stored to transmit in the first place. Proprietary codes probably exist (to avoid processing 2022), but converting files from one to the other at the file transfer layer I wouldn't dare. When a multibyte code will exist to make how to store data (almost) a standard, there will be no more problem how to transmit it. 
Kermit specs are split in two levels: 1 to 1 byte translation and ISO 2022. If there had been no split, only the Japanese would use them. With it, at least 4 international versions came up soon. Given the small amount of trouble compared to the practical value of admitting 8-bit just in the body and stating an interchange code, I think it's the thing to do just now. It's sad that a byte is not enough for a universal code and that we'll have to say it's either version of ISO, guess which. But I think the situation is just not mature enough now to fully mix language anyway. And don't take it as being selfish if I say that ISO 8859-1 is at hand to make a high percentage of the world happy while devoted people are working hard to include the rest forever. Now that many computers exist that use 8-bit codes, with Unix on the way, it's time we decide what an 8-bit code is for TCP/IP in general. It's sad to think we still live in an ante-ASCII era in this respect. I hope I'll have time to explain how, for us, these different codes are just as different as EBCDIC differs from ASCII and that just as TCP/IP states that ASCII is used on communication lines, to the point that it is often implicit, it should state what it is for 8-bit, at least as a very strong recommendation. How to translate ISO to EBCDIC, Macintosh, IBM PC and maybe others is not a given thing, believe me. I'm just Andr'e, but I have the same problems as "(B"-AG"v"del :-) Andr'e PIRARD SEGI, Univ.
de Li`ege B26 - Sart Tilman B-4000 Li`ege 1 (Belgium) pirard@vm1.ulg.ac.be or PIRARD%BLIULG11.BITNET@CUNYVM.CUNY.EDU From owner-ietf-smtp@dimacs.rutgers.edu Wed Jan 30 17:30:22 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA27645; Wed, 30 Jan 91 17:26:10 EST Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA27619; Wed, 30 Jan 91 17:25:13 EST Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA15124; Wed, 30 Jan 91 23:23:58 +0100 Date: Wed, 30 Jan 91 23:23:58 +0100 From: Jan Michael Rynning To: Mark Crispin Cc: IETF Internet Mail Extensions WG Subject: Re: Mail Working Group Directions. In-Reply-To: Your message of Tue, 29 Jan 1991 11:43:36 -0800 (PST) Message-Id: Mark Crispin writes: > I would like to see an official blessed form of 7bit encoding of ISO-Latin > mail. This is distinct from 8bits as a general case or other character sets. > Japanese JIS and the mainland Chinese GB system already have fully functional > ways of transmittal over 7bit links. So, how do they do it? From what I've heard they use some escape sequence to shift to their character set, and encode every character as two octets, using only the lower 7 bits of each octet. Is that correct? Can you give a more detailed description? Do they use their characters in headers? I know some people think it's a can of worms to allow anything but ASCII in mail headers, but there are a few places where it would be nice to have: - In the "text" of the "Subject" and "Comments" fields. - In the "phrase", which is used for the real name, in the "From"/"To"/etc. fields, for people who have non-ASCII characters in their names. - In comments. Some mail systems put the real name in as a comment. There are also some places where I don't think it's worth the trouble: - In a "field-name". - In an "addr-spec". If Japanese and Chinese people use their characters in structured header fields, how do they quote them, if they have to? 
I mean, one of their characters may look like "G@", when coded as two octets. Do they put such things into double-quotes, if they use them in structured header fields? Jan Michael Rynning, jmr@nada.kth.se Department of Numerical Analysis If you can't fully handle domains: and Computing Science, ARPA: jmr%nada.kth.se@uunet.uu.net Royal Institute of Technology, UUCP: {uunet,mcvax,...}!nada.kth.se!jmr S-100 44 Stockholm, BITNET: jmr@sekth Sweden. Phone: +46-8-7906288 From owner-ietf-smtp@dimacs.rutgers.edu Wed Jan 30 18:30:22 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28773; Wed, 30 Jan 91 18:01:06 EST Received: from DILITHIUM.BBN.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28766; Wed, 30 Jan 91 18:01:01 EST Received: by dilithium.BBN.COM (AA26012); Wed, 30 Jan 91 17:57:02 EST Message-Id: <9101302257.AA26012@dilithium.BBN.COM> From: "Terry Crowley" Date: Wed, 30 Jan 91 17:57:00 EDT Subject: Re: Mail Working Group Directions. To: jmr@nada.kth.se Cc: mrc@tomobiki-cho.cac.washington.edu, ietf-smtp@dimacs.rutgers.edu X-Translated: From BBN/Slate multimedia format (version 1.2). >> So, how do they do it? From what I've heard they use some escape sequence >> to shift to their character set, and encode every character as two octets, >> using only the lower 7 bits of each octet. Is that correct? Can you give >> a more detailed description? Do they use their characters in headers? The system I saw used escape sequences to shift into and out of a two octet encoding for each kanji character. The JIS-2 encoding is convenient in that it sparsely populates the 16-bit space used for kanji such that both the MSB and LSB are legal 7-bit ASCII. That is, if you look at it as a sixteen bit space they start encoding the Kanji characters at about 256 * 32 + 32 and only use 91 (or so) of the spaces in each 256 block that follows. Convenient. (Those numbers are not precise - I'm sure someone else can do a better job of describing this precisely). 
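[Annotation] The two-octet, 7-bit scheme described above can be checked with Python's built-in `iso2022_jp` codec. This is only an illustrative sketch: the codec postdates this thread, and the helper names are hypothetical. In JIS X 0208 each of the two octets falls in the range 0x21 to 0x7E, which is why the encoded text survives a 7-bit path and why an octet pair can look like ordinary ASCII such as "G@".

```python
ESC_TO_KANJI = b"\x1b$B"  # escape sequence that shifts into two-octet JIS
ESC_TO_ASCII = b"\x1b(B"  # escape sequence that shifts back to ASCII


def to_7bit_jis(text: str) -> bytes:
    """Encode text with escape sequences and two 7-bit octets per kanji."""
    return text.encode("iso2022_jp")


def kanji_octets(wire: bytes) -> bytes:
    """Return the octets between the shift-in and shift-out escapes.

    Assumes the whole string is one kanji run, as in the sample below.
    """
    if not (wire.startswith(ESC_TO_KANJI) and wire.endswith(ESC_TO_ASCII)):
        raise ValueError("not a single kanji run")
    return wire[len(ESC_TO_KANJI):-len(ESC_TO_ASCII)]


sample = "\u65e5\u672c\u8a9e"  # three kanji
wire = to_7bit_jis(sample)
payload = kanji_octets(wire)

# Every octet on the wire has the high bit clear.
print(all(octet < 0x80 for octet in wire))
# Two octets per character, each a printable ASCII code.
print(len(payload), all(0x21 <= octet <= 0x7E for octet in payload))
```

Because the payload octets are printable ASCII codes, a header parser that is unaware of the escape sequences will see them as ordinary characters, which is the quoting concern raised earlier in the thread.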
Terry Crowley From owner-ietf-smtp@dimacs.rutgers.edu Wed Jan 30 19:00:21 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28764; Wed, 30 Jan 91 18:00:47 EST Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28758; Wed, 30 Jan 91 18:00:35 EST Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA16597; Thu, 31 Jan 91 00:00:15 +0100 Date: Thu, 31 Jan 91 00:00:14 +0100 From: Jan Michael Rynning To: philipp@inf.enst.fr (Philippe-Andre Prindeville) Cc: ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that In-Reply-To: Your message of Wed, 30 Jan 91 17:46:36 +0100 Message-Id: Philippe-Andre Prindeville writes: > However (there's always a catch), I propose that (a) Latin 1 is not > complete for Western European languages (it lacks certain dutch and > danish characters), ... I have discussed this matter with a number of Dutch and Danish people, who all work with computerized typesetting. They have all confirmed to me that *** ISO Latin 1 does contain all characters needed for Dutch and Danish ***. I have reason to believe that they know their own languages. I also have reason to believe that people who use ISO Latin 1 every day, and spend a fair amount of their time converting between ISO Latin 1 and other character sets, know ISO Latin 1 quite well. Jan Michael Rynning, jmr@nada.kth.se Department of Numerical Analysis If you can't fully handle domains: and Computing Science, ARPA: jmr%nada.kth.se@uunet.uu.net Royal Institute of Technology, UUCP: {uunet,mcvax,...}!nada.kth.se!jmr S-100 44 Stockholm, BITNET: jmr@sekth Sweden. 
Phone: +46-8-7906288 From owner-ietf-smtp@dimacs.rutgers.edu Wed Jan 30 19:40:58 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00212; Wed, 30 Jan 91 18:50:26 EST Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00200; Wed, 30 Jan 91 18:50:21 EST Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA17988; Wed, 30 Jan 91 15:48:29 -0800 Date: Wed, 30 Jan 1991 15:35:02 -0800 (PST) From: Mark Crispin Sender: Mark Crispin Subject: Re: Mail Working Group Directions. To: Jan Michael Rynning Cc: IETF Internet Mail Extensions WG In-Reply-To: Message-Id: Encoding: 27 TEXT, 513 UUENCODE In , Jan Michael Rynning writes: >> Japanese JIS and the mainland Chinese GB system already have functional >> ways of transmittal over 7bit links. >So, how do they do it? From what I've heard they use some escape sequence >to shift to their character set, and encode every character as two octets, >using only the lower 7 bits of each octet. Is that correct? Can you give >a more detailed description? See the attachment below which describes how it is done in Japan in detail. I don't know for sure, but I assume the mainland Chinese do something similar. If you have an RFC-1154 capable mailer, you should be able to pull this attachment without any problem. Otherwise extract the text between the "begin" and "end" lines and run it through the Unix "uudecode" utility and then through the "uncompress" utility. >Do they use their characters in headers? No; too many of the programs in use would have to be fixed to accommodate that. I once fixed one of my programs to do so, and after a lot of work got it to be right, only to be told that nobody would ever use that feature. In Japan, the convention is for personal names (in From:) and subjects to be in romanized Japanese or in English, even if the body of the message is entirely in Japanese using the native system.
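[Annotation] For readers without the Unix "uudecode" utility, the extraction step described above can be sketched with Python's standard `binascii` module. This is a hedged illustration: the function name `uudecode_text` is hypothetical, it assumes the attachment text still has its original line breaks, and it stops at the uudecode stage. The subsequent "uncompress" step for the `.Z` file is not covered here.

```python
import binascii


def uudecode_text(text: str) -> tuple[str, bytes]:
    """Decode the lines between "begin" and "end"; return (file name, data)."""
    lines = text.splitlines()
    # Locate the header line, which looks like: begin <mode> <name>
    start = next(i for i, ln in enumerate(lines) if ln.startswith("begin "))
    _, _mode, name = lines[start].split(" ", 2)
    data = bytearray()
    for line in lines[start + 1:]:
        if line.strip() == "end":
            break
        if line.strip() in ("", "`"):
            continue  # zero-length terminator line carries no data
        data += binascii.a2b_uu(line.encode("ascii"))
    return name, bytes(data)


# Round-trip a small payload to show the framing.
payload = b"7-bit mail test"
encoded = (
    "begin 644 demo.bin\n"
    + binascii.b2a_uu(payload).decode("ascii")
    + "`\nend\n"
)
name, decoded = uudecode_text(encoded)
print(name, decoded == payload)
```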
-- Mark -- begin 420 japan.inf.Z M'YV0(`(&I(*F#`@S;]BP>7,GC9LS!].P,9AF#H@P=L)(#"-F(@@Z:.2\J7,& MS44W;]SD:3/2HA$J4"[2^5@01)4A1()823+%11$B54`H""-G#)HT=@RBF).& MCL$8,G"X@`K#A8RI*5R`2#*S(@B'-`W"J2/FA1*>(,BDD5-F#)TW`XK8LB MC8LR+N:&`=%&XT0Y(.;0D9,&#IR]($;.5)"F#1RX=`+/-$HTC-LRC%/,!5G& MS>/(I?.`:#C'Y-O88G*#'6V&;7,08][`45[]39O$'-\D5<"FXN;4(XN"5J!` M21`H09RX2.+$B`[V`@<>M>AU-)DR:D`@A)_!0:@04[A,5MGGX4&PA+Q!4'7B?&=E0((=3#U$`AE3.26 M2&ZD,09IIAEVT7<66& M1`8E!X(8!E571HU'FL>:&$3N91ABD-4Q9AUE@E#>&B`R&`89?`D))T@.0?1< MG=BEI.=,*,#%(Z87E2J'A%>F)!P(<$P4QAP&3303G""L@=(=*XK9D$(*D,&@ MD'Q.YL8:Q]7A`GL*Y"?MM-2"4`0310Q!A11/.)'$$"!L&]\41A0A!0A/&+'A M>_$5,441^%4KK[Q+%`>"%%HQ40>8!H&`P@8D".&#"TY,L0$*2H``L!`WU'&P M$BE$.^_$-N6X&%-T*'<@"%=4A*>,+32Q:44I24QQ0&NPL>]_0-C1QAPNE#;& M&"XP1_->DL:&6J\J\PN$S3*/8?+)1`L4PX95.%$$"#'DD`,,T%(1A!#8HJON M$-U248035$QQ7T#T;?L$4-HFT:W$\$E!!=,Z@'!#"T(DL;9[\"G]+@A8$^$N MVD&H#8(,;>,`M]SKUNWNTGGO'5#::\_0]KM.$$'?$86W^R[??M/0MA39%I'$ M3DY03K?E\"[>]]HUM(VU$U:8.X793EA=N=V(CZTX"(R#8$/;H]-N;:8A?%`$P-+RWRE]<012M! 
M4,B&.O3.@WH%+)-4`/U:<);E9>>.T,I=#-KVMKC-C5V^2YP+!1+3:FZ,.`<- M4$'@9!$GQ7(T8W0,*7E9*K7,:`YY4`TRY\*HF1!G+4)!D1/.T@+Z^&L)27A1 M'-7*UAZN#05+>,*+KOF7CY8AI",%4%K2D$7@X-$@Y6PDE.`)%@5,L@HY.L\4 M`(%:+HZQC&YP*..<&,0BCH]<#7GB%,>UCU8DC4/@P8$E))/?,H*V&<@YM M@RD;A#27S9H5.&UPR'&Q,TMD4DI-!I&TQMX9&\0X%K]/2.DN0WE*BZ"$1W2@ M=2IU9)*44&0F$J)04F`&`B,H&`^FK549ADL?]UA/"DU0``J@0(0IO$@,RFD" MK].@2TD%@35ZH(Y(OC,%#\8&DEP=[Y\#JF<_VNN[R MXBAE'?W.46D>>S=8'A2.L(6>31_EED3ARB1#'5P"Y9: MO:$>9J&'+%JKF/X5,!^\H0E1H$$0'J:`A>U`#P][$5AZ'`.M0($M=BA0C!9$ M$Z],4C40(DI=P((SO,0J2\!9+--P4`,8A/PL4E(`%3P.\O3Z:V%%L`/*5?[R MF.^@XU7X>%"LOJG_/,@X96\*EHB9@QH\74S/F7K80:X`JS]'#)44E8/52,SG M-(T&-1CXAM`"=#`)O526=(A%'CZ1I&R:XW7O;Z,GB1`VK(&E,RGN'70#F[M_ MM`Z9G'HG7Q"T6D>85TI*;,ID2BK%?RBQ MM!!&!QM@45RS5TTR,"MQ5':P00,7P2EJEQ+$9'<,\G-29$@'^"SS9FFF5$^* M51>Y=5[DI3=]E`;_D2=F5$W/`1+.MS']UE/QQEXUX6F=Q1+H]H$1AEJA!7XU M6$9&MW$^5Q/ZIA4Z-5YFYA6?Q!:1%"-[@7N(X7S*%8-TMF$!AA>OT2"`=6`@ M4B./%UH.!VJB]G@KB"&H)'H)IA`,P5(6X5AW\`8M@&V512NFQ($"=R)/,!>. M,4EJ5%M/!";#P846X84'%4H`-VI,LAI?5(;.5WFQI8;G5@9OF`=K\AJH08?^ M(@:^P2S/`6@L%#NJ8FC.)2TJ0FF1.(ES*$]<]5-H!%9B]1WJ-"-O(P9\!(>" MA6(G=6B)U4T'YEL&D1D\%P,VP`(S@`,.2(<6@0)0@0.#HAS)F`)?ER!\MAC# M41."F&HWR(225!,YX(!P4$FY<4PFA4L0]WKUAQI6PAIX\%`,PD>)I6HTLHN_ M-1JE@0>U40??H0`;@P,LX&,V(&OS=(S;N(P@L(VA*&]">$/^5HTYN'JM)UC! 
MB`,W<`.ID1N&)P=U88Q?%XPS4`,S\':T81PR8"GG5R/'>(`H)`,LD`,VX'9D M]RJ!<0:P@6WH$CL8XE5G\'4F24,S((PXH(SI!8`=!9-U(9.?I!9N-"LXT(\H M(C5J%01SX6,@@`0+%`1'D")S$0-.YP;V"!S'-1`1L40;,8I!"=Y M)"=9,5I>07P!]HUC28#&T6,S((4SY4Y?F$A(N$_6$2>P`46, MF(8P!0+C%AAIH`>P<3%`:!%G4"'&@5PL:%8)H@4_D0+2>3(>QCD$I7=`@)'$&L6 M!`)"D#0"A`0@1X)3$!3QYD%7,()+@P+_.63O43MZ8VW48IW8J9U&P)W>"9Y* M()[D:9[H.0(XIF/S631:5IW7F9W;V9W?20+A.9[E>9[I*69*$&7>1:+3>:)` M8*$JFJ$LRJ$>&J,A2J-DQJ`E2C1:M@0\BJ$:VJ(O^J$R.@)MQAY.H$9*EQVO M`B"WP2]::!&$LF=]EAB&YH\KM2(*:9<*<%N((DMFU8;AB%+&B)J@^2%Y\HB; M:5"(99;`\8R!-1?M$9P'%:5)5.H")K MV9;@IP`BP#DB<'75R08QUT?.%)IY$I/*,4D/-Z9Q*@(DL*EQ)`)L(`*:.6^E MV1`SLEIA<`:?!`*K0\ M0;3K29U"D*%%4`1,[,@W',EX#`,WX';RDLF< MW)'R0C,"27,E+7"VPS#18.2^T+,HX$%2G\S>!,SA'93AWHU2^ MZUJ]]1]WL9MJ%$4XJ(1ITA&E&A(C41(TDD0XPD3C5$WO1&ER<'D$MGH&\;PS M`1@@81'#4DTS MU;P&(3);E">_9!),:!%3()4OX6_&*R6A1*U&(GAJ.G'42RM/TEH?XJ4UHB"S MM8>FU-,\^'.\E8,D(EP#F@0U75Q>,12;IIMZ41=-,`4MD%[.I0#].UWN$L,S M+=54X&\HL-5OMV)%1I$V(B&H/PGZ%$GL%O"8;,TO= MA`?XUWEM6(I^E843(GBSRIAZBAHH8($=LSDMMAVL>^K8PH4%:]5\U0(9$MT`*II3?#0XW3_ M5KZ*A;[51"7K:^:?]";``3-S=C+_6[\4@[]28[\4P[_1%>@4)#%4D`4DQ#8F M,\#R[<%&P]N^#=P3(P,P(,'&+9$3@P,2G`-&8#+RFA(=O":83.`5;N"\R30Z M7&*#7-QZ["^M;L(W4&*)W`(YT*"M?@-&L&*-_NB`$^G8_>'Y,=PT(.N8/B\S MD.L;Z>GS4@0PT`)%,.H"4>J&Q..IGNRK3NL*_NH!$>O&W8\H4.L1B>O3ONO> M/A.^#NR.OC2.0^P>KNWA;ND5ONSRHNF2GMT'_#?$;=SX7BW-/NW/3C'23NT!O\&F M;NSA;N$R\)4HK/`,3S$/'_'/Y=8H4`0BTN)UD30RA#>\*)]'U-;@,JBJ45+% MR(NU]M$+71IQD8,>B`)27IP]_"*[QDA5G1>;%U9B>HMNL]B[`>/UU&@=.0>9 MH6Q^311M8O3U=(Q/+R<*$`-S@9)_*?6JE*W6/.6Q_2>DFAA,*`>--ELQ,E!O MA(.Y]>-N0.UB61&KE-Y^CD],#KRT*BK#&X31?+QWK[S:QX-KGWX;`_1$^+V$ M`7]6/N>S*([K_9`L"+'A`7C_)&8P.< M7@,-7RV'+,'\7L'R$@027"X2,_"ZSS2\?^&_3RW!O_`G/#'&#_'P9'Y)>D=`\.!EL8`8MP!2LR2),:95#84B9WY'E?WP\6)5) M`'4D)P0S``-=MV0]@*/E-&)BC`J2#$H]S$V;W!KT!4WXD%H($'-IXS0<]\'U M0-KD\UXU@3;8!MR@&WK.C`A"SH2)$S5[<`=:0 M:^Q%;K$":LT*A`$\P".,PF#B#RQ%XV2$\W"N?MH8X2\B1R;HEI&&3(J);;`G M;XX([KG44!NDST5`0W?`N@F%?;'0+`DX>G$JA=9H&7_!5U[#B_`\`65":(:\ 
M!P67AQ2D@C+C*-04#5039,-8$%?VK":05*`9]$^J0U.3-H5P+N@\+#`7BA/- M>$PJXDP9/3SAFAS#)#0M6V$YB(0U00J=VU\Q,'3`2"0K-[`8;--UFP@A+@<] M!U]C!*D>^8HE8J\I#"^$TU'F0*B9`XW!G"B';@+.EH@XZ1%I\!M^!Z_PV!:% M9<(=1&$F!)YX,06T1DP8*B"`!("`+9!>>,`0D`(^H`MDF/R0*EX#0&F%KS"W M*"2'\`A73$4L`C&!AJRW?G@,44.P.@CC!JB)E4%1!^Y,<.L.W^$![@A+6`9< MHD4$`3UP"WA$D.@"EF)('(D"`04H'#D!`DY`>CD!*T)`)"+!0A0YH2<4A:'0 M*,+$BI<1-^+U"!Y,P"F*Q'DA%4=5+_$F-\(@-A%Y1@;;@%B,":G#6H@,B>`# M-B+D?02=\)S%""!P15%D#]GO-AF$W`0,EX`P)!PJ[2/01.#P M4)Q""F0:"\^@703FY?@P#44\BAB1"LQ&&P@H&@D:<8W01FY(C)<8$U5'2FB' M$4$9;CT;=],<%$+J*7/A!?&$:D@'KN%70`RUL`Z@N(P2$)@C4LPT+B!^Y0<- MQ_WNQ(LX@%Y`:1I!92D1Y1 M1;@(3L29:4@-=1%'7DB-N`6"W=(@BFK*!5`!Z'0EIP7"(&WX<3FL0!E)%,X- MN1I,$>.9$9PO8G"66I2H$>'ADZP)J8),H!S&89*3Q#\BPB_9))Q;78R'QU!- MY,!Z:!*,A6!I=92HR*D?D5!-XA0*N!0&0>.FV<'9%_/* M.*0K2K33]*1-\!8G9)R!R5;T!AM18NB#>2H/(D,&H2J3I3IZECL/H>&*7N*. MWDGG81#82*RLN9IS1`X#LXIOO8F+`"?.-(C"53V[DZSF4/*<\E#DK-PC:1:H MH4D<")062U2%+B$#FX9(N@!FG"4HM@*D2`^*05Z:$@40'E&<'"R9 MU@PL9+GB$N<6YB11B)2RCQ2+:^(0A\ZP.D/PI59%1#DI4PB%FZ`(&Q":Z(;: M8QNC&#FB(O#QH!%*7M$P,B8 M$1,09RJ`2%&)O,,C-"S&S&_DQ=71.J3`Z]A/Z6)V."=G)BW$Q&30/TNP*>B) M5@$TP0(4N!5N)&@JES34*O8>S^$-'B6U:,/?LMV:)*`P#"FS#Y$!C[`ST4(< M`7J7:`]10!OWI^CE!51`AH3T/)AU5@$W9RW2-]GQ1F''%7%>%$#B>$SMY:ID M%C?5#0'*!`2:&,$TT)>V00]SBYN``QM!SI6,K=8"T,?R2(17T%]0`=!C82"B M@&$#L//L(#4;J"J>9_24@,9!40H%[%E3_,4K?)S?$WC6!"$!>O+,3'1@)#2E@)+:'8P`0VZ!3N M@\3(.,(GY\`,G,%CHL)4T'16`2NPIP5)*\B"6>`)+P`^:!+]I19@!5S(`WDR M(-`PF`$8+$)^X`$.CGZJAQST!*>@]W0C$F.&?@$X0#-JZ)')H>G!?AJ]W+(] MY4#WA)]`5(96!!>`!ORGM,"ADJ%&_$]`$40B@B'I5D`3$%:1/U+GTM=;2`B@ MT[A,I*!CD6@%%\DS&T-@GK2$27#V18Z`FT?S"BR0(>!35(9846^%4*[8IO)G M,?I<4\@#/N)P'K1TE7-,%!3("8OP$06:4_+"&8`+ MKBW^/`=NB"3O39B:)!F!-410V[GX#))7`)J3(0D:AV`#W90#6TB<3=`X@(5Q M(2:&0*/IFI3B(MG-XJ@`TN:MP9PPZ5\&!N40](;I:=B-EB)_0E/H)4BXB`K< M$I.$E^(-`D.1A(]%T@I-0#E$M#HE`V_*+U1,4.*5LJ\UF&CXR?#R#0+BEZRA M5*IS+(+9I`/^A]@83LKP!B%*:?@M3"%L)J]RTO5&W,3I.1:A>+8-#3ZB)&(>%H!TD*8>K"98SL1B%-[#])$FE.9(2TP7* 
M$N/0/(OCKO$(F,$-W%&](Q;6J6FA/U3DHDI`5Y-`"`<)'2?&P3`YBX1H4Q0DS`&%1$#NY`R$E,BJ>_"`])X:L%!`R*$:8H!T5&'I0J MA-`84&(ZP]$<"V6AI[P`<-%"O\D+C:$@0`W,#!>P(Z8%%M6A2O2WY!9MVDQM M`Z!@#$%TKE;1-7=#'5S_W*(3X<^Y2VRE2(O),8D_QJ25W!=XHA`M"ZTP/H7B M4-RYO<1SUF%N<`M>9+#`BDB26["*=D"DX3`!]A970I'2H7+(#.W+/B8&<&%5 MI8U$1'.4QB`AR^A4J!V/B(I92# M0\X#G@%'U`8&8F7U$0(V+JP3!A%9_PEI@`L484J`/CD7'H@$0JVD;:#OK2$A MX1B@B)@P%K@'E:Q`9[)N\@P8[0WM2U?EJPKI-W:'X+P;UR);B`UO`2[4A_#( M(<8#>2@/YA$YGD?TFR/6['`12>(,)-<249QR2H M5&Q&7P1/GGNL8PI9"(4[-ED:K'5X# M"*J39[35Z(E$2AGF`C0:27RO71D2+(;84&VRDF81I;G)@9R#.`O";.HB4.3* MP@5F012/D;(]9T&2I<225C>H_@-;X`N`<,K*16G)*P6+C'"QM07*W9K*:&V_ M#A_Q"KP5D"3:&4'?NJ),TK;AQ(N`$>!Y+!Q$$^H(8QT@I% M0,!>&8,I.6E`3H^$YC`:7^P:U'T,)"W#$^['#BI!-/$(Q-;:4)**1@:^# MK5RN"I0I2_`OB$%5J1(NZ]GD#Y'0G,:&(K<9VNP3T3_'A4'P)AW1>,JH/&"M(SD2]&:5S-+DH!;]+>I`C4H@?@!,GN<@&0UZ;;W'`HR M\`OA+'!(KG4J;?Z@F=YNH04FLX6I\0HH0D*KP!28 M"VIJTI@ARL-`'\)0X!2,<9T@DWN*6(Y3+SH^!E>QR`$U!024`!1P`BH@^`;? M#P,G&RZAA;VE4;`0BMB4)5S;AOB]<%4@#%[S\7E?[E7AM[L"PW:66J&\QF8< M86YVIB9:$Y-H'QW#TRU.?&B2?%XUBJ6>2>A3%;Q)*R"!U"-?,@;TC)P*Z?4V MVM((3:(5\`:+L%22H)@![!WU8`@KP&SC`-$,! M`P$EA"#I*LPH9W\N%W)<_BOV&`D4$<#Z$`B4DY1`5U^P`F:_<,']'IUO22H^ M;Z@`5RAX3?0'W@@'?:/$`,*YH00,8<&0'1(P'``"=6!?%"<7$(9=``YVEIC! 
M(S08>R0&B)<9PE;`M%*$X)J0'EU`":C#8%@,VY,R[&_U;^QUE(T!N%B5(1$!0`!6Y"[TI^PH!>:"_VHT M6&.@]98L->5.'I,(`GCA/UC@K0,`B9B M"CO""<\(^5@7SFS$,6QUAFAFDX?X."5F,.*),V'!D"5F;&'A9=VC!0"!(9%CYQT-(X`\%JTL;B&KV'V`615"/Y9* M0*#*3N`W8<&3`+XV[,89ESDGS^QD5O)/#F37]3US`%>-A!28W.R$5^B8HX%1 MZ(4K84[QI^3J7BS8PQT'=6(R/ M5=]3NC3"M9D2[;@\5$41RC'%Q7AI!<`LF(T+4HU\1Q.R#+:Z,%G8C:<]E9U3 M`8PS-#)AEA,^.1;.#;(A2FI$>,OL;)[.,Z(Z/Y0V&$GJ8ZL(CP#"K)!*)%`1 MP@"#J%(TK#2(0900G\-`"2#+2=4,YQHOC(G7C2;FQ,M)`;CE%T)"XO(,*39U M&3#KD!Q2/WB(#P$B-'9MY(^HQ#_\AWT*(`/D)5CHI<$Y>BH#<2!^N2D_9";R M3BK:(:6/*7=%\-E^!A&D52+I@J=Y+JI1:54EPM!I`JK.D%<\ST:!&NK$53:\ MH%@HQ";]@ZW4KRPJ#QF#YSQ`LZD<7'0T`P MK'%DFL@4AF23%\MW2`EY1DPH@#_Q1:>*9DJ-)75C8"O6R&71R#1T5%`V-A`( M>B6IG:"JB!3#@:0^BR6@-:!-'S%_,NE1KPB+?&X2PPG1HY'C"2RH`HE,CC&4 M2[NMHG7BJ6*#(^B`X:L+-T==`<3RJ5Q@PQ60J4$SL6@3\^<8R"RK3B9!1`@T M@1>0!)0UWF`"W<)=^`OJ:3U7S)-]U2!6<&`29DSBX"."WZSX8K6JM_5D-*KLJ-Y@,O6^:E$>9('BBF7"B- M'E@Y)=C&--?$#1PGQBX2RSI)1O2.*-&ET>Q%D2<\?XL"I?C25(#"1@<+:UJW M@[#@&NC,A*-0 M$DBKJJ#63V!YK$)PFK-W(SH;U*J"!`2S4=J.HQR=VQC]*/"LO$WU`-'9`HJZ MHZHNE`:ME!?(]B<>"L>9@*H*`[H0$*A`1J_:PS06HSB[;J%F!E2*[:%FD.`(&@43,\QF?` M.,&MFV!OQOLS/`;W!@$U^V;CXO2,@\`"GJ`^JW+CI,TQXH/^L4DCF#&Y![\V MBP`%N`76:`)-0#,M`>>!5WXV$6`-BV%H,\ZB*@X5I[V024E`#]BC-(`NT$"X MV3P<)VL;TJWM:H9`7B`L<"%'A`&X`1PPQ41`K?.7[FD3QOP;'D17D."AIF44 M"%WA`@C%.?,FB!1G:(4!+C23YP`=)-SLZ.)N"+J[)ZCS`3B,U_*"!AMNPBWO M]Y842X'R)J/A?16,VU5H&B]"VI!0M/H"3AD%I]Y3&H;N<.=1"OF)EU*=!F@D M/1X4H)$XTEIK-++Z6$5;'J$9&#=M*SV81?%D4QW@?2:\QT#@TT>X)+N)(3%6F,D5A,Y'7^Q:1B.B$PR!`L'4[X.S>D*$VFSQ$`H.(MT]@VQ MD%D)ZSH@3E^FI,"\/`<;K42*$RHLPJ)F@NM!6GB+M:(B.G43X#VLG(F\4I(,:DX[253HTN0HMG0J24.7MAF3&"Q@CHM6MQH4,UVS:N3"D M>^;9DK/M['43P4*QBB,*@#>;V]'SBS%G&`.AA+;W)D(XOW1'1NROQ M*.IE'0_IRA6=]TX:'.@T#=.:#7++6!>$+OKV#<:\.:38-?1P5@M5=!.RZ)V0 M@49#J)7+D2@#!0$/C"9#LB^(@A5,"G7A7$V6GF@&N()`4@!2`$':H+K`!,+` M4%5(QQH]S-'9,M<[&^OIT^W&7ZEU/\(8-HX-Z.NIUO9"AU;2=A>*-WL/.<%? 
MX``8,.VB0DK788#I1>06$F`#G,XJ]Z,94=.EZ7?2`HA3G93.R,2&-X?6`&#[ MR6E)F\&L=.-I0T(J5$,;819:B2V9$J_0!*C[W%(":U4K9('(4$Q7H+ETX2+I M5LP$&=`$FI@':0(K@G\>GM+-<]#IA\WM9[:8,$M$VBY9JD7:EE+B$?H+K8AJ M7H23+M6CH9B[-,G.K)C;@5\04+N.RE0_ZSJG)4G3[?B;H4,8K-K(0<"`_^9S M\A._^-&@`(Q`K;[5N$,D5#15@0+*!120*_","T'34A.TF3R6I;.Z6DC(`6QH M')R\&T969[2:[&1*E$]UBQMR(]>G*76U_H)XBF??,PY+>TH+S<0[241`1\A$ M%D;H;"K'L)//^6"K/1#SSD_4/)^]=:,(\(<1$3AL*K[+4N\T+NSR8Q'-PP;^ M=5_CP]^(Z"7HP#0DE;PE<'M=Q/%TU%4G8]I"&R]L++D-?,944HJYX.3CX3ZG M+^J(5VR,G3Q;BL`1$`*L@0R82.$-`_S4)'\#9R$F1O1&P]*L@!)8&E>A*CC+ M:OYI$DF<5Q6(SA)V^Z@J M3W"D4)+<27B@NMLE&''WV"2:FN^&(NOT/+T%^'3A$Q#`^`MMR)$:5BNDG.#J8S<0=^ZOS:(3<84?$WHWY6U["FPJ MU)P;8!7&F%DU5J`Y$+9[`EIJ/ MB(K0VW4!K[H7P4O:6@`-/@_I"N.X!=XX"$"CA.'PN#`$5L8UN``QP92@@@M6`%$", M!,=TG^[RT_(*A9$45',VU)LE@]LR^DF#4-O'Y15`YZ@!I8Q+:^I1S=`LDB*G MA.A0/F:`MA$,UB*6W$QO1%PB/PP`@[$UMXR M]+S<0C$DW-V_YRO_\1ZJ`+?@5$VK2;&!$O`&)-<5*`%M``BP`913`T+`,*?9 M<]_L=%_F`B>O/$4KC7!!VE(1E[\:G2^C\*]U;'WRHI>QB.&_^T;<]5O.4?TG M@8=0@X4!"687H'9[P08%_]YS^-S"% M>JG5K$3PU5NQT_1"V,5)[HA39X38#D&?3'*N-%CWT4U#2A5WYA?0U!!M9$83 M[G#,U`#2Q'?%4LR`;5XIM7JL8#/"W;4B,`E#H+A")'Q2C$E!\QP4*_N3&"`D MJ`Q%@G)P:X1?.@*ED#%H!0,2'^)L7(=;75U M`25U=P%ZLIZK`54$"MA<]K31E5!%02P%&Y16%UFNT7Q8!"1`T]'K06DY(#WG M3G1>5$1Y]5'!$^R3]6404%MV$X\VW]4%C16_M)QY$;#$0&2%0'9KH,O$K+@C MV(AI91OP:%.>'('EG7L+$SW4<(0TG>"!5,=1$P3'U97!$5D!BW`6]`""!]K`A1Q&YA@QP3]`?,E!C5) MGX,5?22/4!,(#@TOST$W*!II%G2.79>ED%/:H)6`)?@(ZP9G<0\N"5$"3[<- MRCDDE!%`>FP,/48.T&B0$8Q$+)52>!19GP-&'(@6`1:G`!MX"F!>N/(1:GZX MPF)``%T&V`NEMA1`3*E>[<50%#!RBF4D2L01+&$XA-#L(HO!7B`_^0G%T2'X M3ZU_`]\>H2SY?B`(_-%H"`LUPM/$)M!,YU!ZUDH)!%``9A5LM3JNA3ZTX6@* M6%3%JRH?ZA%'/5Z&6:J`Y7D'^Q8'],PX![F$<%"O&"_#'2G01&\/_9Q(@AR7("7(E1$Z( MA&K(@S18JD5?48=L`<5%`*20^%T8QSQ#3#1G%0GC)1+!=MZ!:L*:D`$15S\D MB)$!,0P,PC18&=U><7%"Y#";CK\0]^E5JH+?M;PY!GY7*(%,]7B&G].1!10' M\8:`B)D,9VC!@:CI&#@+X@N4&"@D>LE?L$C@#3".R+8:P@5#@X-HF$&(;M&$ M^)BX/=.=A8@A2@P:HNXAO#T?(("'F,,HB$:/S:,G!$3A0EN`!N0(MA`/4@7@ 
M)L[7)!$B7BKY@8MX$6X<$2)P("-J!6Y/=V47LP/P)1L^?U7$6:`WR M`?&@)H@]<,+Z,QE@B?M(=Q#3L/SX`'*`$S`$X@!Q``^0`*$<20`;D`&9`S*$55(@+0PP@LB`,&6*X."#Z M&.1BC]C#?(C07[JX+K:+[^+6$02\'/(BO7@#/`'VXM)!W>6+^Z(2T"^Z.3W& MCF@@#HP_(KJH+K*+[B*\&`4TC*Z+$S`#(`%$@`;H_2$,,Z*62#$^#/'&PO`# MY``/`].2`P`!.<`)`#/^B]T>PX(%E`!10`G0`ZPL>PL1P`3\`'88"Z"W)`$E M@!6@R[T`(4`6L#,B`7@B[K`G(CY]HJNCQ[R,_*+$L##X`&7`#+!R+!D+@P6Q M,S:*6H'/"#0*C43C'$#]67_87]2"0Q4!W9_4R"\R+26`$E`"I(QA``\@!9R- M20D,P#5:C-#">U"@F1`H!%5VE]$'0(3O]90I'O>!C2%C2`PZ0C4S,?A=!Z`& MMO5Y9_Q)<<"5R#EQWRQ(:U`!1B)*L!"<`?]:0$"6G0Q.0&NPI12)$X'"MB*0 MCA.!KD(/Q7T$&5Q@Y4@,9!D].!I4"US#VA`*J0BA(QG1`EX^4T!=9!'@CMDB M>^"_/(ZN1.0H.;H8.*("D2&56>8,L_"768]+0V"6/091>MC$8(M)#'9$QE,M M=`R_196X>Q4GSB-;E1]"C@K`]$@]!@0:@14D\NT%+4#.P%20;$W`TA8Y:7#% M7<8@,;AG>8!I\3M6"T'`&U!`RF='P)@H'AD'`"3&D#KN1D>!4\!(4`M!P'I7 MQ?PS1:)18#J:.1.D%82)Y(_RPA!P%&`B%4.%$$"&D&A`';`&0`8D)+5@0KJ0 M:`!DD$)J;2QD'>"&Z(\R9`O)($20AY3$L$5\$13#$!`G?!+5!.:(!FT^W@U, MD2/L-TD>)#($E"LIB*_($<0;)\A--B]D;"\D@S#N+0L4'!`I0`8$`,($%P^: M!-1"$:#@"1`3'&2PWG601V(5R9YY,[A?@I`0"#!\!,4`7]5D1]EHH$2:8]H- M8D`Z>I`)`0@IP+B0XE$,22WLD>+1#;G!39#H"Q]9Q2B2^0$CV4`?M":=!)Y@='0"DY&HB2$L-D4!I8"6+/95-7!01( M@`$)2TH*4P!-1A)@*TO`OL!:A`N!P6)R/F@$AN(:,!*TDLF5(P.^30OL&9Q` MPYP/1UD7B4E*D,BD2+#>F9(\9%2R%KPT#,TT&40&!!7D*2DML&=OP7SUD!T% MWN08N4T@91J!O(`$1),:@;&03DZ0JA$%-R^\D[A"&%E(99(!03;1&PB3U4(2 M@(E\$L(D*]E/^D,G0\;F0E((;"3H0<&EBB]D&*8&O`&4HZTF0AZ0TH()Z4^> M"""D%TE,.I`2`R,3!EQP(UPVZ;.5%B3E/!E2@A$:@3AY4NX6^J0*24VB#!0" M@S`QU`L%@DI)4VJ/$X,38.40$[6=A2$HO5`\R!QI$H0EP`'JB#O4D!"9G8`R MY)$W)63P'/R1FB/',4@>B84DZAA2KDH40R_Y2JH1[]E/64W$?7J!B;0Q7)4? 
MI%:),A@3D(A)F1\L`6RE3MDKF#D.9"HC+<"5AN(#*5<:DZ5%X>-6FA@2"`1I M,<0;QB0BV0+PE!W&2%"3<91,B!L04G9-A02]$%?FDMN/8"E3?I.]`NBQE=@P M,T&'H5GR"E1EHV!5EHY9Y03911E3XI`2`\`!&"R6P*-8B;XQ(;GBY@,2C)>\HV_`%-A0\>7104P8D6X1#X+[ MR5F!6D,8^B2/K4A^68_9E/'E@BE7EA8+9@O`5:ISMXH#I2GZEV/E>U$HBHAH M96FI8+IGDZ3&9$&:!#[E/W-)DI6Q(X=)6IZ.$R2$"6+B>]1"B?EBSI.C)2') M8BJ8"F1)V6#FF`]F8!!N^)7QI1OP8_:8G\1A.2_XE)_$A;EV9)AEI8II8QJ2 MJB-([CH5,ITL%U0)5A,])959CE65&`V< MQ`2X?I(26\`[#C;<98QYJ_"856:-B57>F`'!&Z`/T9;H@CXD5_:1"Z2..2T\ M`55,H/EG[A8AYE_Y!"":H\$0<*+%EK_$F"AH4@N+IJ1Y:$J:LR2ER6@B$6V! M5F`"\#QO@JLX6$H,D685PTFZE95F%5-L.)3SU9\)LQ67&4>U,"`1!=?E9G,IEL5D924SR@BB9V!@$KJ56 M5!?,7Q%1\J,($9?B9"ZI>/B6%N=JHD.JFLCF42#!79+%IG9P%&"4!J?,F29( MFQX6Q3!@VAL:YUEYE$$&5U@+$)M0)/)"QF9HWAQ8Q#=QLC&9*:;#Z6$&!$[! M7WF=29-A"?-U*.0=A22UPD=Q(:9'>;I6$56F5>8V$/R;9&:9Z.I M80J+4*?84Y.AD(FGZ$D6K)*C)M0Y4'267*<:; M&EA">7C&4NCDZRE>/ITP@J>H:B7=( M;'F:@2GG[)Q:R`4U7Z$$H48!24E6`?/G"E9`RI7$QUY`MN8$68">8PBH`GJ.R96!9FGQ1DX+">CJ=BOTGHRG.ID' M:"4GPX!G3+J2H\'=*8%4$ZRD`M"?0(^^D?SH!0B1<\3),'X>CY2:%/`AF'8F M05"I.X9Q$D-Q0F9.#.!G7#1BWG/5A`XZ5,8%FN06@,Q$IAGGEG%&).Q MR>M3:$:;M:C2&4N@E7JG:"('[)H8F1TC+WR=B`$4P%16D7&$"8E,5)'F*`Z: M_,!U\MF\("4ZA./*$%I:)GG>@1JQ<<2>(X5<8*PD/Y3%%C$OG*%Y)7]B7AJC MDH()&1E`E+BEI,"*Q@5A&-Y@CUYAAHTZ<62F!SJ3>#1M"IZ1@4:9=LX2KV@; M2CY*FG+GM!"`OJ/R6:HX4M4!$D-8H8UBH$?9&T"""A!#I*YD^XT%XNAJ@D:$ MHFW$9[-QY!8_'DC`'M1E\UP+*C\.D-32O)!!WG@%*;.8CEH3RX+Q62T@HTCD MO?"(HJ.L5?:)&`2A]MJ.8/NE>N18?+@:0))U@!Z@!R@'\P)\Y9.5.-6.+K1C4!W\Y+XQ9.0(59'L,D>ID%4-Z;J2VY;>1 M[^D(:4(U@>M96%3I*`H@2`S"A)OP&/$E:0!N^5L^!+SDK9!/JIKDZ#T:$$`3 MNNC$$-M)!%(8<7 M'("NUXP:!AP%+4!Q`!&DIS>JM?`01*-Y!UCZH]8P;@!A-2W,&QGDDBIXU@:, M@G&@)Q*&Z]V,$#JZ$Q,DY)0&G*+IJ=93B>Z.,RJ8BD%2J(&!A2I`#`EHFM]% M,X"E.H)KBD$VE6"IDRJ63@$Q`?/8GLX!#.J=>BZ,J:3HC"H2P*=HZGS*G9H$ M9ZF"UWW>J5J#+T6.O:GTXVJPGQ:J:BKMB/JU`=OI6XJHQJ7DV`"$80Z0N<9^ M2@7$-2DD%?0X;1%:!,]C(ABH"&I`@'>8(":2O(#KP:H&@9XJ,>`=E%2UD*.B M.9BC3;:7JJJ^XH$:;S`*0^K$$*!0#FYJO&&>DGA3FISZHM*"0P`9DOX)D>]I 
M$1D*TAI!@"E4&`(4<5^^(*Q2JQ;=Q.#!A:O)*KBJ'>P<:ZD#-198!."G5M"M M!JNM*JG1&(BKMT*!($0NJN*J#\EIT@RJ*@O`JBJK:@=KHH]."X[FE2"PX@T" MR!H@1#)M6.A:.A^\!2;!I]FM%JC?:D!`2N$!*NI:ZATD!530EDH+_JM"9%DH M@TX!:P:@JHB6?!H!134O$`$I*U@@A]X(S-0+)C&0`:P'H`/7Z4,9INAH2JR9 M1,+.V!1GKD8F9&@=$P,V:&U0318!@`P=0HDE8(EHRE'S*X"PA+WR1T^;, M:K02$]5"T3ITY@56*^LY+P1M_2J\BD4VJ8!..]DZN@:.P-0!/BMT26-@+2)/$B; M`(-AX4"LQB*YN(IA82NK>DC*5ZEEM0!?4:Z2`M]Z$,17:8!H)Q)00=/"Y4K! M80'`P1O0L3X!JVK%NKEB!E7,)AJZ:I:MZVCP10J3P6BDH$G^!G&F)YFK+&RX M)OGI&-R>L8E@(068`,[#N*<9Q!NQ"1LZ,7R2RFOM>DP&!-:',8&U5@M'P$*` MB503ML=N9QUHDMS'ZB8OL&-E@O?:<5X>K:1RP1'.N&^D*!B?CK44"QQJL59"$HLC*3`2P% ME[WF>-"HN92D@I.@U&U*+;R3B\)H(+*U9S59*\E9'#IK`="UIFJO!@'P&K.F MK@0LN28'E%?,I`E+1U&.8L!W0#%PD39D_ZJ^CD,` MA/>N:J[)%C5+:Q=1QQ:NK>G`)+2BK9;E_5:B.JVP;&(G/B9VJ2PR ML5+69QMI&`#,VJ]SS"';7;IG*^41*W7Z;,3'F$C!*0$D04<94FYB]`D?)4JRL`K=)76Y6,W7ZUW8>;UY@=K9D[K M-M$![>5AR?L)L>A;L"F+O)%G:#[97OH&@RIZR=0*KD>J`_N$!02<;+KBU$XY M(D%[N7[:E=."R$`%_:8.@3)*U.8'(L-`6\#NJW5I7CJ']J:1J^H*)[`!XBQZ M:=1BC6CMX?K*MI=P7;@J+V@;89-CT,O2?VHL7GNLZ;$!7!)0>RB/>&V8.3%H M&V$F%&NO_9IM,M,9LC$+1B MCUOYUQ(/6.-\]68"1%(K\/@LSA)*P!C:6\H9JF,8RCK2B<%M[]$&S`P40V_K M&&@;QFDMZ09@(FD`[R@>_)_;Z'[;WFJTK./%!0)\FJS+>NJMQJNN1`MP-AV9 MDN:ML<"65^`8W]E[!`C$+2TK0#@%R^A=JSH"1,0M0/2,_;5:IBO!BERFONV: M^DG-MB0-[VA!QK=4P-I0C+::7$%QH.)"M`+!9?N[)E=!3EN+X.8:-"[QT.)^ ML/@;J0E0WB_"I$&[(SBV2T\F&Z]Z!SC"_=($<`NQ`Q6`:C!==:U!M#S*K'EF M&X`CV+*#II;KU[Z`3P*IR6C^,YULJGG<0A'=*G&:%ZRX\L+C)`*V!<2I:"*A M3@M0@)R[:[8%F`%4.<0.J&L"EI,96(G7*AIQJ,X%-JB`(!X!M=-IY=BP_IK? MPA/@9FRIRY1DBYK,F]8II4O,5A.FU:G^EF?/!QU+4';63;V:8&YBD:T!T]1N`".;EU1J?)CX"0VGY$ M[6V`L)<,K-,"R[JF6;BXW[`K0%`(]BPA:_+LL-9FMI`^=*I<"%S718"XWNP. 
M2L;V>T2!G=IR=KNW+'1;N%Z3>L#$8'*&&A+5[B; MO(5U00-[=;*M[&GRL]*:NV!N+(&W!J>?#8!0VFJ\%,D)EVY*#4X`MI`%6`O8 M@K9P<1FX!2H5D-!AI/L$/61MVD:^!D8J*!6*BJFT,%D\O;')CC#:\KD2KFI@ MDQ4(1ANRB0.I!0P"FM>+2JM5&LEBGV$+MSKW%`WUJ3H6RR>I'"O9^MVKH':H_+=DJ35"LZ&<<68%<8A4##\+WRAN<;UD*=A0E_RUZ4 MOE<"?Z0^EA8+)TVK[;(&PRGI2T3MMNQ%X&GDX@X,@2E!S(8(I<582SKB`;UO M8`MU>@4O[.$)R;Z(QL%?:WBV+Q,#3,N?[*Y,A&ZJ9:*=R=49Z_SBE,(D0JNZ M5I]N")DK>4J358D+P?W"6[4OU2DKJJJ&IVL0WDJ>ZJ\DN[9JJ2_8K/L6]+NF M+]M)_[:WY>3TFGCR'=AK*.N_LJ\+0JO)=;::I*XPZNS*"6-M[5OV/KBS13_[ M$30$U.P6J4?Q4B/ZGGFCN)N&%P6&K'U;P*Z!F-PMV\38`:<`>ZJ MK*NYY@%H@+=;+8B@="4&-W_=`;K"3'"J5C%ME,=*:SP!E,C#"QFPHVSP'IR4 M_C-PL-J[(]RUSB-,*5\/)<,Q`ROX2 M0"@\`AM`;LO#-`P68,/_FC&,#5M3VK"Q`#.4LD^J=M#X.@;['[LR$<)FYC`R MC$B]#.KP-8P,H\/_T,AKI#K#(-9!10X':=E!45@7D"2_2;AU#2=]EB^U(`0P MQ.95)E<<6`3.45GO,+3PSVW+9J)YC$ML%@`,9^Q->J*:') M;6*B`33Q)3`T2UB#])_:@91L/,L39XO]L(9!"[>E;\']]01B,LW%=="E03&HK6Z19443SQ:?9DP<%I_$ M(<)D`+JR/AX#*\P78S&D(Q8@&:\(2((*J3$@!N!9/'L3SPO5V6/<$]]$1KO"S)W!$Q0EE7L`L=%C7&],&6!G#MO%G(#-8 MOF>1-6L;EQ3A5])1IY@!ZX8=,*T"QU6>P&%H%0UYL67\'F$QTT.#TH#Y*DRI MLDNU^<;%,"@\J_@B&H:Q`0=E!K_0:L9Z0!K&03GC`AAMM;`K3"7L%AX!BT%9 M&L7SA5A1^DR$Y=QU[&$--O%';D$?(RO&,'SZQ"0*AWP^G:M-A5?YQ;?!\@;1G?EHZ8^:/";O MN9%LG(PT=`_&L$C)R,0,'H/-RA(3B;SQ'%-:"+6-#`5G;#$RKK"?;(RQ3:`O M7EP9P\F%\J]F'(B4B/(SN2C/$L=:H_RS_<+5,:7<%Y^4-$RC;,J!RI,RH3PJ M8PHUFE\\42U)KC"G.QK[BFT$,?$DC`;/%K.P&%\5!F#,2R"#PIPN'>R>RB^I M\L91*.]C88`*$"N_)'05JCPH&\NCLL1C#`?++@"5L"40#:&RJHS%)"I:Z;3, M+$\8>O#*6BR[Q%@,7XBG+LL/`:!,X5+'A`5D6*=DQ:A!L!PL_G$S@<@[4'EV M%_)T4IWYK-YRNKS9(#R4\;-,+GM84O*I4=R@RW1PNG(Z8LOC\F7L82FRT-H1 M@#"[`*K!=*<2-\QZ\A2PWXB`NG'-%2O+HX*%AK&0%H,@@FX,F\@F/,@RQPRFP9TT]!Q@+>INFZT"+W+$ MY6'D5'X2`75&J`#)PW"+/,[XS'("2FR#^,Q.:O'H&(#+MS.!?5JB'1V$2FAL*2S-4G*03`R+S3TSV6PE^P7N M1)2H8:FHCZS;7)WA1-X!\R`%>,U,<]AL#(_-YL$--K2>N-OHXLP@^,V$"M<\ M.*L(A3.53#=+!%`SS-!&X`4<;W[P!,P,X!#DO#7G1%TSY1PW-\V7,QN0.=/% M?_%$SX?PRY\X_,\9*JZQ-T\()L3TK M"*+S:D@Z3\Y?\Y1\/+_,AP)GMF"Q&!&RKN0\I03N\H2@D^UK"<)AYR,+#;-R 
MD:3AB'RV\_FL+OO#\H(4I$KM'&]R0,1^+%5UVK3%($AFZ#&,ES]3"U:"4>`* MES.7S?*A;!0-N1!*("''"985>C;TQ,5:0:'\&C`?UDA8D"^S&#B8,0R9'6#\ M&(M1/'+005T+QW.LS$TRVGR$[LPK=!LVG63+T#(6T\2LS(ZQ"BV8E$@TR&9S@-<'*)T'0,>5<`!2LW$$3"D#3%7/^6D(@;#`W&P6:Y70WMCS6KKO`OD0YSPQ?S M(:TG3Q);C?54J5FB7T5=E$HKPVO:9&Q(Z\7:LH<%2],'.X^(1$M#!)VR,:Q* M%]+LLBM=*`O39T0M3=MNQ4D'>51%YP=?L4)L3&O.JH_'(S&&-RE!`B^L*JX7:1P*V M)E<+:U]Z=OEH*;&-9Z`@15%PX%[(] M!35LO+R08;%I0FU1_]/VLD!=*$]#KK"$<("5KE/.@DO]C,I=E94C&JP%9E-- MT??$'F9<2-UIJB8,::2<'_#0`_/H\412-Q>6;[(SB]3X'BM--`0!1$)$>J1= MQ$$U2Y!_U%2^1<9!%<711NQ//5/;`;^P41D1=,9)FFT']JX(C=3RL7WY&H>) M3$TS/&1EP/5Z%TO*RO2H3%:;U6+U,2PF:QA>XI0V!N0!Z[$7+03H"'1U:S87 M?'VP0=CG5,?&)[2&L6[$06VUF]#X$@VRBAW5]NX?W8?B=0>TU0C!?%0>DU2; MG9ST3CP4T!.O*2P=9:'"9KT&6!A'V2@86!<-:`*9TU;_,Z=%,LU+]]`>%FK] MSK MH-@$QWV779-!2T]NS&$WU#1V.0P*B]2VXW.0&E?75[&'Q3ES)5YTCZWGF(BT M]>[\);72JW5._=]@IPC2@/MDLP$BP:;05L\!6"RQG%9C,0$"@7G=)$M7MAGW M8^/7A7(W!DTDV69VU%R`2;"BR*B<(#`&7$E+XQ3OV)UFFDU#B01+4'I6K@RI M0<"HG*[DV:K!GAUF7W44\Y"+0([*9>&7.AC3V31#Q81^(-:?A./10,D4G=LD M@65)8LYUITD'4-73B91[8X_*1<+YP7256/0=5=0U$=;&<)=]@'$4M"]:W61+ MV&N4*-%2L[O"-;OK"JO:P3*+H7`NG;L?L+@D658Y\0_2(UM#M_86P3-?R]RU MFC!5G!&1$X+-4!L28P%C<.CNS*HV,BTHO]I!=KWC*?\`"^"74=[EL2@ ML"CJ`@S6G3:07:K5=K*%(7'XVKK4=KMMF-014X1-+1!HA&M!36%9`9M*@F5E M#6D%?Q^8H@0UW"N"8<)NTPS6MLY9E7IVCXQ:O"0(W-VV0^5?@\6I=KL]8K(8 M&1M^#0]7I<6)L7A(/)E:95/:2HSJT>K,.3M:S4=3G;+)70O7.7R`]!PKUS*\/5JL@-PFY^S(,I9/9! 
MPR_-K7V@SLH1MIY,3T74W935=FN'%0G8###(,-G<]*@\=90U0D`0$`\I`7(W M=?=IA1J]56Q-;$?$3@6:0U'1QJDVW+TRD]Z] M=,C6+?R\?5G1W$8KHJTW#34&Z!Y:]JNM)Y\7\1`4,`3H.C6BE.T'"MZB]]Z[ M0V/,A?*!$CM89$>SZ(T';(O^G6J=>)O;N@0<$`\%=-2WW_UI'M]:[,Y<6>RT M&@;IF$172['!8E!#;#E)'E)P73$A6H$50$8)E4^8,5Q&Z-(L!IGU3LB:S%TXD-JA9^ MAJ!"UMWES!8''K\,"G_/=#2Z#2PPH4W`@@*"C],O@RL,@N/2ZS)H&R`\U/OW 9SG&"#[8JN$.P-L$,^/=)C<5XSS:X@D"#`P*" ` end From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 03:30:25 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA14074; Thu, 31 Jan 91 03:23:26 EST Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA14067; Thu, 31 Jan 91 03:23:23 EST Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 8056; Thu, 31 Jan 91 03:25:14 EST Received: from VM1.calc.ucl.ac.be by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 8055; Thu, 31 Jan 91 03:25:14 EST Received: by BUCLLN11 (Mailer R2.07) id 6051; Thu, 31 Jan 91 09:22:20 +0100 Date: Thu, 31 Jan 91 09:00:49 +0100 From: "Alain FONTAINE (Postmaster - NAD)" Message-Id: <910131.090049.+0100.af@sei.ucl.ac.be> Subject: Re: Character code sets and all that To: Peter Svanberg , ietf-smtp@dimacs.rutgers.edu In-Reply-To: Your message of Wed, 30 Jan 91 17:43:39 +0100 On this (EBCDIC) machine I got exactly the right sequence (encoded in EBCDIC, of course), except the fact that you interverted SO and SI in the first place. If you want to be really complete, you could also include ESC SPACE B , to announce that only G0 and G1 will be used, that SO and SI will do the switch, and that columns 10 to 15 will never be used. Now this is just written for the sake of picking nits 8-). I am neither advocating nor rejecting the use of this technique. 
/AF

From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 07:00:27 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17189; Thu, 31 Jan 91 06:36:40 EST
Received: from INFOODS.MIT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17182; Thu, 31 Jan 91 06:36:36 EST
Date: Thu 31 Jan 91 06:35:57-EST
From: John C Klensin
Subject: Re: Re: Character code sets and all that
To: af@sei.ucl.ac.be
Cc: psv@nada.kth.se, ietf-smtp@dimacs.rutgers.edu
Message-Id: <665321757.364937.KLENSIN@INFOODS.MIT.EDU>
In-Reply-To: <910131.090049.+0100.af@sei.ucl.ac.be>
Mail-System-Version:

Alain,

Thanks for diagnosing the problem, I received your message just before I started poking through the hex. That is,

> except the fact that you interverted SO and SI in
>the first place.

Now, Peter (and everyone else, especially those contemplating embedded escapes, sending unannounced raw Latin-1 as 8 bit, etc.) I think there is an important, and almost conclusive, message here.

I'm reading this message on a dumb terminal that understands, and displays, ISO8859-1 (Latin-1). My receiving SMTP does not bit-strip, nor does my UA. I think that makes me the ideal candidate for Robert's "just declare the 7bit implementations broken" theory, because, if you send me ISO8859-1, properly constructed, today, I just get it with no further problems.

But Peter made what amounts to a one-character error; he shifted out when he should have shifted in. So, when he shifts back to G0 (ASCII) (or intends to), my little terminal shifts to G1. This is a little hard to read :-). And it stays there, through the end of the message, through whatever other messages I'm reading in a stream, and so on, until I manually reset the terminal.

So we have accidentally applied Robert's suggestion, and the robustness principle, and had a one-character error, and something fairly nasty has occurred.
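The effect of that one-character error can be illustrated with a small state tracker. This is only a sketch in Python, assuming a terminal that starts in G0 and honours nothing but SO (0x0E) and SI (0x0F); the function names and the sample byte strings are hypothetical and are not taken from Peter's message.

```python
SO = 0x0E  # Shift Out: invoke G1 into the 7-bit code table
SI = 0x0F  # Shift In: invoke G0 (ASCII) again


def final_shift_state(data: bytes) -> str:
    """Return "G0" or "G1": the set a terminal is left in after showing data."""
    state = "G0"  # assumption: the terminal starts in the ASCII (G0) state
    for byte in data:
        if byte == SO:
            state = "G1"
        elif byte == SI:
            state = "G0"
    return state


def reset_if_needed(data: bytes) -> bytes:
    """Append SI when the message would otherwise leave the terminal in G1."""
    if final_shift_state(data) == "G1":
        return data + bytes([SI])
    return data


# Hypothetical samples: one shifts out and back in, the other swaps SO and SI.
correct = b"sm" + bytes([SO]) + b"v" + bytes([SI]) + b"rg"
swapped = b"sm" + bytes([SI]) + b"v" + bytes([SO]) + b"rg"

print(final_shift_state(correct))  # G0
print(final_shift_state(swapped))  # G1
```

A mail reader could call something like `reset_if_needed` on each message before display, so that a single misplaced shift does not carry over into the next message.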
This is not an uncommon problem: At the end of last year, a posting was made to a list in a mixed Latin/Cyrillic character set. To squeeze in a few extra characters, the designers of that character set used columns 8 and 9 for graphics (I am presuming Robert's theory, extended, would say "no reason to strip those, either"). And, lo and behold, one of those "graphics" turns out to be construed by these Standard-conforming terminals (X3.42, in this case) as "begin funny control sequence, don't do *anything* until you see another one of these characters". In other words, the keyboard locks, the screen locks, and the terminal imitates the brain-dead.

Now, this problem can, in principle, be addressed at either end. For example, one could, for the first time, start giving direction to UA writers that says, more or less, "the receiving SMTP can, and probably will, throw almost anything at you; you "SHOULD" have a sufficient understanding of your users and their terminals that you can filter out the terminal-killer and line-hanging-up sequences". To borrow from another recent discussion, so much for those who think that "cat" is an appropriate mail-reader. Interpreted in terms of the robustness principle, this implies a change to "be paranoid about what you accept". I suggest that leads to greater robustness. I also suggest that it is not desirable.

More desirable, IMHO, is that sending SMTPs should announce what they intend to send, and should not tell lies about it. That *does* change the behavior of the sender, because, Robert, this line of reasoning says that, since RFC821 says "7 bit ASCII", your sending SMTP is broken if it permits anything but "7 bit ASCII" to go out, at least without the agreement of the receiver. And, in that view of the world, one of the things this discussion is about is how you negotiate with the receiver.
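The receiving-end option described above, a UA that filters what it shows to the terminal, might look roughly like this. It is a sketch in Python under stated assumptions: the terminal displays Latin-1, only tab, CR and LF are passed through as controls, and the visible replacements ("^N", "<90>") are an arbitrary, hypothetical choice.

```python
def safe_for_display(data: bytes) -> str:
    """Make control characters visible instead of sending them to the terminal."""
    out = []
    for byte in data:
        if byte in (0x09, 0x0A, 0x0D):
            out.append(chr(byte))               # keep tab, LF and CR
        elif byte < 0x20 or byte == 0x7F:
            out.append("^" + chr(byte ^ 0x40))  # C0 controls and DEL, e.g. SO -> ^N
        elif 0x80 <= byte <= 0x9F:
            out.append("<%02X>" % byte)         # columns 8 and 9 (C1 controls)
        else:
            out.append(chr(byte))               # printable ASCII or Latin-1 graphic
    return "".join(out)


print(safe_for_display(b"a\x0eb"))  # a^Nb
```

Nothing here knows which sequence actually hangs a given terminal; it simply refuses to pass any control character other than the three listed.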
That does suggest that your behavior changes if you say "I want to send Latin-1" and the receiver says "no", because, after you get the "no", if you send anything but ASCII, you are being at least slightly naughty.

Note that, in the above paragraph, "7 bit ASCII" is redundant, only "ASCII" is necessary. And, despite my picking out the "just send 8 bits and modify the text" theory for special abuse, approximately the same argument applies to seven-bit-transport, character-panel-switching approaches such as Peter's example.

I also observed, with interest, that, while my system isn't RFC1154-capable, Mark's embedded text is merely gibberish until I extract and process it. Unlike the small error in Peter's message, there is no damage to the environment. Maybe there is something to be learned from this.

Let me, however, identify a bias in this situation. Robert points to the "8 bit telnet" transition as a success. Maybe it is. But, in the process of that transition, we lost something that I always considered very important and I guess I'm trying to prevent the same thing happening with electronic mail, where it is much more important.

With the original, 7-bit-ASCII, NVT definition, we had something very special, albeit something that one hoped would not often be necessary. It was a least-common-denominator character set, with no assumptions about (or permission to assume), e.g., the ability to understand and control the subtle behavior of the terminal at the client end from the server end. All of that stuff had to be negotiated; if you didn't negotiate it, the server telnet had to really assume glass ASCII TTY and act accordingly. This permitted any system to telnet to any other system and communicate, although sometimes in a highly stilted way.

So that seven bit restriction was "fixed", and "transparent" was reinterpreted and applied differently.
And I can now sit down and write a pair of HR-compliant telnet client and server over which, at the end-user sitting in front of a terminal level, there is no possible hope of interoperating between the two systems. (This is a separate discussion, so, if anyone wants to pursue it, let's take it off-list.) But I consider the loss of a guaranteed least common denominator a disadvantage.

The problem with email is more serious, if only because the frequency with which one wants to telnet between systems so different as to turn the present telnet interpretation into a serious interoperability problem is pretty low. With email, we expect to send to widely disparate systems, without really knowing what they are. We have even made explicit provisions for handing email off to systems and networks that are not subject to *any* Internet requirements in a way that is intended to be transparent to the user. So I think we need to be pretty careful to confine people to minimal (but common) behavior until they agree to additional features, characters, or whatever.

--john
Klensin@INFOODS.MIT.EDU
-------

From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 07:30:25 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17911; Thu, 31 Jan 91 07:24:00 EST
Received: from INFOODS.MIT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17907; Thu, 31 Jan 91 07:23:58 EST
Date: Thu 31 Jan 91 07:22:30-EST
From: John C Klensin
Subject: Re: Re: Character code sets and all that
To: af@sei.ucl.ac.be
Cc: Christopher.J.Tanner@nve.crl.aecl.ca, ietf-smtp@dimacs.rutgers.edu
Message-Id: <665324550.827937.KLENSIN@INFOODS.MIT.EDU>
In-Reply-To: <910130.153320.+0100.af@sei.ucl.ac.be>
Mail-System-Version:

Alain Fontaine writes, in response to one of my excessive assertions ( :-) )...
>> (2) However, ISO2022 and its G0/G1 language don't, in my reading and
>>understanding, really directly apply to ISO8859 character sets, but to
>>the registered character sets (typically of no more than six columns
>>each) from which the ISO8859 character sets are mostly built.
>When this character set is used in the context of
>other encoding standards like ISO 2022 or ISO 4873, one should consider
>that it is made of the following elements:
>  - the space character 02/00
>  - one G0 94-character set (02/01 to 07/14)
>  - one G1 96-character set (10/00 to 15/15).
>
>The text then gives the escape sequences to be used to designate those G0
>and G1 sets (I suppose it means they are registered).
>So it seems that what we are talking about is explicitly permitted...

My, I hate the way the ISO community writes Standards sometimes. And, as part of that community... :-(

Part of this issue isn't going to be settled until someone asks for an official clarification. And that could take a year or so.

Let me explain, in the context of my (probably unclear) original comment, what I think the above means. First, three facts that should not be in dispute:

(i) The two 94-character sets that are identical to columns 2-7 and 10-15 of ISO8859-1 are registered character sets.

(ii) The English-French-English translation process is not distorting what is going on here.

(iii) There are no *registrations* of designation sequences for ISO8859-n sets. I don't think there is even a procedure for such registrations. So, you can't say "here comes ISO8859-1", you can only say "make this 94 (or 96) character set G0 (or G1, or GL, or GR)".

Now, given this, I think that what the above is telling us is that, if, in ISO2022/4873 land, you designated these particular character sets onto those particular panels, you end up with something that is code point identical, and indistinguishable from, ISO8859-1.
We can easily derive a "how to encode the Latin-1 character set in 7 bits" rule from the isomorphism, but, at that stage, we are technically using the G0 and G1 sets, and the registrations, not ISO8859-1, which is an integral 8bit character set.

As I mentioned in the earlier note, we programming language standards folks really like ISO8859-n: All the characters are the same size, there are no multiple character overstrike sequences, etc. By contrast, arrangements in which the following are not equal..

   number of storage-unit characters (bytes, octets, 16bit units,...)
   number of print positions consumed on page
   user perception of number of characters and spaces present

.. create nightmares. I don't know to what degree similar concerns apply to email.

Now, in my capacity as a member of the "we will succeed better if we keep this as simple as possible" community, I favor ISO8859-1 over ISO8859-n, and ISO8859-n over individual designation of arbitrary G0 and G1 sets. Other than the implications of one-character errors, using ISO2022 escapes on the two registered character sets as a means of representing ISO8859-1 in 7 bits does not disturb me at all, and disturbs me even less if we can assume 8 bit transport and do that escaping internally in, e.g., TOPS-20. But permitting designation of arbitrary registrations under the 2022 rules impresses me as opening far worse cans of worms than ISO8859-n for arbitrary n.

Perhaps a brief example would help. ISO8859-7 (I'm pretty sure; don't have it in front of me) is Latin/Greek. Its form is nearly identical to that of ISO8859-1 as cited by Alain, including, if I recall, listing the designator sequences to establish equivalency with the 2022/4873 models. But, if you permit the designators to be used in an arbitrary way, rather than staying in the 8859-n context, someone could come along and designate "Greek Alphabet Graphics" onto G0 and ASCII onto G1. That immediately reintroduces the "how do you spell "." in CRLF.CRLF?" problem.
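The "how to encode the Latin-1 character set in 7 bits" rule mentioned above can be made concrete. The following is a minimal sketch in Python, not a proposal: it assumes ASCII stays in G0, that the right half of Latin-1 is designated to G1 with the sequence ESC 2/13 4/1, and that SO and SI switch between the two. The function names are hypothetical, and text that itself contains SO, SI or C1 controls is not handled.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Assumed designation: ESC 2/13 4/1 puts the right half of Latin-1 into G1.
DESIGNATE_LATIN1_G1 = bytes([ESC, 0x2D, 0x41])


def latin1_to_7bit(text: str) -> bytes:
    """Encode Latin-1 text using only 7-bit bytes, shifting with SO and SI."""
    out = bytearray(DESIGNATE_LATIN1_G1)
    in_g1 = False
    for byte in text.encode("latin-1"):
        if 0x80 <= byte <= 0x9F:
            raise ValueError("C1 controls are outside this sketch")
        want_g1 = byte >= 0xA0
        if want_g1 != in_g1:
            out.append(SO if want_g1 else SI)
            in_g1 = want_g1
        out.append(byte & 0x7F)
    if in_g1:
        out.append(SI)  # always leave the receiver back in G0 (ASCII)
    return bytes(out)


def sevenbit_to_latin1(data: bytes) -> str:
    """Inverse of latin1_to_7bit for data produced by that function."""
    if not data.startswith(DESIGNATE_LATIN1_G1):
        raise ValueError("missing G1 designation")
    out = bytearray()
    in_g1 = False
    for byte in data[len(DESIGNATE_LATIN1_G1):]:
        if byte == SO:
            in_g1 = True
        elif byte == SI:
            in_g1 = False
        elif in_g1 and byte >= 0x20:
            out.append(byte | 0x80)  # G1 graphic: restore the eighth bit
        else:
            out.append(byte)
    return bytes(out).decode("latin-1")


encoded = latin1_to_7bit("na\u00efve caf\u00e9")
print(all(b < 0x80 for b in encoded))  # True
print(sevenbit_to_latin1(encoded))     # naïve café
```

Because the encoder always shifts back to G0 before any ASCII byte, including space and CRLF, a "." on a line of its own is still sent as plain ASCII, which is exactly the property the arbitrary-designation case loses.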
Various cliche's (whoops! need Latin-1 even for "English" :-)) come immediately to mind.

john
Klensin@INFOODS.MIT.EDU
-------

From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 09:30:25 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA19616; Thu, 31 Jan 91 09:04:46 EST
Received: from thumper.bellcore.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA19612; Thu, 31 Jan 91 09:04:44 EST
Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id for ietf-smtp@dimacs.rutgers.edu; Thu, 31 Jan 91 09:04:41 EST
Received: by greenbush.bellcore.com (4.12/4.7) id for ietf-smtp@dimacs.rutgers.edu; Thu, 31 Jan 91 09:07:20 est
Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Thu, 31 Jan 1991 09:07:15 -0500 (EST)
Message-Id:
Date: Thu, 31 Jan 1991 09:07:15 -0500 (EST)
From: Nathaniel Borenstein
To: ietf-smtp@dimacs.rutgers.edu
Subject: Re: Character code sets and all that
In-Reply-To: <665321757.364937.KLENSIN@INFOODS.MIT.EDU>
References: <665321757.364937.KLENSIN@INFOODS.MIT.EDU>

Excerpts from internet.ietf-smtp: 31-Jan-91 Re: Re: Character code.. John C Klensin@infoods.m (6275)

> I also observed, with interest, that, while my system isn't
> RFC1154-capable, Mark's embedded text is merely gibberish until I
> extract and process it. Unlike the small error in Peter's message,
> there is no damage to the environment. Maybe there is something to be
> learned from this.

I hate to flog a dead horse, but if we're going to analyze Mark's message, I have to mention that his "27" lines of text were 28 by the time they reached me -- and there were only four hops. One of the mailers (either at Washington, 2 Rutgers hops, or Bellcore, unless someone didn't add a received header) managed to add an extra newline. This underscores the basic flaw with the Encoding header.
Personally, I wouldn't want my multimedia mail users to be dependent on solving every such problem in every gateway in the world...

-- Nathaniel

From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 10:02:55 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA20764; Thu, 31 Jan 91 09:58:46 EST
Received: from dkuug.dk by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA20760; Thu, 31 Jan 91 09:58:27 EST
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8) id AA01612; Thu, 31 Jan 91 15:56:11 +0100
Date: Thu, 31 Jan 91 15:56:11 +0100
From: Keld J|rn Simonsen
Message-Id: <9101311456.AA01612@dkuug.dk>
To: jmr@nada.kth.se, philipp@inf.enst.fr
Subject: Re: Character code sets and all that
Cc: ietf-smtp@dimacs.rutgers.edu
X-Charset: ASCII
X-Char-Esc: 29

> Philippe-Andre Prindeville writes:
> > However (there's always a catch), I propose that (a) Latin 1 is not
> > complete for Western European languages (it lacks certain dutch and
> > danish characters), ...
>
> I have discussed this matter with a number of Dutch and Danish people,
> who all work with computerized typesetting. They have all confirmed
> to me that *** ISO Latin 1 does contain all characters needed for Dutch
> and Danish ***.

Well, Danish is indeed covered. I am quite sure that Dutch is covered too, according to the Dutch authorities, which approved 8859-1. Also the French authorities approved 8859-1. But some Dutch and French characters are missing... The Dutch characters in question are "IJ" and "ij" and the French ones "OE" and "oe". Now the Dutch do not complain that much about it, but the French! They want to change 8859-1! As this was voted down in SC2 they proposed a new 8859 part with the French characters in it, and they are opposing all new standards which have 8859-1 as a base, including the new 10646. So this is a problem.
Keld.Simonsen@dkuug.dk
Postmaster Danish internet backbone

From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 10:41:33 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA20901; Thu, 31 Jan 91 10:11:54 EST
Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA20897; Thu, 31 Jan 91 10:11:45 EST
Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA13064; Thu, 31 Jan 91 16:11:17 +0100
Date: Thu, 31 Jan 91 16:11:17 +0100
From: Jan Michael Rynning
To: John C Klensin
Cc: IETF Internet Mail Extensions WG
Subject: Re: Character code sets and all that
In-Reply-To: Your message of Thu 31 Jan 91 05:33:17-EST
Message-Id:

This is partly a reply to a message I received from John C Klensin:

> You can more easily confirm this from Sweden than I can from here, but
> we have been told in UN and Nordic Council of Ministers projects that
> the only *Western* European character that does not appear in ISO8859-1
> is the Icelandic Eth.

I'm surprised to hear that. Both the upper and lower case Icelandic eth appear in ISO 8859-1. The upper case Eth is in 13/00 and the lower case eth in 15/00. At least three people from Iceland have confirmed to me that ISO 8859-1 contains all characters needed for Icelandic. One of them is a linguist. However, there are some other, rather small, Western European languages, which are not catered for by ISO 8859-1. Lapp (also known as Same) is one of them.

> Larger problems occur when one looks at Latin-alphabet-based *Eastern*
> European languages since, due to, e.g., Slavic, influences, they have
> acquired some characters more directly from Cyrillic or Greek.

I checked with people from Eastern Europe who I met at a TeX (computerized typesetting system) conference last year, and they confirmed to me that ISO 8859-2 has all the characters needed for the Eastern European languages listed in that standard. I don't have any first-hand information on the other Eastern European languages.
From what I know, the Baltic languages use the Latin alphabet, with a lot of accented letters, but no non-Latin letters. Most of the languages used in the Soviet Union, Bulgarian, and some of the languages used in Yugoslavia, use the Cyrillic alphabet. Greek uses the Greek alphabet, of course. The only language I'm aware of which I think has mixed up the Latin alphabet with a few Cyrillic characters, is the Lapp dialect used in the Soviet Union, but there are probably more of them.

The only major European language which uses the Latin alphabet, but is catered for neither by ISO 8859-1 (Latin 1, Western European languages) nor by ISO 8859-2 (Latin 2, Eastern European languages), is Turkish. The information that I have comes from two persons with Turkish parents.

ISO 6937/2 has a table of what characters are needed for what language. That table has numerous errors. DON'T TRUST IT!

Before this turns into an endless discussion of what characters are needed for what language, here's my CONCLUSION:

Since electronic mail is used for Japanese and Chinese already, we can't confine ourselves to a Western European solution. We have the choice whether to use a multitude of character sets, or go for one which has it all. Partly depending on that decision, we'll need to standardize how to put such text into mail messages, how to identify the character set(s) to the receiver, etc. We may also decide to extend SMTP from 7 to 8-bit transmission. If we do, an SMTP sender which wishes to transmit 8-bit data must first negotiate that with the SMTP receiver. For those that won't accept 8-bit data, we need a reasonable fallback. Converting the message to a format which may be passed through a 7-bit channel, without loss of information, is what I would prefer, for the sake of those who do not have a direct connection from the sender to the receiver. Converting the message to 7-bit ASCII by some sort of transcription may make the message illegible even to a person who knows the language.
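The preferred fallback in that conclusion, a conversion that passes through a 7-bit channel without loss of information, can be sketched as follows. This is a hypothetical hex-escape scheme written in Python for illustration only; it is not TEX-HEX or any encoding specified on this list, and the labels returned by `prepare_for_transfer` are invented names standing in for whatever the sender and receiver would actually negotiate.

```python
ESCAPE = ord("=")  # hypothetical escape character for this sketch


def encode_7bit(data: bytes) -> bytes:
    """Replace each 8-bit byte (and the escape itself) with "=XX" in hex."""
    out = bytearray()
    for byte in data:
        if byte >= 0x80 or byte == ESCAPE:
            out += b"=%02X" % byte
        else:
            out.append(byte)
    return bytes(out)


def decode_7bit(data: bytes) -> bytes:
    """Exact inverse of encode_7bit."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESCAPE:
            out.append(int(data[i + 1:i + 3].decode("ascii"), 16))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def prepare_for_transfer(data: bytes, receiver_accepts_8bit: bool):
    """Return (payload, label); the label tells the receiver what was done."""
    if receiver_accepts_8bit:
        return data, "8bit"
    return encode_7bit(data), "hex-escaped"


original = "Bl\u00e5b\u00e4rsgr\u00f6t = gott".encode("latin-1")
payload, label = prepare_for_transfer(original, receiver_accepts_8bit=False)
print(label)                             # hex-escaped
print(decode_7bit(payload) == original)  # True
```

Unlike transcription to plain ASCII, the receiver gets back exactly the bytes that were sent, whatever character set they are in.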
From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 11:00:26 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21820; Thu, 31 Jan 91 10:44:12 EST
Received: from dkuug.dk by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21795; Thu, 31 Jan 91 10:43:41 EST
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8) id AA03540; Thu, 31 Jan 91 16:40:26 +0100
Date: Thu, 31 Jan 91 16:40:26 +0100
From: Keld J|rn Simonsen
Message-Id: <9101311540.AA03540@dkuug.dk>
To: KLENSIN@infoods.mit.edu, af@sei.ucl.ac.be
Subject: Re: Re: Character code sets and all that
Cc: Christopher.J.Tanner@nve.crl.aecl.ca, ietf-smtp@dimacs.rutgers.edu
X-Charset: ASCII
X-Char-Esc: 29

> (iii) There are no *registrations* of designation sequences for
> ISO8859-n sets. I don't think there is even a procedure for such
> registrations. So, you can't say "here comes ISO8859-1", you can only
> say "make this 94 (or 96) character set G0 (or G1, or GL, or GR)".

ECMA, the registration authority for ISO 2022, has registered the various ISO 8859 parts with the following registration numbers:

reg.nbr  final  standard
100      4/1    ISO 8859-1
101      4/2    ISO 8859-2
109      4/3    ISO 8859-3
110      4/4    ISO 8859-4
111      4/0    ECMA-113 (ISO 8859-5 cyrillic)
126      4/6    ISO 8859-7 greek
127      4/7    ISO 8859-6 arabic
138      4/8    ECMA-121 (ISO 8859-8 hebrew)

and more.
These can be designated as the G1 set, and with ASCII as the G0 set this makes a complete 8859 part.

Keld Simonsen

From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 11:30:26 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21807; Thu, 31 Jan 91 10:43:58 EST
Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21800; Thu, 31 Jan 91 10:43:52 EST
Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA16171; Thu, 31 Jan 91 16:43:17 +0100
Date: Thu, 31 Jan 91 16:43:16 +0100
From: Jan Michael Rynning
To: John C Klensin
Cc: af@sei.ucl.ac.be, psv@nada.kth.se, ietf-smtp@dimacs.rutgers.edu
Subject: Re: Re: Character code sets and all that
In-Reply-To: Your message of Thu 31 Jan 91 06:35:57-EST
Message-Id:

John C Klensin writes:

> Let me, however, identify a bias in this situation. Robert points to
> the "8 bit telnet" transition as a success. Maybe it is. But, in the
> process of that transition, we lost something that I always considered
> very important and I guess I'm trying to prevent the same thing
> happening with electronic mail, where it is much more important.

Maybe it was a success. To some. It gave us some problems. We had terminals which always sent data with parity, even or odd, and a TOPS-20 TELNET which passed all 8 bits through, without negotiating. I found out the day we enabled 8-bit input on our UNIX machines. I solved the problem by patching the TOPS-20 TELNET program, so that it followed the protocol specification.

I don't want to see similar problems with mail. Any extension which is a violation of the current specification must be negotiated between sender and receiver and agreed upon before it's used.
From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 11:37:07 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21623; Thu, 31 Jan 91 10:37:07 EST
Received: from INFOODS.MIT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21619; Thu, 31 Jan 91 10:37:03 EST
Date: Thu 31 Jan 91 10:36:27-EST
From: John C Klensin
Subject: Re: Character code sets and all that
To: jmr@nada.kth.se
Cc: ietf-smtp@dimacs.rutgers.edu
Message-Id: <665336187.921937.KLENSIN@INFOODS.MIT.EDU>
In-Reply-To:
Mail-System-Version:

>I'm surprised to hear that. Both the upper and lower case Icelandic eth
>appear in ISO 8859-1.

This is of course correct. I didn't have the thing in front of me and suffered an obvious temporary confusion, probably with something else. My apologies for spreading my confusion.

>Before this turns into an endless discussion of what characters are
>needed for what language, here's my CONCLUSION:
>
>Since electronic mail is used for Japanese and Chinese already, we can't
>confine ourselves to a Western European solution. We have the choice
>whether to use a multitude of character sets, or go for one which has it
>all.

I think this statement represents the result of a sequence of policy decisions, not a logical consequence. It is a reasonable model and way to proceed. Given a different set of assumptions and policy decisions, one could conclude that Latin-1 is adequate for cleartext, and everything else gets embedded or encoded somehow, or left for an X.400 transport. Given yet a different set of assumptions and policy decisions, a distinction is made between "alphabetic" character-symbols and word-symbols, and we adopt a way to handle the ISO8859-n family (which could include kana as well as lots of other non-European character sets), but embed or encode, or leave for X.400, everything else.

Greg? Want to supply some guidance on how these sorts of non-technical decisions, about what we "can" and "cannot" confine ourselves to, get made?
For the record, and given the assumptions, I agree with the rest of Jan Michael's conclusion(s).

john
-------

From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 12:00:26 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA23384; Thu, 31 Jan 91 11:33:50 EST
Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA23373; Thu, 31 Jan 91 11:33:46 EST
Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA15949; Thu, 31 Jan 91 17:33:39 +0100 (MET)
Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA13306; Thu, 31 Jan 91 17:33:15 +0100
Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA22677; Thu, 31 Jan 91 17:33:13 +0100
Date: Thu, 31 Jan 91 17:33:13 +0100
From: philipp@inf.enst.fr (Philippe-Andre Prindeville)
Message-Id: <9101311633.AA22677@ulysse.enst.fr>
X-Network: Fnet-Eunet
X-Organization: Telecom Paris (Ecole Nationale Superieure des Telecoms)
X-Address: 46, rue Barrault - 75634 Paris cedex 13 - FRANCE
X-Content-Type: ISO-FONT; 8859.1 (This message may contain ISO chars)
To: ietf-smtp@dimacs.rutgers.edu
Subject: Re: Character code sets and all that

For those of you that would like to ignore 7-bit transport agents, I would just like to hark "be liberal in what you accept, conservative in what you send". A great Internet architect once said that (and repeats it at each IETF meeting)...
-Philip

From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 12:06:51 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA22661; Thu, 31 Jan 91 11:05:34 EST
Received: from INFOODS.MIT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA22657; Thu, 31 Jan 91 11:05:32 EST
Date: Thu 31 Jan 91 11:02:58-EST
From: John C Klensin
Subject: Re: ISO2022, Registrations, and ISO8859-n
To: keld@dkuug.dk
Cc: af@sei.ucl.ac.be, Christopher.J.Tanner@nve.crl.aecl.ca, ietf-smtp@dimacs.rutgers.edu
Message-Id: <665337778.247937.KLENSIN@INFOODS.MIT.EDU>
In-Reply-To: <9101311540.AA03540@dkuug.dk>
Mail-System-Version:

Keld Simonsen writes...

>ECMA, the registration authority for ISO 2022, has registered
>the various ISO 8859 parts with the following registration numbers:
> (list omitted)
>and more. These can be designated as the G1 set and with ASCII as the
>G0 set this makes a complete 8859 part

Sorry, I still have not been clear. All I'm trying to say is that designating one of these as G1 doesn't make an 8859 part; one has to somehow know, externally, that ASCII is used as G0. Similarly, designating one of the registered sets as G1 and explicitly designating ASCII as G0 *still* does not make an 8859 part. First of all, there are several registered character sets that cannot be combined with ASCII to make an 8859 part (because 8859-n sections have to be voted on as "International Standards" and these things merely require someone to come in with the appropriate forms to register them). Second, there is no algorithm to map between registration number or designation sequence and an ISO8859 part number.

So an SMTP extension standard would need one or more rules *in addition to* ISO2022 and the designation sequences in order to use the ISO2022 *framework* with ISO8859-n character sets.
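Since there is no algorithm, such an additional rule amounts to an explicit table. Here is a minimal sketch in Python, built from the final bytes in Keld's list earlier in the thread. The intermediate byte 2/13 (designate a 96-character set to G1) is an assumption about ISO 2022 that the thread does not spell out, the function name is hypothetical, and the table is incomplete because the list ended with "and more".

```python
ESC = 0x1B
G1_96_INTERMEDIATE = 0x2D  # 2/13: assumed intermediate byte, 96-set into G1

# Final byte -> ISO 8859 part, from Keld's list (4/1 = 0x41, and so on).
# Registrations not listed there are deliberately absent.
FINAL_TO_8859_PART = {
    0x41: 1,  # reg. 100
    0x42: 2,  # reg. 101
    0x43: 3,  # reg. 109
    0x44: 4,  # reg. 110
    0x40: 5,  # reg. 111, ECMA-113 (cyrillic)
    0x46: 7,  # reg. 126, greek
    0x47: 6,  # reg. 127, arabic
    0x48: 8,  # reg. 138, ECMA-121 (hebrew)
}


def iso8859_part(designator: bytes) -> int:
    """Map a G1 designation sequence to an ISO 8859 part, or refuse it."""
    if (len(designator) != 3 or designator[0] != ESC
            or designator[1] != G1_96_INTERMEDIATE):
        raise ValueError("not a G1 designation of a 96-character set")
    try:
        return FINAL_TO_8859_PART[designator[2]]
    except KeyError:
        raise ValueError("set is not the GR half of a listed ISO 8859 part") from None


print(iso8859_part(b"\x1b-A"))  # 1
```

Anything outside the table is rejected rather than guessed at, and ASCII in G0 is taken as given instead of being designated by the sender.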
One such rule, and the one implied by Keld's note, would be "Use the designator for G1 of the registered character set that is identical to the GR part of an ISO8859 part to identify that ISO8859 part. Designators of registered character sets that don't appear as GR in ISO8859 parts are prohibited". But that would be an *SMTP* rule, however reasonable and obvious, not an ISO2022 rule. To the best of my recollection (which is not very good today) ISO2022 has *no* mechanism for "designating" a complete eight bit character set (e.g., any of ISO8859-?) except by that type of circumvention, which is not part of the standard.

Now there is also a plausible ISO2022 rule, which would be, more or less, use ISO2022 to designate any registered character set, onto G0 *or* G1, according to its registration and registered designators. Unless additional constraints are placed on its use (again, outside ISO2022), this rule does not guarantee ASCII in G0 (or GL).

The distinction is important because the second opens up major cans of worms that the first does not. And I think it is very important that we all understand what each other are talking about, if possible.

john
-------

From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 12:32:46 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA23820; Thu, 31 Jan 91 11:50:02 EST
Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA23813; Thu, 31 Jan 91 11:49:50 EST
Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA19319; Thu, 31 Jan 91 17:45:57 +0100
Date: Thu, 31 Jan 91 17:45:56 +0100
From: Jan Michael Rynning
To: Keld J|rn Simonsen
Cc: jmr@nada.kth.se, philipp@inf.enst.fr, ietf-smtp@dimacs.rutgers.edu
Subject: Re: Character code sets and all that
In-Reply-To: Your message of Thu, 31 Jan 91 15:56:11 +0100
Message-Id:

Keld J|rn Simonsen ("|" is an "o" with a slash across it in the Danish 7-bit character set) writes:

> ... But some Dutch and French characters are missing...
The Dutch > characters in question are "IJ" and "ij" and the French ones > "OE" and "oe". Now the Dutch do not complain that much about it, but the > French! "IJ"/"ij" is a pair of characters which is given special treatment in Dutch. If a Dutch word starts with "ij", and the "I" is uppercased, the "J" should also be uppercased, regardless of whether the rest of the word is in upper or lower case. If a Dutch word contains an "ij", and the word is letterspaced, there should be no space between the "i" and the "j". You don't need a special character to do such tricks. Some French people claim that they need the "OE"/"oe" ligature to write their language. Others say that it isn't used any more. They seem to have a similar controversy over whether to put accents on top of upper-case letters, or leave them out. This is a red herring. From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 12:30:26 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA23131; Thu, 31 Jan 91 11:25:10 EST Received: from dkuug.dk by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA23126; Thu, 31 Jan 91 11:24:59 EST Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8) id AA05117; Thu, 31 Jan 91 17:23:20 +0100 Date: Thu, 31 Jan 91 17:23:20 +0100 From: Keld J|rn Simonsen Message-Id: <9101311623.AA05117@dkuug.dk> To: KLENSIN@infoods.mit.edu, jmr@nada.kth.se Subject: Re: Character code sets and all that Cc: ietf-smtp@dimacs.rutgers.edu X-Charset: ASCII X-Char-Esc: 29 > Before this turns into an endless discussion of what characters are > needed for what language, here's my CONCLUSION: > > Since electronic mail is used for Japanese and Chinese already, we can't > confine ourselves to a Western European solution. We have the choice > whether to use a multitude of character sets, or go for one which has it > all. Partly depending on that decision, we'll need to standardize how to > put such text into mail messages, how to identify the character set(s) > to the receiver, etc.
We may also decide to extend SMTP from 7 to 8-bit > transmission. If we do, an SMTP sender which wishes to transmit 8-bit > data, must first negotiate that with the SMTP receiver. For those that > won't accept 8-bit data, we need a reasonable fallback. Converting the > message to a format which may be passed through a 7-bit channel, without > loss of information, is what I would prefer, for the sake of those who > do not have a direct connection from the sender to the receiver. > Converting the message to 7-bit ASCII by some sort of transcription, may > make the message illegible even to a person who knows the language. I have done an implementation of an email system that meets many of these goals. It supports about 90 7- and 8-bit character sets, including 8859, 6937, IBM PC, HP, DEC, MAC, and 20 EBCDIC character sets. Almost all of the ECMA registry is covered. 16 bit support is planned. It has a quite readable fallback to ASCII, where each non-representable character is given as a mnemonic representation, often by transcription. I have written an article on this, which I can post to the list. The software has been running here for about a year; it is implemented in sendmail 5.61 and 5.64, and UAs are being modified for it. It is also employed in an X.400 implementation.
Keld Simonsen From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 13:02:00 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA23694; Thu, 31 Jan 91 11:45:02 EST Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA23687; Thu, 31 Jan 91 11:44:58 EST Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA16260; Thu, 31 Jan 91 17:44:52 +0100 (MET) Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA13328; Thu, 31 Jan 91 17:44:32 +0100 Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA22704; Thu, 31 Jan 91 17:44:30 +0100 Date: Thu, 31 Jan 91 17:44:29 +0100 From: philipp@inf.enst.fr (Philippe-Andre Prindeville) Message-Id: <9101311644.AA22704@ulysse.enst.fr> X-Network: Fnet-Eunet X-Organization: Telecom Paris (Ecole Nationale Superieure des Telecoms) X-Address: 46, rue Barrault - 75634 Paris cedex 13 - FRANCE X-Content-Type: ISO-FONT; 8859.1 (This message may contain ISO chars) To: ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that Well, Danish is indeed covered. I am quite sure that Dutch is covered too, according to Dutch authorities, which approved 8859-1. Also the French authorities approved 8859-1. But some Dutch and French characters are missing... The Dutch characters in question are "IJ" and "ij" and the French ones "OE" and "oe". Now the Dutch do not complain that much about it, but the French! They want to change 8859-1! As this was voted down in SC2 they proposed a new 8859 part with the French characters in it, and they are opposing all new standards which have 8859-1 as a base, including the new 10646. So this is a problem. Well, admittedly any sort of ligatures are not vital. It is mostly for aesthetical reasons that one might include them. IJsmeer and &IJsmeer are not so different.
There are a couple of characters that (old) Dutch used but aren't so frequent in new Dutch, but are still used in Vlaams (Flemish or Flammand to the rest of you ;-). I don't have my van Dale's (sigh) handy, so I can't tell you the names of them... Anyone who proposes Vlaams as a language onto itself will probably get laughed at, though. Like proposing Quebe&c,ois to be distinct from French (though L'Academie Fran&c,aise would have you believe it). I must admit that I really miss the &ij character though. It brings pleasant memories of stroopwafels and kopstodts, and bicycling along the Prinsengracht... Apropos of &OE and &oe, these are probably more common in Dutch than in French. M&oe&i:, P&oel, K&oening(k), etc. -Philip From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 13:00:27 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA23944; Thu, 31 Jan 91 11:53:09 EST Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA23940; Thu, 31 Jan 91 11:53:04 EST Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA16428; Thu, 31 Jan 91 17:52:54 +0100 (MET) Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA13352; Thu, 31 Jan 91 17:52:33 +0100 Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA22827; Thu, 31 Jan 91 17:52:26 +0100 Date: Thu, 31 Jan 91 17:52:25 +0100 From: philipp@inf.enst.fr (Philippe-Andre Prindeville) Message-Id: <9101311652.AA22827@ulysse.enst.fr> X-Network: Fnet-Eunet X-Organization: Telecom Paris (Ecole Nationale Superieure des Telecoms) X-Address: 46, rue Barrault - 75634 Paris cedex 13 - FRANCE X-Content-Type: ISO-FONT; 8859.1 (This message may contain ISO chars) To: Subject: Re: Character code sets and all that The only major European language which uses the Latin alphabet, but is neither catered for by ISO 8859-1 (Latin 1, Western European languages) or ISO 8859-2 (Latin 2, Eastern European languages) is Turkish. 
The information that I have comes from two persons with Turkish parents. Ah, I was wondering when we would get around to this... a vile can of worms. Like vietnamese, Turkish has some grueling diacriticals. Before this turns into an endless discussion of what characters are needed for what language, here's my CONCLUSION: Too late... Bref: Does anyone know if UNICODE (using split encoding) is 7-bit "safe"? -Philip From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 13:19:36 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24559; Thu, 31 Jan 91 12:14:29 EST Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24551; Thu, 31 Jan 91 12:14:23 EST Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA16775; Thu, 31 Jan 91 18:14:13 +0100 (MET) Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA13436; Thu, 31 Jan 91 18:13:48 +0100 Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA23121; Thu, 31 Jan 91 18:13:46 +0100 Date: Thu, 31 Jan 91 18:13:46 +0100 From: philipp@inf.enst.fr (Philippe-Andre Prindeville) Message-Id: <9101311713.AA23121@ulysse.enst.fr> X-Network: Fnet-Eunet X-Organization: Telecom Paris (Ecole Nationale Superieure des Telecoms) X-Address: 46, rue Barrault - 75634 Paris cedex 13 - FRANCE X-Content-Type: ISO-FONT; 8859.1 (This message may contain ISO chars) To: Subject: Re: Character code sets and all that Cc: Mark Shand L'Academie Francaise and the Minisitry of Culture have been debating lately whether to reform accents in French. For instance, the tradition of using a circumflex to indicate an "s" dropped from the old spelling (e.g. "ba^tard" [bastard] and "ho^tel" [hostel]) is one of the hot issues. Still, many people will say that when you restrict a language, a great deal is lost. 
Linguists are still debating about whether Mao-Tse Tung did a good thing to Chinese or not: yes, many more people can read or write than before the cultural revolution, but how many can understand the poems of Li-Po^? Much subtlety has been sacrificed. Many of these suppressions (such as dropping OE) have been caused by technology forcing compromises. If most of France uses Selectric typewriters or Minitel 1A terminals, and they are limited to 96 characters (or touches, or whatever), OE/oe probably doesn't get much priority over a` and e` (a` is very different from a in a sentence and de`s is different from des, as are du^ and du, etc.). -Philip From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 13:32:19 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA26356; Thu, 31 Jan 91 13:12:17 EST Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA26352; Thu, 31 Jan 91 13:12:15 EST Message-Id: <9101311812.AA26352@dimacs.rutgers.edu> Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 9781; Thu, 31 Jan 91 13:14:09 EST Received: from BLIULG11 by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 9780; Thu, 31 Jan 91 13:14:08 EST Received: from vm1.ulg.ac.be by BLIULG11 (Mailer R2.07) with BSMTP id 0147; Thu, 31 Jan 91 18:23:15 +0100 Date: Thu, 31 Jan 91 18:15:26 +0100 From: Andr'e PIRARD Subject: Re: Character code sets and all that To: Keld J|rn Simonsen , jmr@nada.kth.se, philipp@inf.enst.fr Cc: ietf-smtp@dimacs.rutgers.edu In-Reply-To: Message of Thu, 31 Jan 91 15:56:11 +0100 from On Thu, 31 Jan 91 15:56:11 +0100 you said: >Well, Danish is indeed covered. I am quite sure that Dutch is covered >too, according to Dutch authorities, which approved 8859-1. >Also the French authorities approved 8859-1. But some Dutch and French >characters are missing... The Dutch characters in question are "IJ" and >"ij" and the French ones "OE" and "oe". Now the Dutch do not >complain that much about it, but the French!
They want to change 8859-1! >As this was voted down in SC2 they proposed an new 8859 part with the >French characters in it, and they are opposing to all new standards >which are having 8859-1 as a base, including the new 10646. >So this is a problem. I have it from a Dutch character codes specialist, Johan van Wingen to be met on lists ISO8859 and ISO10646, that these Dutch characters are officially withdrawn from the Dutch repertoire. I have it from our secretaries that typing 's and the kind on a Mac needs the keyboard accessory (aid to find hidden layout) to be found. French also has and even ligatures... Andr'e. From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 14:00:26 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA27518; Thu, 31 Jan 91 13:52:45 EST Received: from TWG.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA27506; Thu, 31 Jan 91 13:52:21 EST Received: from Obelix.twg.com by twg.com with SMTP ; Thu, 31 Jan 91 10:42:19 PST Date: Thu, 31 Jan 91 10:42:00 PST From: "Edward C. Bennett" To: ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that Message-Id: <9101311042.aa21022@Obelix.TWG.COM> In his letter dated Thu, 31 Jan 1991 09:07:15 -0500 (EST), Nathaniel Borenstein wrote: > >I hate to flog a dead horse, but if we're going to analyze Mark's >message, I have to mention that his "27" lines of text were 28 by the >time they reached me -- and there were only four hops.... >This underscores the basic flaw with the Encoding header. More to the point, it undersores the basic problem with RFC1154, i.e. it's ambiguous. In particular: In the Introduction it says body parts are separated by "an apparently blank line." What does "apparently blank" mean? Whitespace? Why not just say "a null line" like RFC822 does? (RFC1154 also erroneously claims that the header/body separator in RFC822 is "an apparently blank line." 
RFC822 is quite specific when it says "[the body] is separated from the headers by a null line (i.e., a line with nothing preceding the CRLF).") RFC1154 says nothing about what to do when the line count in the Encoding header doesn't match the line count in the body. Do you blindly trust the header and just split the body by those counts? What if the anticipated "apparently blank" line has text in it? I, for one, think that before RFC1154 can be *really* usable we need to clean out the ambiguities and get something more specific. Ed -- Edward C. Bennett - The other MMDF guy edward@twg.com The Wollongong Group (415) 962-7252 1129 San Antonio Road, Palo Alto, CA 94303 "He's become a growling, snarling mass of white-hot canine terror" From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 14:30:27 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA27883; Thu, 31 Jan 91 13:59:10 EST Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA27876; Thu, 31 Jan 91 13:59:03 EST Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA24064; Thu, 31 Jan 91 19:58:54 +0100 Date: Thu, 31 Jan 91 19:58:53 +0100 From: Jan Michael Rynning To: IETF Internet Mail Extensions WG Subject: 8-bit text is not binary Message-Id: There seems to be some confusion about the difference between 8-bit text and binary. 8-bit text is not binary. 8-bit text consists of lines of printable characters. Binary usually doesn't. Sending binary data is not only a matter of allowing 8-bit SMTP transmissions and removing the limitations on line length, as the following example demonstrates.
         Sender on UNIX      Network transmission      Receiver on VMS
Text:    10 (UNIX eol)   ->  13 10 (Network eol)   ->  2-byte record length
Binary:  10              ->  (possibly encoded)    ->  10

If a person on a UNIX system sends a text file containing a 10 (LF, the UNIX end of line), it should first be converted to 13 10 (CR LF, the Network end of line used in mail messages) when sent over the Network, and finally converted to a 2-byte record length preceding the line, when the person on the VMS system receives it. On the other hand, if the sender on the UNIX system sends binary data, with a byte containing the value of 10, it should end up as binary data, with a byte containing the value 10, on the VMS system.

In order to handle this difference properly, the sender's UA must know if it's text or binary. Allowing arbitrary 8-bit values in messages and unlimited line lengths won't remove that requirement. So, the sender's UA might as well LZJU90-encode (or whatever) the binary data, and tell the receiver's UA what it has done, thus avoiding the trouble we would have if we tried to send the binary data in raw format.
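The text/binary distinction above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names are invented here, and the "VMS" side is modelled as a plain 2-byte little-endian length prefix per line, which is a simplification and not a description of any real VMS file format.

```python
def unix_text_to_network(data: bytes) -> bytes:
    """Convert UNIX text (LF = 10) to network form (CR LF = 13 10)."""
    # Normalise any CR LF already present so it is not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def network_text_to_records(data: bytes) -> bytes:
    """Store network text as length-prefixed records.

    Assumption: each record is a 2-byte little-endian length followed by
    the line; a stand-in for the receiver's record format.
    """
    lines = data.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the final CR LF terminates the last line
    out = bytearray()
    for line in lines:
        out += len(line).to_bytes(2, "little") + line
    return bytes(out)


# Text survives as text: the end-of-line marker changes form at each stage.
wire = unix_text_to_network(b"hello\nworld\n")
records = network_text_to_records(wire)

# Binary data must not take the same path: the byte 10 is data, not an
# end of line, and the text conversion changes the content.
binary = bytes([1, 10, 2])
damaged = unix_text_to_network(binary)
```

Here `damaged` is one byte longer than `binary`, which is why the sender's UA has to know which kind of data it is handling before anything goes on the wire.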
From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 15:00:26 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28664; Thu, 31 Jan 91 14:35:40 EST Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28660; Thu, 31 Jan 91 14:35:35 EST Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA24787; Thu, 31 Jan 91 20:34:31 +0100 (MET) Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA14034; Thu, 31 Jan 91 20:34:09 +0100 Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA24929; Thu, 31 Jan 91 20:34:08 +0100 Date: Thu, 31 Jan 91 20:34:08 +0100 From: philipp@inf.enst.fr (Philippe-Andre Prindeville) Message-Id: <9101311934.AA24929@ulysse.enst.fr> X-Network: Fnet-Eunet X-Organization: Telecom Paris (Ecole Nationale Superieure des Telecoms) X-Address: 46, rue Barrault - 75634 Paris cedex 13 - FRANCE X-Content-Type: ISO-FONT; 8859.1 (This message may contain ISO chars) To: edward@twg.com, ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that Also to the point, many BITNET mailers (at least the Crosswell) do not send a "null" line, but rather a line with 8 spaces, because RSCS doesn't like empty records. Some mailers do translate that, and others don't. And some mailer readers correctly interpret such a line followed by an indented line (as in a paragraph start) as a second continuation line, ie. an extension of the header. All of which further complicates things... 
-Philip From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 15:31:43 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA29767; Thu, 31 Jan 91 15:12:00 EST Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA29751; Thu, 31 Jan 91 15:11:44 EST Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA25961; Thu, 31 Jan 91 21:11:21 +0100 Date: Thu, 31 Jan 91 21:11:20 +0100 From: Jan Michael Rynning To: philipp@inf.enst.fr (Philippe-Andre Prindeville) Cc: Subject: Re: Character code sets and all that In-Reply-To: Your message of Thu, 31 Jan 91 17:52:25 +0100 Message-Id: Philippe-Andre Prindeville asks: > Bref: Does anyone know if UNICODE (using split encoding) is 7-bit "safe"? It's not "7-bit safe". If you split a 16-bit Unicode character element into two octets (most and least significant 8 bits), each of them can have any value between 0 and 255. For the Latin 1 part of the Unicode character set, the 8 most significant bits are 0. Have you ever tried to send a NUL through UNIX sendmail? 
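A short Python sketch of the point made above. The helper names are hypothetical, and "7-bit safe" is taken here to mean that every octet lies in the range 1..127 (no NUL, no high bit set); that definition is an assumption for the sake of the example, not something the thread fixes.

```python
def split_octets(text: str) -> bytes:
    """Split each 16-bit character element into two octets, most significant first."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("character does not fit in 16 bits")
        out.append(code >> 8)    # most significant 8 bits
        out.append(code & 0xFF)  # least significant 8 bits
    return bytes(out)


def is_7bit_safe(data: bytes) -> bool:
    """Assumed definition: every octet is in 1..127."""
    return all(0 < octet < 0x80 for octet in data)


# Even plain ASCII letters carry a zero high octet once split this way.
plain = split_octets("A")       # 0x00 0x41
accented = split_octets("\u00e9")  # 0x00 0xE9, a NUL and a high-bit octet
```

Both results fail `is_7bit_safe`, the first because of the NUL alone.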
From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 16:01:41 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA29881; Thu, 31 Jan 91 15:14:01 EST Received: from thumper.bellcore.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA29877; Thu, 31 Jan 91 15:13:58 EST Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id for ietf-smtp@dimacs.rutgers.edu; Thu, 31 Jan 91 15:13:54 EST Received: by greenbush.bellcore.com (4.12/4.7) id for ietf-smtp@dimacs.rutgers.edu; Thu, 31 Jan 91 15:16:28 est Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Thu, 31 Jan 1991 15:16:24 -0500 (EST) Message-Id: <4be7gMK0M2YtQ69pdh@thumper.bellcore.com> Date: Thu, 31 Jan 1991 15:16:24 -0500 (EST) From: Nathaniel Borenstein To: ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that In-Reply-To: <9101311042.aa21022@Obelix.TWG.COM> References: <9101311042.aa21022@Obelix.TWG.COM> Excerpts from internet.ietf-smtp: 31-Jan-91 Re: Character code sets an.. "Edward C. Bennett"@twg. (1500) > More to the point, it undersores the basic problem with RFC1154, i.e. > it's ambiguous. Yes, that's exactly right. Under the right definition of "apparently blank line" I guess what I got was indeed 27 lines of text. But how can I tell the difference, especially if the last line of text was blank or the first line of the next body part is blank? It does indeed seem fraught with peril... 
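The ambiguity can be made concrete with a small Python sketch. It contrasts the RFC822 "null line" with one possible reading of "apparently blank", namely a line holding only spaces or tabs. That reading is an assumption made for this example; as noted above, RFC1154 does not spell it out. The function names are likewise invented.

```python
def is_null_line(line: str) -> bool:
    """RFC822 sense: nothing at all precedes the CRLF."""
    return line == ""


def is_apparently_blank(line: str) -> bool:
    """Assumed reading: empty, or only spaces and tabs."""
    return line.strip(" \t") == ""


def first_separator(lines, is_blank):
    """Return the index of the first separator line, or None if there is none."""
    for index, line in enumerate(lines):
        if is_blank(line):
            return index
    return None


# A message whose separator was padded with spaces somewhere along the way.
message = ["Subject: test", "        ", "Body text"]

strict = first_separator(message, is_null_line)          # None
lenient = first_separator(message, is_apparently_blank)  # 1
```

The two rules disagree on where the separator is, and neither can tell a separator apart from a blank line that belongs to the text itself.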
From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 16:03:56 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00230; Thu, 31 Jan 91 15:25:06 EST Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00224; Thu, 31 Jan 91 15:25:02 EST Message-Id: <9101312025.AA00224@dimacs.rutgers.edu> Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 1352; Thu, 31 Jan 91 15:26:41 EST Received: from BLIULG11 by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 1351; Thu, 31 Jan 91 15:26:41 EST Received: from vm1.ulg.ac.be by BLIULG11 (Mailer R2.07) with BSMTP id 0271; Thu, 31 Jan 91 21:24:09 +0100 Date: Thu, 31 Jan 91 21:15:55 +0100 From: Andr'e PIRARD Subject: Re: Character code sets and all that To: Philippe-Andre Prindeville , edward@twg.com, ietf-smtp@dimacs.rutgers.edu In-Reply-To: Message of Thu, 31 Jan 91 20:34:08 +0100 from On Thu, 31 Jan 91 20:34:08 +0100 you said: >Also to the point, many BITNET mailers (at least the Crosswell) >do not send a "null" line, but rather a line with 8 spaces, because >RSCS doesn't like empty records. It's their CMS file systems actually. Nor does it like empty files. Andr'e. 
From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 16:28:33 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00324; Thu, 31 Jan 91 15:27:32 EST Received: from ux1.cso.uiuc.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00298; Thu, 31 Jan 91 15:26:29 EST Received: from mp.cs.niu.edu by ux1.cso.uiuc.edu with SMTP id AA27049 (5.65a/IDA-1.4.2.3 for ietf-smtp@dimacs.rutgers.edu); Thu, 31 Jan 91 14:27:15 -0600 Received: by mp.cs.niu.edu id AA06960 (5.65a/IDA-1.4.2.7 for info-ietf-smtp@ux1.cso.uiuc.edu); Thu, 31 Jan 91 14:23:50 -0600 Newsgroups: info.ietf.smtp Path: rickert From: rickert@cs.niu.edu (Neil Rickert) Subject: Re: Character code sets and all that Message-Id: <1991Jan31.202340.27737@mp.cs.niu.edu> Organization: Northern Illinois University References: <9101311042.aa21022@Obelix.TWG.COM> Distribution: info Date: Thu, 31 Jan 1991 20:23:40 GMT Lines: 30 Apparently-To: info-ietf-smtp@ux1.cso.uiuc.edu In article <9101311042.aa21022@Obelix.TWG.COM> edward@twg.com ("Edward C. Bennett") writes: > >More to the point, it undersores the basic problem with RFC1154, i.e. >it's ambiguous. In particular: > > In the Introduction it says body parts are separated by "an > apparently blank line." What does "apparently blank" mean? > Whitespace? Why not just say "a null line" like RFC822 does? No! No! No! You can't do that. Need I remind you that there actually are systems out there which are not unix. In some of these systems a line is determined by something other than a LF or CRLF at the end. The meaning of line in some systems is such that every line must contain at least one character, typically a blank (for text lines) or a binary 0 (for binary lines) when the line is to be treated as empty. RFC1154 is that way specifically to allow for a large number of variations as to what is a line, and still be recognized. The ambiguities are not in RFC1154. 
It is merely reflecting the ambiguities in the understanding of the meaning of line over a broad range of systems. -- =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= Neil W. Rickert, Computer Science Northern Illinois Univ. DeKalb, IL 60115 +1-815-753-6940 From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 16:32:16 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00822; Thu, 31 Jan 91 15:45:37 EST Received: from NRI.RESTON.VA.US by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00818; Thu, 31 Jan 91 15:45:34 EST Received: from NRI by NRI.NRI.Reston.VA.US id aa10831; 31 Jan 91 15:41 EST Org: Corp. for National Research Initiatives Phone: (703) 620-8990 ; Fax: (703) 620-0913 To: IETF Internet Mail Extensions WG Cc: gvaudre@nri.reston.va.us Subject: The Next Step Date: Thu, 31 Jan 91 15:41:45 -0500 From: Greg Vaudreuil Message-Id: <9101311541.aa10831@NRI.NRI.Reston.VA.US> Folks, In this message I will attempt to distill the discussions on this list, and focus them a bit. 1) I would like to follow up on Jan Michael Rynning's comment that 8bit/no line length restrictions are not binary. It is becoming clear that one possible goal of sending binary via SMTP is not as simple as it at first appeared to many of us. Besides the CRLF translations, the issues of implementations being written assuming the line length present real obstacles to deploying modified mailers. We may wish to define a second new transmission mode, BINARY, to open the door to multi-media mail. At this point, someone needs to convince me and this list that this is worth the rather extensive overhaul of the system. For me, this is becoming the 821/822 <=> X.400 breaking point. In all circumstances, any change to the standards that is a violation of current specifications, must be negotiated. This may be by version number, or by explicit feature negotiation. This is just plain good sense, and is likely to be enforced by the IESG or the IAB.
2) We need to define a new standard character set to replace US ASCII. To answer John Klensin, it is my own belief that using SMTP to shift character sets in and out is inappropriate. That is much more a message format level function. Now, I'm not opposed to using a character set that itself escape-shifts pages in and out, or a multi-byte character set that retains ASCII compatibility (whatever that means).

a) My personal preference is to have a character set that is not limited to western characters (latin-1), however, if someone makes a good argument, I may be led to believe that multi-byte character sets can (and should?) be encoded in the standard 7 or 8 bit character set.

b) With the 8 bit systems, a standard mechanism should be defined in SMTP transport to convert into a 7 bit representation without data loss. This is important. Information loss will prevent 8 bit systems from ever being used for efficient transport, or multi-media, and encoded data because 7 bit systems will continue to exist. I have not heard much discussion on how to do this. From an "old" UA point of view, most of the conversions I've heard of that cause no info loss will be totally unintelligible. At least I can guess at missing letters in text with info loss. Current ideas are Rynning's TEX-HEX and ISO 2022 (??). I would welcome a summary of available encoding technology and ideas. Implicit in these ideas is a determination on the primary type of use this system will have, and whether it is optimized for human-readability, or for efficiency of data transport. Specific ideas on these trade-offs and encodings are solicited.

3) I would like to put up the one paragraph strawman. If this is acceptable, I will put it in the tentatively decided pile. "Current changes to SMTP will include the elimination of the 7 bit restriction in text, and the specification of a character set to represent the 8 bit information.
The specification of a Binary mode for SMTP is the subject for possible future work. Binary data, and character sets unsupported by the standard character set will be handled by encoding defined in the message format documents" Again I'm looking for specific ideas, 1) Is a decision to change the SMTP specification to use 8 bit w/ no line length changes acceptable? 2) A summary(s) of available character sets, as well as an evaluation of how (if at all) they handle non-western characters. 3) An encoding mechanism to convert from 7bit to 8 bit systems with no data loss. Thanks, Greg Vaudreuil Internet Mail Extensions Chair. From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 16:33:15 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA01691; Thu, 31 Jan 91 16:09:18 EST Received: from Princeton.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA01684; Thu, 31 Jan 91 16:09:16 EST Received: from idunno.Princeton.EDU by Princeton.EDU (5.61++/2.62/princeton) id AA06941; Thu, 31 Jan 91 16:09:12 -0500 Received: from localhost by idunno.Princeton.EDU (5.61+++/1.107) id AA29515; Thu, 31 Jan 91 16:09:08 -0500 Message-Id: <9101312109.AA29515@idunno.Princeton.EDU> To: ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that In-Reply-To: Your message of Thu, 31 Jan 91 20:34:08 +0100. <9101311934.AA24929@ulysse.enst.fr> Date: Thu, 31 Jan 91 16:09:07 -0500 From: jwagner@princeton.edu > Also to the point, many BITNET mailers (at least the Crosswell) > do not send a "null" line, but rather a line with 8 spaces, because > RSCS doesn't like empty records. Some mailers do translate that, > and others don't. And some mailer readers correctly interpret > such a line followed by an indented line (as in a paragraph start) > as a second continuation line, ie. an extension of the header. Just to correct a slight inaccuracy. The technology on the IBM 360/370/390 series of computers is not string oriented (as in unix) but record oriented. 
Because of this a 0 length record has no meaning. It is possible to create files where records don't exist but not possible to say "there's a record here and it has a length of 0". The VM Network Mailer Release 2 (aka the Crosswell Mailer when it was release 1) uses virtual punched cards. All lines are 80 columns long (no matter how long they really should be). Unfortunately, this makes the job of the receiving UA a little tougher since it may have to recognize that lines of 80 blanks should be converted to CRLF (based on the origin of the mail being off the internet) and that this has to be done before it attempts to find the end of the headers. This translation should be occurring in the internet gateways but ... John Wagner From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 16:53:35 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA29967; Thu, 31 Jan 91 15:16:16 EST Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA29963; Thu, 31 Jan 91 15:16:13 EST Message-Id: <9101312016.AA29963@dimacs.rutgers.edu> Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 1286; Thu, 31 Jan 91 15:18:08 EST Received: from BLIULG11 by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 1285; Thu, 31 Jan 91 15:18:07 EST Received: from vm1.ulg.ac.be by BLIULG11 (Mailer R2.07) with BSMTP id 0247; Thu, 31 Jan 91 21:11:06 +0100 Date: Thu, 31 Jan 91 18:35:13 +0100 From: Andr'e PIRARD Subject: Re: Re: Character code sets and all that To: John C Klensin , "Alain FONTAINE (Postmaster - NAD)" Cc: Christopher.J.Tanner@nve.crl.aecl.ca, ietf-smtp@dimacs.rutgers.edu In-Reply-To: Message of Thu 31 Jan 91 07:22:30-EST from On Thu 31 Jan 91 07:22:30-EST John C Klensin said: >As I mentioned in the earlier note, we programming language standards >folks really like ISO8859-n: All the characters are the same size, there >are no multiple character overstrike sequences, etc. By contrast, >... 
>Now, in my capacity as a member of the "we will succeed better if we >keep this as simple as possible" community, I favor ISO8859-1 over >ISO8859-n, and ISO8859-n over individual designation of arbitrary G0 and >G1 sets. Other than the implications of one-character errors, using >... On Fri, 18 Jan 1991 09:34:15 -0500 (EST) Nathaniel Borenstein said: >My recommendation: minor tweaks to SMTP and the relevant RFC's where >absolutely necessary to support multimedia mail or other badly needed >functionality. Otherwise, if it ain't broke, don't fix it, and if it is >broke, hold our collective noses and replace it with the fast-maturing >international standard that was designed to replace it. Again, this is indeed my point of view as a witness of people who have a real need to use their own character set. In short, I find TCP/IP great, but I regret that this lack favors turnkey solutions offered by major constructors who managed to be international. Not favoring open systems. And extending to 8-bit is almost trivial. In fact, SMTP almost works satisfactorily, but it's forbidden. Mail is so important that I find it a good start for discussion. But the disease is more general than just mail, and sooner or later we will have to broaden the scope to other protocols. So, the right decisions have to be made. I have written a document to explain these sorts of problems with practical solutions or tips. Unhappily, it is not exactly aimed at TCP/IP and would need some rework. I am not sure I'll have time for this. So, at the risk of being less convincing and to write too quick English, I'll try to summarize. Depending on time, I'll store my paper as is with excuses, or rework it. The problem is that many computers have had great success because they offer 8-bit character sets, but that these codes are incompatible. Most of the characters are common, but at different code points. 
Having each know each other's code to exchange text (and know which is to translate to the other's) has been recognized and called the N*N problem. By evidence, a common 8-bit code is needed for data exchange. Arguing which or criticizing them ad infinitum is just caused by the problem that a single 8-bit code hasn't got enough characters with 256-. And indeed these discussions are typical of the mailing lists discussing double-byte or more codes. There will be a big step some day. I think it would be a waste of effort to try to obtain the same results with 8 or even 7 bits. But extending from 7 to 8 bits will have an immediate practical impact at minimum cost, if we agree that each 8-bit code serves a subset of the world. Choosing a code is a matter of internal consensus in an environment of protocols, as the need for translation at both ends makes it a theoretically line-only code. But, of course, the more standard and used the better. And here comes ISO 8859 in line. Because the different versions have been carefully thought out to fill the needs of the languages they cover. It's my opinion that we must admit that one of the ISO versions is used between people in a common language group and that they are not normally mixable. At least as a first solution. There are important things to decide before using such a code: - each translation to a proprietary code must be clearly defined by someone that should be the manufacturer, but it appears they fail to do so. - it must be defined for the 256 code points, invertible and common to all the protocols. This is to assure the round trip integrity of the text data. One example. 1) The RFC says that a mail relay must use an invertible translation when it stores in another code mail to be relayed. And btw, by extension, this applies to the user who forwards or replies. 2) BITNET is a huge relay system. ASCII can come in then go out with EBCDIC on the way.
3) As a consequence, all mail gateways must use the same invertible translation on 7-bit today, 8-bit I hope tomorrow. 4) A problem is that translation from ASCII to EBCDIC is loosely defined. It seems that now more and more gateways aligned on a widely recognized one, so widely that it's the one IBM themselves ship with their TCP/IP which is the software acting as mail gateway in many sites. I hear of less and less complaints about corrupted encoded data. 5) IBM has recognized ISO 8859/1. They use codes containing the characters repertoire of version 1, but with different code points, on their mainframes and their PCs. On the PC, it's CP 850. On the mainframes, these codes are called "Country extended code pageS". Yes indeed, things get complicated because these codes containing all the ISO-1 characters are different for compatibility with previous extensions of US EBCDIC, they say (10 of them, with just a few unnoticed differences affecting ASCII translation!). 6) The best yet is that not only IBM does not define translation of these CECPs with ISO-1 (it's evident for graphics, but just as important for control characters), but also NONE OF THE CECPs IS COMPATIBLE WITH the aforementioned widely known translation they use themselves. 7) In consequence, we have to urge IBM to define and document a new compatible code page or invent our own. The same indetermination occurs with 5 other PC codes and the Mac (1). For these, the character set is only "close to" ISO-1. An arbitrary choice has to be made for what is out of this subset. An unexpected conclusion is thus that it would be vital and extremely useful that the RFCs contain a repertoire of translation tables! Be they the manufacturers' or not, but to be used by anyone or any protocol. You now understand my point of view better. Translation in various protocols is just enough a problem. 
One must constantly keep in mind that 8-bit text involves translation in many places such as the Macintosh FTP server sending filenames to a Sun requester operated by a PC Telnet client. Yes, John, it turned my PC emulator in permanent SO state. There are other points in my paper. Tell me if you are interested. And as a last remark, YES image data transfer mail would be very useful, as you have understood encoding is threatened by incorrect translation. P. S. Your RFCs invented the nice word "image". Stop talking of binary. Every computer data is binary. But most of all, an 8-bit code is NOT binary despite the name of the mixed-bad telnet option. Pardon the mistakes. My kid is waiting for me again. Andr'e PIRARD SEGI, Univ. de Li`ege B26 - Sart Tilman B-4000 Li`ege 1 (Belgium) pirard@vm1.ulg.ac.be or PIRARD%BLIULG11.BITNET@CUNYVM.CUNY.EDU From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 17:31:42 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA03570; Thu, 31 Jan 91 17:12:06 EST Received: from cunyvm.cuny.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA03565; Thu, 31 Jan 91 17:12:01 EST Received: from YKTVMV by CUNYVM.CUNY.EDU (IBM VM SMTP R1.2.2MX) with BSMTP id 4171; Thu, 31 Jan 91 17:12:39 EST Date: 31 Jan 1991 16:59:12 EST From: dan@ibm.com (Walt Daniels) Phone: 914-784/863-6736 To: ietf-smtp@dimacs.rutgers.edu Message-Id: <013191.165912.dan@ibm.com> Subject: Re: Character code sets and all that >The technology on the IBM 360/370/390 series of computers is not string >oriented (as in unix) but record oriented. Because of this a 0 length record >has no meaning. It is possible to create files where records don't exist but >not possible to say "there's a record here and it has a length of 0". > >The VM Network Mailer Release 2 (aka the Crosswell Mailer when it was release >1) uses virtual punched cards. All lines are 80 columns long (no matter how >long they really should be).
Unfortunately, this makes the job of the >receiving UA a little tougher since it may have to recognize that lines of 80 >blanks should be converted to CRLF (based on the origin of the mail being off >the internet) and that this has to be done before it attempts to find the end >of the headers. This translation should be occurring in the internet gateways >but ... > > John Wagner Just to correct a slight inaccuracy, the 370 series are VERY good string processors. The popular operating systems on them are record rather than stream oriented. I worked for a number of years on an experimental operating system that had 0 length records. In fact a two line patch to the file system on CMS (just remove the syntax check) allows them and surprisingly little breaks. I don't know much about MVS but I suspect they allow them also. IBM internal mail has not had the 80 byte sillyness for years. Mail has always been 8 bit clean as well. This is particularly easy in a record rather than a stream with hex 0 or CR or CRLF delimiters. 
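The card-image conversion described in the quoted text can be sketched roughly as follows. This is only an illustration in Python, assuming fixed 80-column, blank-padded records; the function names and the `RECORD_WIDTH` constant are hypothetical and do not come from any mailer mentioned on this list.

```python
RECORD_WIDTH = 80  # assumed card-image width, per the quoted description


def records_to_crlf(records):
    """Turn fixed-width, blank-padded records into CRLF-terminated text.

    Trailing blanks are treated as padding, so a record of 80 blanks
    becomes an empty line. Genuinely significant trailing blanks are
    lost, which is one cost of this approach.
    """
    return "".join(record.rstrip(" ") + "\r\n" for record in records)


def split_headers_body(text):
    """Split at the first empty line, as a header parser would."""
    head, _sep, body = text.partition("\r\n\r\n")
    return head, body


cards = [
    "To: ietf-smtp@dimacs.rutgers.edu".ljust(RECORD_WIDTH),
    " " * RECORD_WIDTH,  # the all-blank card that stands in for a null line
    "Body text".ljust(RECORD_WIDTH),
]
headers, body = split_headers_body(records_to_crlf(cards))
```

The conversion has to run before the header/body split; otherwise the all-blank card is never seen as the end of the headers.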
From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 18:01:45 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA03881; Thu, 31 Jan 91 17:19:34 EST Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA03865; Thu, 31 Jan 91 17:19:29 EST Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA01125; Thu, 31 Jan 91 23:19:19 +0100 (MET) Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA21887; Thu, 31 Jan 91 23:18:51 +0100 X-Network: Fnet-Eunet X-Organization: Telecom Paris (Ecole Nationale Superieure des Telecoms) X-Address: 46, rue Barrault - 75634 Paris cedex 13 - FRANCE Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA26302; Thu, 31 Jan 91 23:18:47 +0100 Date: Thu, 31 Jan 91 23:18:46 +0100 From: philipp@inf.enst.fr (Philippe-Andre Prindeville) Message-Id: <9101312218.AA26302@ulysse.enst.fr> To: gvaudre@nri.reston.va.us Subject: Re: The Next Step Cc: I think we should subdivide into MTA and UA groups. Transmission and translation is an MTA problem. Encoding is a UA problem. The MTA should *never* make any sort of lossy translation. Rather than accept then bit strip a message, it should reject it... I am much more interested in working with the existing 7 bits systems, since my employers are wanting *quick* answers (ie. not ISO). For me the most pressing is encoding a universal character set (like ISO 10646) in groups of 7 bits. Latin 1 is too restrictive. I have not heard much discussion on how to do this. From an "old" UA point of view, most of the conversions I've heard of that cause no info loss will be totally unintelligible. At least I can guess at missing letters in text with info loss. Here I strongly disagree. I've used BITNET mailers to receive French text. Many of them will replace characters they can't represent with spaces (ever sent a C program over BITNET? it arrives with no braces).
For some languages, such as French, this is unacceptable. The difference between an accented and unaccented character can be significant in some languages, such as Slovak, where the accusative is an accented version of the nominative (if I remember my grammar, which I probably don't). Encoding a large (> 7 bit) alphabet in binary is an acceptable, but suboptimal solution. Using compact mnemonic names is significantly better, as it allows a user familiar with the system to enter or view text in the absence of proper tools (like when I'm reading mail in the US at a conference). 1) Is a decision to change the SMTP specification to use 8 bit w/ no line length changes acceptable? Yes, as long as the transmission of non-ASCII messages does not hinge on this. 3) An encoding mechanism to convert from 7bit to 8 bit systems with no data loss. Is this text or data? It is important to differentiate. Data will only be manipulated by programs, but text may sometimes have to be viewed on-the-wire or as-is. -Philip From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 18:31:42 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA05326; Thu, 31 Jan 91 17:54:51 EST Received: from mp.cs.niu.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA05322; Thu, 31 Jan 91 17:54:47 EST Received: from localhost by mp.cs.niu.edu with SMTP id AA09312 (5.65a/IDA-1.4.2.7 for ietf-smtp@dimacs.rutgers.edu); Thu, 31 Jan 91 16:54:31 -0600 Message-Id: <9101312254.AA09312@mp.cs.niu.edu> To: Valdis Kletnieks Cc: ietf-smtp@dimacs.rutgers.edu Organization: Northern Illinois University, CS department. Subject: Re: Character code sets and all that In-Reply-To: Your message of "Thu, 31 Jan 91 17:27:32 EST." <910131.172732.EST.VALDIS@vtvm1.cc.vt.edu> Date: Thu, 31 Jan 91 16:54:29 -0600 From: "Neil W. Rickert" >On Thu, 31 Jan 1991 20:23:40 GMT Neil Rickert said: >>as to what is a line, and still be recognized. The ambiguities are not in >>RFC1154.
It is merely reflecting the ambiguities in the understanding of >>the meaning of line over a broad range of systems. >Neil: > >In my personal opinion, RFC1154 is completely brain-dead on this issue. >(... much deletion) It is a fact of life, no matter how much you dislike it, that there are lots of IBM systems out there generating lots of mail, with a definition of line which prohibits completely empty lines. They insist on at least one blank, and sometimes of card image format with 80 blanks. You may not like it, and I may not like it, but no mail transfer standard is ever going to be widely accepted if it cannot handle BITNET mail from IBM hosts. -Neil From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 18:36:53 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA04672; Thu, 31 Jan 91 17:42:22 EST Received: from vtvm1.cc.vt.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA04665; Thu, 31 Jan 91 17:42:17 EST Received: from vtvm1.cc.vt.edu by VTVM1.CC.VT.EDU (IBM VM SMTP R1.2.1MX) with BSMTP id 4904; Thu, 31 Jan 91 17:42:04 EST Received: from vtvm1.cc.vt.edu (VALDIS) by vtvm1.cc.vt.edu (Mailer R2.07) with BSMTP id 0048; Thu, 31 Jan 91 17:42:04 EST Date: Thu, 31 Jan 91 17:27:32 EST From: Valdis Kletnieks Organization: Virginia Polytechnic Institute Subject: Re: Character code sets and all that To: Neil Rickert , ietf-smtp@dimacs.rutgers.edu Message-Id: <910131.172732.EST.VALDIS@vtvm1.cc.vt.edu> In-Reply-To: <1991Jan31.202340.27737@mp.cs.niu.edu> On Thu, 31 Jan 1991 20:23:40 GMT Neil Rickert said: > RFC1154 is that way specifically to allow for a large number of variations >as to what is a line, and still be recognized. The ambiguities are not in >RFC1154. It is merely reflecting the ambiguities in the understanding of >the meaning of line over a broad range of systems. Neil: In my personal opinion, RFC1154 is completely brain-dead on this issue. 
Given that the INTENT of RFC1154 is to allow encapsulation of different body parts, and to deliver them to various systems, if RFC1154 gives such a sloppy (yes, sloppy) definition of "apparently blank space" that 3 professional programmers can look at it and count 3 different values, then its "line" based philosophy is doomed to failure. Now, what if *my* interpretation of 'blank space' is basically the regular expression [ \t\n] (spaces, tabs, newlines)? Then there is only one "apparently blank line" separating this paragraph from the previous. Of course, my *terminal* is convinced that there are 2 lines there.... Now, to liveware this doesn't matter much, but if this paragraph were flagged by an RFC1154 tag, you'd have just fed one line too many into UUDECODE, and thence to UNPACK, and thence to the bit bucket..... (mknod /dev/sarcasm c 0 0) On the other hand, this is a useful concept. Given that 'apparently blank line' is already enshrined in RFC1154, let's take it to its conclusion, and define 'apparently lowercase a', 'apparently o-with-slash', 'apparently some-character-that-looks-like-a-coke-bottle', etc. This way, a large number of variations on 'lowercase a', 'o-with-slash', and 'cokebottle' can be recognised...... The ambiguity wouldn't be in the RFC, it would be in the understanding of 'cokebottle' across a wide range of systems... (rm -f /dev/sarcasm) Valdis Kletnieks Computer Systems Engineer Virginia Polytechnic Institute Usual disclaimer: I said it. My boss didn't, neither did *his* boss. It's all my opinions, totally separate from what I'm being PAID to do...
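The counting disagreement described above can be made concrete with a small sketch. It is an illustration only, written in Python; the two predicates are assumed readings of "blank" (an RFC822-style null line versus a line holding only spaces and tabs), not definitions taken from RFC1154.

```python
import re


def is_null_line(line):
    """Strict reading: the line contains no characters at all."""
    return line == ""


def is_lwsp_blank(line):
    """Lenient reading: the line contains only spaces and tabs (or nothing)."""
    return re.fullmatch(r"[ \t]*", line) is not None


def count_blank(lines, predicate):
    """Count the lines that a given definition of 'blank' accepts."""
    return sum(1 for line in lines if predicate(line))


# A body as it might arrive after passing a gateway that pads null lines.
message = ["first paragraph", "", " ", "\t", "second paragraph"]

strict = count_blank(message, is_null_line)
lenient = count_blank(message, is_lwsp_blank)
```

With the sample above the two readings give different counts, so a line-count header computed under one reading would not match a receiver using the other.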
From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 18:59:34 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA04348; Thu, 31 Jan 91 17:30:46 EST Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA04343; Thu, 31 Jan 91 17:30:20 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24420; Thu, 31 Jan 91 17:29:24 EST Message-Id: <9101312229.AA24420@rutgers.edu> Received: (from user ARIEL) by Relay.Prime.COM; 31 Jan 91 16:50:32 EST To: IETF SMTP list From: Robert Ullmann Subject: re RFC1154 and such Date: 31 Jan 91 16:50:32 EST Hi, From: Nathaniel Borenstein > I hate to flog a dead horse, but if we're going to analyze Mark's > message, I have to mention that his "27" lines of text were 28 by the > time they reached me -- and there were only four hops. One of the Are you counting the blank line in between the sections? Encoding: 27 TEXT, 513 UUENCODE means the structure is: header (apparently) blank line 27 lines of text (apparently) blank line 513 lines of uuencode --- Observation: no matter *what* we do, some small number of gates or MTAs are necessarily going to appear broken. There are a lot of implementations out there, with a lot of bugs. (both old and new). If we mandate that no change be made that uncovers any bug anywhere, we can do nothing at all. --- I am writing this on a "real" 8859-1 terminal: i.e. it talks 8-bit to the machine, not 7-bit with ISO2022 (aka ECMA Std 35) escapes to switch from G0 to G1. The MTA runs in 8-bit, on TCP/IP and SMTP/X.25 (see RFC1090, which specifies ASCII-8 for use on X.25, written 2 years ago). Nothing I am "proposing" is speculation or theory. It all works; is in production, shipped product of Prime Computer. It is used both by Prime and our customers every day of the week. We pass 8-bit characters (0x20-0x7F, 0xA0-0xFF, plus most controls). We accept either LF or CRLF as a line terminator, and always send CRLF as a line terminator.
We expect to find line terminators at least once every 1000+n chars for a small guard value of n. We refuse ISOC, CHAR, 8BIT, and anything else that isn't an SMTP command. We do open-standards* multi-media mail. We recognize a line containing only LWSP as blank when testing for a blank line. (*My definition of an open standard is one where you can add an option without asking for permission. You might have to ask for a number or keyword reservation, but in a real open standard that allocation is automatic, not tested by whether you are doing something in the Approved Manner. You also should be required to publish it, so others can use it. Internet is mostly open. ISO "OSI" is as *closed* as you can get.) And we interoperate. In practice. Not theory. With everything. Now. --- In practice, users are not surprised by gateways munging things. So that's nothing new. Mail relays are, of course, expected not to munge things; but there aren't that many: consider the ratio of hosts that are MX's for others to the total number of hosts. In Prime, there are ~4500 domain objects, about 3500 of which are mail destinations. (the others are X terminals and the like). Of those, about 15 are mail relays, handling mail they didn't originate and must forward. These are the only systems that must actually pass 8-bit un-munged. I guess what I am saying is that while in _theory_, lots of things can go wrong, in _practice_ the problems are fairly rare. (of course, Murphy tells us that any failure mode possible will happen somewhere ... :-) Several people complain that RFC1154 line counts will cause faults when lines get added/wrapped/lost. Do any of you have any EXPERIENCE with a line count failure? Huh? :-) (I don't mean "Have you ever seen lines wrapped by a mailer." I mean "Have you ever seen it _cause_ an actual failure.") Usenet news does it all day, every day. Prime handles about 150,000 messages/day (not counting news), with ~10K of them external.
I have yet to see ONE case (in 3 years) where the lines were misplaced so as to mess up a count, that did not (also) mangle the message to the point of un-usability anyway. (I know a certain X.25 net in Italy that used to drop bytes, but get the CRC correct anyway! Ouch. :-) The only real common "failure" is that 8859-1 text gets stripped to 7-bit ASCII on some paths. Almost always by sendmail, which runs a protocol that only vaguely resembles SMTP anyway ... :-) Best Regards, Rob Ullmann +1 508 620 2800 x1376 From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 19:01:45 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA06570; Thu, 31 Jan 91 18:29:10 EST Received: from TWG.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA06566; Thu, 31 Jan 91 18:28:43 EST Received: from Obelix.twg.com by twg.com with SMTP ; Thu, 31 Jan 91 15:18:51 PST Date: Thu, 31 Jan 91 15:18:34 PST From: "Edward C. Bennett" To: ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that Message-Id: <9101311518.aa22415@Obelix.TWG.COM> In his letter dated Thu, 31 Jan 1991 20:23:40 GMT, Neil Rickert wrote: > >> In the Introduction it says body parts are separated by "an >> apparently blank line." What does "apparently blank" mean? >> Whitespace? Why not just say "a null line" like RFC822 does? > No! No! No! > You can't do that. > OK, I was obviously being Unix-centric, apologies. However, my second point about resolving line count mis-matches still leaves me scratching my head. I can try to make my exploder as robust as possible, but that's still not bullet-proof. Ed -- Edward C.
Bennett - The other WIN/MHS & MMDF guy edward@twg.com The Wollongong Group (415) 962-7252 1129 San Antonio Road, Palo Alto, CA 94303 "He's become a growling, snarling mass of white-hot canine terror" From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 19:31:41 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA06743; Thu, 31 Jan 91 18:39:49 EST Received: from vtvm1.cc.vt.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA06739; Thu, 31 Jan 91 18:39:47 EST Received: from vtvm1.cc.vt.edu by VTVM1.CC.VT.EDU (IBM VM SMTP R1.2.1MX) with BSMTP id 4997; Thu, 31 Jan 91 18:39:35 EST Received: from vtvm1.cc.vt.edu (VALDIS) by vtvm1.cc.vt.edu (Mailer R2.07) with BSMTP id 0568; Thu, 31 Jan 91 18:39:34 EST Date: Thu, 31 Jan 91 17:56:11 EST From: Valdis Kletnieks Organization: Virginia Polytechnic Institute Subject: Re: Character code sets and all that To: "Neil W. Rickert" Cc: ietf-smtp@dimacs.rutgers.edu Message-Id: <910131.175611.EST.VALDIS@vtvm1.cc.vt.edu> In-Reply-To: <9101312254.AA09312@mp.cs.niu.edu> On Thu, 31 Jan 91 16:54:29 -0600 you said: > It is a fact of life, no matter how much you dislike it, that there are >lots of IBM systems out there generating lots of mail, with a definition >of line which prohibits completely empty lines. They insist on at least >one blank, and sometimes of card image format with 80 blanks. You may >not like it, and I may not like it, but no mail transfer standard is ever >going to be widely accepted if it cannot handle BITNET mail from IBM hosts. Neil: I happen to be paid to care for one of these IBM beasts that has trouble with null lines. I'm *quite* aware of the problem. My *point* was that the RFC should have been written in such a way that you had a *unambiguous* definition of 'apparently blank line'. In specific, my objection to the RFC as written is in section 3.2, where it defines . Unfortunately, they do NOT do a very good job of defining exactly what the terms 'text line' or 'blank line' mean. 
Back in RFC822, they were *quite* explicit in defining 'null line' in section 3.1, and 'LWSP-char' and 'linear-white-space' in section 3.3. Now, if RFC1154 gave a similarly clear definition of 'blank', there would not be a problem. Consider that my IBM 3090 thinks that an 80-byte record consisting of 40 pairs of the octets CR and LF is one line. Given the wording of RFC1154 as written, another site is quite proper to consider this as 40 blank lines. So much for the count being anywhere NEAR right. Imagine if RFC822 had defined 'null line' as leniently. Wonder how long it would have taken before all mailers were able to recognize the "null line" separating the headers from the body? Valdis Kletnieks Computer Systems Engineer Virginia Polytechnic Institute P.S. As usual, I'm just presenting *my* viewpoint. If you have an urge to associate my opinion with our official policy, please lay down till the urge passes.... From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 20:01:33 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA07626; Thu, 31 Jan 91 19:06:26 EST Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA07622; Thu, 31 Jan 91 19:06:22 EST Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA06849; Thu, 31 Jan 91 16:05:30 -0800 Date: Thu, 31 Jan 1991 15:57:17 -0800 (PST) From: Mark Crispin Sender: Mark Crispin Subject: Re: Re: Character code sets and all that To: Jan Michael Rynning Cc: ietf-smtp@dimacs.rutgers.edu In-Reply-To: Message-Id: In , Jan Michael Rynning writes: >We had >terminals which always sent data with parity, even or odd, and a TOPS-20 >TELNET which passed all 8 bits through, without negotiating. I found >out the day we enabled 8-bit input on our UNIX machines. I solved the >problem by patching the TOPS-20 TELNET program, so that it followed the >protocol specification. I find this hard to believe, since I wrote TOPS-20 TELNET.
It was written before the new 8-bit stuff, so it used the old scheme. That is, unless you negotiate binary mode, it is strictly 7-bit. It carefully turns off the high order bits on all keyboard input going to the network. For input from the network, it turns off high order bits and then applies parity for the benefit of terminals which garbage their output when parity is wrong. The only TELNET that didn't behave this way was at Stanford-SUMEX, where they hacked it locally to pass 8-bits without negotiating binary mode over my objections. I find it hard to believe that broken version made its way to Sweden! From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 20:05:47 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA07805; Thu, 31 Jan 91 19:17:22 EST Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA07801; Thu, 31 Jan 91 19:17:20 EST Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA07228; Thu, 31 Jan 91 16:17:18 -0800 Date: Thu, 31 Jan 1991 16:06:56 -0800 (PST) From: Mark Crispin Sender: Mark Crispin Subject: Re: Character code sets and all that To: IETF Internet Mail Extensions WG In-Reply-To: <9101311042.aa21022@Obelix.TWG.COM> Message-Id: I agree with Edward Bennett's criticisms of RFC-1154; to wit, the ambiguity of some of its language and the lack of any means to synchronize when the line counts don't appear to match the message body. A further problem to an RFC-1154 processor happens on Unix and any other operating system which uses a single-byte newline instead of CRLF. There is no way for a Unix end process to tell whether a message had a CRLF at a particular point or a bare LF (us old TOPS-20 partisans can be smugly superior on this issue, albeit from under 6 feet of good English soil...). To be honest, I don't really know what a "line" means in an e-mail message. 
We can't use the Internet definition, simply because there is no unambiguous transformation between arbitrary Internet text and text on many operating systems including Unix. From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 21:31:43 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA10505; Thu, 31 Jan 91 21:10:37 EST Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA10499; Thu, 31 Jan 91 21:10:35 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA06012; Thu, 31 Jan 91 21:10:15 EST Message-Id: <9102010210.AA06012@rutgers.edu> Received: (from user ARIEL) by Relay.Prime.COM; 31 Jan 91 21:12:20 EST Subject: Re: Character code sets and all that To: IETF SMTP list From: Robert Ullmann Comment: Created using PRIMAILPLUS Version 1.0 Alpha 5d Date: Thu, 31 Jan 91 21:12:16 EST Hi, > From: Valdis Kletnieks > In specific, my objection to the RFC as written is in section 3.2, where > it defines . Unfortunately, they do NOT do a very good job of > defining exactly what the terms 'text line' or 'blank line' mean. Back > in RFC822, they were *quite* explicit in defining 'null line' in section > 3.1, and 'LWSP-char' and 'linear-white-space' in section 3.3. > > Now, if RFC1154 gave a similarly clear definition of 'blank', there > would not be a problem. (just for the record, of the two authors of RFC1154, I am the one responsible for the wording and such. David Robinson did the actual implementation for our product.) I agree that it has proven to be ambiguous; I had thought it was quite clear. E.g. there is exactly one "apparently blank line" between this sentence and the one preceding. By which I meant: it looks like one blank line when I look at it! This group is chartered to, in part, refine 1154; I take that to mean making it less ambiguous, explicitly defining the keywords for other objects of interest to the WG where appropriate, addressing other issues (e.g.
extended char sets, which has become a hotter topic than I had thought it would be :-). To speak directly to Greg's intent to focus the discussion: I think the WG should have as its objective the issuance of two RFCs: 1) a 1-2 page Draft Standard, to be presented to the Los Alamos Plenary as a motion to advance to Internet Standard, updating RFC 821: removing the 7 bit restriction, declaring that hosts MUST pass the 8th bit, and SHOULD pass all "control" chars where possible. (00-1F, 7F, 80-9F) It MUST pass TAB, MUST recognize CRLF as end of line, and SHOULD recognize LF as end of line. In the absence of a header indication (e.g. Encoding:) the content SHOULD be interpreted as ISO 8859/1 (Latin-1). [or something like that; leaving line length alone. plus something about where 8859/1 is allowed in headers?] 2) a Proposed Standard on the Encoding header field and message structuring; updating or obsoleting RFC1154, possibly to be advanced to Draft at the summer meeting. This is where we can (IMHO) deal with all sorts of odd things that aren't 8859/1, but can be handled as text by the MTA's: LZJU90, JIS-KANJI, ISO-10646-wannabe*, etc. By doing all this neat stuff in the UA's we can get to it a *lot* sooner than if we try to get every MTA out there to comply. (*about 10646: as far as I can tell, this isn't even DIS yet, and can't therefore be gotten through the standard expensive channels ... can someone supply me with a copy?) 
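The line-terminator and 8th-bit rules proposed in item 1 above might look roughly like this in code. This is a sketch only, in Python, under the assumption that the receiver accepts CRLF or bare LF, always emits CRLF, and never alters octet values; the function names are hypothetical.

```python
def split_lines(data: bytes):
    """Split on CRLF or bare LF without touching any other octet."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final terminator
    # Remove the CR of a CRLF pair; 8-bit octets pass through unchanged.
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]


def join_crlf(lines):
    """Always send CRLF as the line terminator."""
    return b"".join(line + b"\r\n" for line in lines)


incoming = b"caf\xe9\r\nsecond line\nthird line\r\n"
lines = split_lines(incoming)
outgoing = join_crlf(lines)
```

Note that the original choice between CRLF and bare LF is not preserved on output, which is the information loss raised elsewhere in this thread.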
Best Regards, Rob Ullmann +1 508 620 2800 x1736 From owner-ietf-smtp@dimacs.rutgers.edu Thu Jan 31 23:31:42 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA13760; Thu, 31 Jan 91 23:06:20 EST Received: from ucsd.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA13756; Thu, 31 Jan 91 23:06:15 EST Received: by ucsd.edu; id AA04784 sendmail 5.64/UCSD-2.1-sun Thu, 31 Jan 91 20:06:09 -0800 for ietf-smtp@dimacs.rutgers.edu Date: Thu, 31 Jan 91 20:06:09 -0800 From: brian@ucsd.edu (Brian Kantor) Message-Id: <9102010406.AA04784@ucsd.edu> To: MRC@cac.washington.edu, ietf-smtp@dimacs.rutgers.edu Subject: Re: Character code sets and all that I think it's clear that RFC1154 doesn't answer the problem of mixed media and character sets; trying to juggle angels on a pinhead to make it work is a waste of time. Let's pass it by and see what we can do to develop a more useful scheme. My belief is that multilingual, multialphabetic, multimedia mail is best solved by defining 1) a common EXCHANGE (not storage!) format 2) a common transport mechanism to move (1) from place to place 3) a common description of the encodings and intended presentation of the data 4) disparate user agents. This will probably mean redefining SMTP, as for example the multi-part stuff I mentioned earlier this month. If this is too ambitious for this list, as it may well be, we who are interested in making this next generation should move off elsewhere. I've no objection to seeing what can be done within the existing SMTP structure, but I'm not convinced it'll bend enough to do what needs to be done.
- Brian From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 07:00:30 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21536; Fri, 1 Feb 91 06:48:09 EST Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21532; Fri, 1 Feb 91 06:48:07 EST Message-Id: <9102011148.AA21532@dimacs.rutgers.edu> Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 6527; Fri, 01 Feb 91 06:50:07 EST Received: from BLIULG11 by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 6526; Fri, 01 Feb 91 06:50:06 EST Received: from BLIULG11 by BLIULG11 (Mailer R2.07) with BSMTP id 0957; Fri, 01 Feb 91 12:47:22 +0100 Received: from vm1.ulg.ac.be by vm1.ulg.ac.be (IBM VM SMTP R1.2.2MX) with BSMTP id 0082; Fri, 01 Feb 91 12:47:19 +01 Date: Fri, 01 Feb 91 10:49:42 +0100 From: Andr'e PIRARD Subject: Re: Character code sets and all that To: Walt Daniels , ietf-smtp@dimacs.rutgers.edu In-Reply-To: Your message of 31 Jan 1991 16:59:12 EST [Note: subject is much shifting along] On 31 Jan 1991 16:59:12 EST you said: >I worked for a number of years on an experimental operating system >that had 0 length records. In fact a two line patch to the file >system on CMS (just remove the syntax check) allows them and >surprisingly little breaks. I don't know much about MVS but I suspect >they allow them also. Indeed, as opposed to the remark this text replies to, in a record oriented file system there's no reason 0 length records wouldn't be allowed. And in fact, MVS and NJE (network transport) are OK. So, the question is why doesn't IBM correct their CMS file system, if it's so easy? Instead of having each and every application remove a single blank from null lines they stored? If they care. IBM FTP does not. >IBM internal mail has not had the 80 byte sillyness for years. Mail >has always been 8 bit clean as well. This is particularly easy in a >record rather than a stream with hex 0 or CR or CRLF delimiters. 
It's mainly CMS again which chose to use virtual punched cards for mail. Correct me if I am wrong saying that virtual printers haven't got this limit and should be used instead. Anyway, NETDATA might also be used. I guess all kind of nodes support it by these days. So, if crlf is implied at the end of each record, BITNET may be made just OK to transport mail. Using it would need for mainframes to know better what's a tab, however. It's up to IBM again to extend the translation of ASCII to EBCDIC to 8-bit in a way compatible with the present 7-bit one, viz. defining a new CECP. Guys from the ISO8859 list suggested that BITNET should transport ASCII mail (they said ISCII). I wonder if they are not correct, even if hard to admit to some. In fact, this would be much in line with the remark on this list that translation is not a MTA but an UA concern. True indeed, translation should occur on transition between user's file and local spool system or equivalent. MTA is concerned with the header but just crlf in the body and should be thought of as "not touching it" otherwise than for requirements to store it. In this respect, the RFC doesn't exactly say that the 8th bit must be 0, but that it's ASCII with the 8th bit set to 0 (not far from a pleonasm to us, 8-bit codes users). The point is the intention. I think it's just allowance for a MTA to store 7-bit-wise, not asking to kill the 8th bit for pleasure. In fact, most work 8-bit wise don't they? Is the MTA matter a change to the RFC or a host requirement? So, the problem is mainly a MUA matter: "Encoding:" ? Andr'e PIRARD SEGI, Univ. 
de Li`ege B26 - Sart Tilman B-4000 Li`ege 1 (Belgium)
pirard@vm1.ulg.ac.be or PIRARD%BLIULG11.BITNET@CUNYVM.CUNY.EDU

From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 07:30:29 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21653; Fri, 1 Feb 91 06:53:13 EST
Received: from lth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21649; Fri, 1 Feb 91 06:53:02 EST
Received: from Valhall.dna.lth.se by lth.se (5.61-bind 1.5+ida/LTH-4-NS); Fri, 1 Feb 91 12:52:41 +0100 (MET)
Received: by DNA.LTH.Se (5.65+bind 1.7+ida 1.4.2/DNA 4-Dan); Fri, 1 Feb 91 12:52:38 +0100 (MET)
Date: Fri, 1 Feb 91 12:52:38 +0100
From: Dan Oscarsson
Message-Id: <9102011152.AA15757@DNA.LTH.Se>
To: gvaudre@nri.reston.va.us, ietf-smtp@dimacs.rutgers.edu
Subject: Re: The Next Step

After having read through the week's flow of entries I will give some more points and suggestions.

>Again I'm looking for specific ideas,
>
> 1) Is a decision to change the SMTP specification to use 8 bit
> w/ no line length changes acceptable?
>

Yes. 8 bits does not mean binary transfer. If we are going to switch to binary transfer we must forget about "lines". Also, real binary transfer will complicate sending mail into 7-bit systems.

> 2) A summary(s) of available character sets, as well as an
> evaluation of how (if at all) they handle non-western characters.
>
> 3) An encoding mechanism to convert from 7bit to 8 bit systems
> with no data loss.
>

When it comes to encoding, different parts of a letter can be handled in different ways. There is encoding of 8-bit text and there is encoding of binary data. Encoding of text need only encode those characters that have the eighth bit set, while binary data normally requires every character to be encoded. In the letter header, a simple text encoding can be used that does not clash with the RFC 822 syntax. In the letter body, the same sparse encoding that is used for headers can be used, though it may use different escape characters than the header encoding.
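As a rough illustration of the difference between sparse text encoding and full binary encoding described above, here is a minimal Python sketch. The escape character "=" and the two-hex-digit form are assumptions chosen for the example; they are not taken from this thread, and this is neither TEX-HEX nor any scheme proposed on the list.

```python
ESCAPE = "="  # hypothetical escape character, chosen only for this sketch


def sparse_encode(data: bytes) -> str:
    """Text-style encoding: escape only octets with the eighth bit set
    (plus the escape character itself); all other octets pass through."""
    out = []
    for octet in data:
        ch = chr(octet)
        if octet >= 0x80 or ch == ESCAPE:
            out.append(f"{ESCAPE}{octet:02X}")
        else:
            out.append(ch)
    return "".join(out)


def sparse_decode(text: str) -> bytes:
    """Reverse sparse_encode: an escape is followed by two hex digits."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == ESCAPE:
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


def dense_encode(data: bytes) -> str:
    """Binary-style encoding: every octet is encoded, none stay readable."""
    return data.hex().upper()


if __name__ == "__main__":
    sample = "Li\u00e8ge".encode("latin-1")
    print(sparse_encode(sample))  # mostly readable on a 7-bit system
    print(dense_encode(sample))   # fully encoded
```

The sparse form keeps mostly-ASCII text legible to a reader on an old 7-bit system, which is the property the text-encoding case is after; the dense form gives that up but treats every octet uniformly.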
Several people want a simple solution that can be implemented quickly. I agree, but from the beginning it should allow all characters in the world, and the inclusion of non-text data, without any change to the letter standard.

Note: This is what is used for email transmission (MTA to MTA). The UA may use a different format to talk to its MTA.

I suggest:

1. We define that we use 8 bits/character. Use text lines separated by CR LF, without unlimited line lengths.

2. The text is sent using ISO 10646. As it is only ISO DIS 10646 just now, initially only the subset ISO 8859-1 of ISO 10646 is guaranteed to work. (This will make it easy for many to get a first implementation, while leaving the door open for all other characters. My first sendmail implementation of the new standard will, though, include support for 10646 and conversion to local character sets (used by UAs).) The changes to RFC 822 are not very many.

3. Allow non-text data to be included anywhere in a text by enclosing it in "escape codes". That is, the non-text data, encoded as text, is enclosed between a start character and an end character, both of which are 8-bit control characters. Having start and end characters instead of lines, as in RFC1154, should reduce the possibility of getting out of sync. This allows a general hook into which we can later insert audio, images or X.400 bodies.

- Compatibility and "old" (7-bit) interchangeability.

It appears that most think it is more important to retain all data, in the hope that after a few 7-bit MTAs the letter will enter an 8-bit one or reach a user with an enhanced UA, than to try to make the letter as readable as possible in case it does not. Either way, SMTP has to be expanded with a query asking whether the receiver can understand the new 8-bit standard, like ISOC in my sendmail.

---

You may also remember that our standard will affect netnews and NNTP.
Dan -- Dan Oscarsson Department of Computer Science Lund Institute of Technology e-mail: Dan.Oscarsson@dna.lth.se Box 118 S-221 00 Lund, Sweden From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 10:00:29 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24014; Fri, 1 Feb 91 09:35:55 EST Received: from thumper.bellcore.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24010; Fri, 1 Feb 91 09:35:52 EST Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id for IETF-SMTP@dimacs.rutgers.edu; Fri, 1 Feb 91 09:35:43 EST Received: by greenbush.bellcore.com (4.12/4.7) id for IETF-SMTP@dimacs.rutgers.edu; Fri, 1 Feb 91 09:38:22 est Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Fri, 1 Feb 1991 09:38:18 -0500 (EST) Message-Id: Date: Fri, 1 Feb 1991 09:38:18 -0500 (EST) From: Nathaniel Borenstein To: IETF SMTP list Subject: Re: Character code sets and all that In-Reply-To: <9101312312.AA20682@thumper.bellcore.com> References: <9101312312.AA20682@thumper.bellcore.com> Yeah, as my later message made clear, I was confused about the (ill-defined) blank line. Excerpts from mail: 31-Jan-91 Re: Character code sets and.. Robert Ullmann@RELAY.PRI (3915) > Observation: no matter *what* we do, some small number of gates or MTAs > are necessarily going to appear broken. There are a lot of implementations out there, with a lot of bugs. (both old and new). > If we mandate that no change be made that uncovers any bug anywhere, we can do nothing at all. I'm sorry, but this is errant nonsense. In the Andrew project, we built a multimedia mail system that transmits arbitrary data, including bitmaps, sound, and much more, and carefully designed the datastream so that it wouldn't break on any mail gateway in the world. It has been used across nearly every gateway in the world, by tens of thousands of users at hundreds of very heterogeneous sites, so we know it works. 
The rules we came up with are simple and not very restrictive. So the claim that we can do nothing at all without fixing the gateways is just not true. By accepting a few not-terribly-restrictive limitations, you can avoid all the work of changing the gateways. I honestly can't imagine any reason to take an approach that is so much more work for absolutely no additional gain. > Nothing I am "proposing" is speculation or theory. It all works; is in > production, shipped product of Prime Computer. It is used both by Prime > and our customers every day of the week. > We pass 8-bit characters ... Yes, yes, I know. But it is MUCH EASIER to make things work internally to one well-defined world than to upgrade every gateway. Mind you, I endorse changing the gateways if it is necessary, it's just not necessary in this case. By the way, my previously described "metamail" approach is an "open standard" in every sense of your approach, and is easier to reconfigure.) > I guess what I am saying is that while in _theory_, lots of things can > go wrong, in _practice_ the problems are fairly rare. (of course, Murphy > tells us that any failure mode possible will happen somewhere ... :-) Well, I'd posit that your experience is less diffused than ours. The Andrew system was distributed world-wide for free via the X11 tape. We found out very early what breaks almost every gateway in the world. You don't get that kind of experience in a more closed world such as Prime. > Do any of you have any EXPERIENCE with a line count failure? Huh? :-) > (I don't mean "Have you ever seen lines wrapped by a mailer." I mean > "Have you ever seen it _cause_ an actual failure.") Usenet news does it > all day, every day. Prime handles about 150,000 messages/day (not counting news, with ~10K of them external. Absolutely. Berkeley sendmail did it to us before we imposed the 80 character line restriction. 
Pictures sent via multimedia mail from CMU to NSF, just over the Internet, were garbled until we added the specification about line lengths. It really does happen.

(By the way, as far as volume goes, Andrew at CMU handles about 30K messages a day, and at IBM it handles much more, not to mention all the other Andrew sites in the world, so don't think you have a monopoly on experience in-the-large. The fact that we don't handle them all in one monolithic installation is an ADVANTAGE in terms of understanding the dynamics of mail in the larger world.)

Sorry if I seem angry, but I'm just amazed that so much energy may be wasted on upgrading relays when it is so unnecessary and there are so many more important things to do...

-- Nathaniel

From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 08:00:30 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA22129; Fri, 1 Feb 91 07:35:58 EST
Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA22125; Fri, 1 Feb 91 07:35:54 EST
Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA06214; Fri, 1 Feb 91 13:35:45 +0100 (MET)
Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA23476; Fri, 1 Feb 91 13:35:22 +0100
X-Network: Fnet-Eunet
X-Organization: Telecom Paris (Ecole Nationale Superieure des Telecoms)
X-Address: 46, rue Barrault - 75634 Paris cedex 13 - FRANCE
Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA06458; Fri, 1 Feb 91 13:35:20 +0100
Date: Fri, 1 Feb 91 13:35:20 +0100
From: philipp@inf.enst.fr (Philippe-Andre Prindeville)
Message-Id: <9102011235.AA06458@ulysse.enst.fr>
To:
Subject: Re: Character code sets and all that

I agree with most of what Brian says, but I still think it is important to be able to exchange mail (using the proposed idea of a universal format) over 7-bit SMTP. And to make that happen, it should be the user agents that present the mail to the MTAs already "cooked" (to use a unixism).
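A user agent that hands over already "cooked" mail first has to know whether a body would survive 7-bit relays as it stands. The Python sketch below flags octets with the eighth bit set and lines longer than a limit. The function name, the message wording, and the reading of the limit as "at most 80 octets per line" are assumptions for illustration (80 is the figure mentioned above for Andrew); it is not code from any mailer discussed here.

```python
from typing import List

MAX_LINE = 80  # assumed limit, after the 80 character restriction mentioned above


def gateway_problems(body: bytes, max_line: int = MAX_LINE) -> List[str]:
    """Return a list of reasons why `body` may not pass a 7-bit relay.

    Lines are taken to be separated by CR LF, as on the wire in SMTP.
    An empty list means neither check found anything to encode.
    """
    problems = []
    for number, line in enumerate(body.split(b"\r\n"), start=1):
        if len(line) > max_line:
            problems.append(
                f"line {number}: {len(line)} octets exceeds {max_line}"
            )
        if any(octet >= 0x80 for octet in line):
            problems.append(f"line {number}: octet with the eighth bit set")
    return problems


if __name__ == "__main__":
    for reason in gateway_problems(b"plain text\r\nLi\xe8ge"):
        print(reason)
```

If the list comes back non-empty, the user agent would encode the body before giving it to the MTA; which encoding to use is exactly what this thread has not settled.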
If the Japanese can send 14 bit data (packed into 16 bits) then we should be able to do as least as well. One other stab at 1154 -- it is not good that a part is the atomic element of a message. I can well envision text that has (in a single line): Cyrillic, Latin 1, and possibly Japanese or Chinese. -Philip From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 10:30:32 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24326; Fri, 1 Feb 91 09:49:44 EST Received: from thumper.bellcore.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24319; Fri, 1 Feb 91 09:49:41 EST Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id for IETF-SMTP@dimacs.rutgers.edu; Fri, 1 Feb 91 09:49:31 EST Received: by greenbush.bellcore.com (4.12/4.7) id for IETF-SMTP@dimacs.rutgers.edu; Fri, 1 Feb 91 09:52:10 est Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Fri, 1 Feb 1991 09:52:06 -0500 (EST) Message-Id: Date: Fri, 1 Feb 1991 09:52:06 -0500 (EST) From: Nathaniel Borenstein To: IETF SMTP list Subject: Nuking RFC 1154 (Re: Character code sets and all that) In-Reply-To: <9102010210.AA06012@rutgers.edu> References: <9102010210.AA06012@rutgers.edu> The worst thing about RFC 1154 is that it is a completely unnecessary extension to RFC 1049. Here's why: RFC 1049 gives us a Content-type header that allows a mail message to be specified as a single non-text media type. This leaves us with the rather obvious problem that you want mixed media messages. Now, this is of course possible with RFC 1049 (as demonstrated by Andrew's use of it) but not in an "open systems" way -- the mixing format is in some sense proprietary, or at least non-standard. Imagine, however, that instead of extending RFC 1049 in the way that RFC 1154 did it, we instead just defined (and carefully specified) a new content type, which might be called "mixed media". 
This new type would be defined as a method for encapsulating multiple objects of differing content types in a single message. The encoding could be carefully specified to minimize impact on gateways, as was the Andrew datastream -- e.g., the encoding itself would always be 7 bit printable ASCII with short lines, and so on, but it would have a standard and reversible way of representing arbitrary 8 bit data.

Advantages of this scheme over RFC 1154: we could have a well-defined mechanism for including arbitrary object types without modifying any gateways. Disadvantage: Anyone now using RFC 1154 would have to retrofit their software. My guess is that at the moment, however, there are still more users of 1049 than 1154.

Frankly, I feel rather amazed that there is so much controversy here. When building Andrew, which was one of the first systems that sent multimedia mail widely over diverse networks and gateways, we KNEW we wouldn't be able to change the gateways, so we carefully designed the data stream so that we wouldn't need to. Now that the world is really ready for multimedia mail, changing the gateways is indeed an option for the mail-builders of today. But the fact that it CAN be done doesn't mean that it is DESIRABLE to do it if we can get the job done another way.

Andrew is a very clear existence proof: you can send arbitrary data via any mail gateway by encoding it reversibly in printable 7 bit ASCII. (Andrew now even sends ISO 8 bit characters, in fact.) A carefully defined "Mixed Media" RFC 1049 content type is all that is needed. Why should we make so much more work for everyone if we can avoid it?

-- Nathaniel

PS -- By the way, one more thing that should be avoided in multimedia mail formats, if you want to avoid gateway headaches, is meaningful white space at the end of a line. There are some gateways that will turn a line that ends with white space into a line without that white space.
Stupid, yes, and probably counter to spec somewhere, but a prudent design for the "mixed media" format (or any other format, really) would avoid having trailing white space on a line ever being significant. If you think these concerns are trivial, consider what we must have gone through to find them out, and ask yourself if you want to endure (or cure) similar agonies... -- Nathaniel From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 10:31:32 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24007; Fri, 1 Feb 91 09:34:30 EST Received: from hydra.Helsinki.FI by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24003; Fri, 1 Feb 91 09:34:11 EST Received: from poros.Helsinki.FI by hydra.Helsinki.FI (4.1/SMI-4.1/32) id AA26176; Fri, 1 Feb 91 16:34:18 +0200 Date: Fri, 1 Feb 91 16:34:18 +0200 From: kankkune@cs.helsinki.fi (Risto Kankkunen) Message-Id: <9102011434.AA26176@hydra.Helsinki.FI> In-Reply-To: Greg Vaudreuil's message as of Jan 31, 15:41 X-Mailer: Mail User's Shell (7.2.0 10/31/90) To: IETF Internet Mail Extensions WG Subject: Re: The Next Step Greg Vaudreuil: "The Next Step" (Jan 31, 15:41): > > Folks, > > In this message I will attempt to distill the discussions on this > list, and focus them a bit. > > 1) I would like to follow up on Jan Michael Rynning's comment that > 8bit/no line length restrictions are not binary. It is becoming > clear that one possible goal or sending binary via SMTP is not as > simple as it at first appeared to many of us. He is right in pointing out that text is stored in different ways in different environments but binary data shouldn't undergo those conversions. However, I think this doesn't matter at the SMTP level, as SMTP only tells how a stream of bytes can be transferred to another host. SMTP should not be concerned with what these bytes mean. It currently isn't (it speaks about ASCII, but you could interpret that as 7-bit bytes without breaking anything. RFC 822 tells what those bytes mean.). 
The receiver can then look for an Encoding field to decide how to present or store the data. We should remember that smTp is only a _transfer_ protocol. Like Andr'e Pirard said here: > In fact, this would be much in line with the remark on this list that > translation is not a MTA but an UA concern. True indeed, translation > should occur on transition between user's file and local spool system or > equivalent. MTA is concerned with the header but just crlf in the > body and should be thought of as "not touching it" otherwise than for > requirements to store it. ... > So, the problem is mainly a MUA matter: "Encoding:" ? > In all circumstances, any change to the standards that is a > violation of current specifications, must be negotiated. This may > be by version number, or by explicit feature negotiation. This is > just plain good sense, and is likely to be enforced by the IESG or > the IAB. The extended SMTP could define a command EXTN or FEAT to query the capabilities of the receiving SMTP: S: EXTN or S: FEAT R: 250-8-BITS R: 250-8-BIT-TEXT R: 250 UNLIMITED-LINES R: 250 BINARY The syntax of the DATA command wouldn't have to be changed. What happens, if the mailer has an 8-bit data to send and the receiver doesn't know binary, is not the matter of SMTP. The program can encode it or translate it somehow and maybe change the Encoding headers, but SMTP doesn't have to specify this. I think 7-bit restriction and line-length limitations should be removed from the spec now, even if RFC 822 wouldn't benefit from it now, or if implementing these thing aren't easy. It's much easier to revise the programs or 822 etc. later when the ground protocol is stable and isn't too restrictive. > 2) We need to define a new standard character set to replace US ASCII. > To answer John Klensin, it is of my own belief that using SMTP to > shift character sets in and out is inappropriate. That is much > more a message format level function. 
Now, I'm not opposed to > using a character set that itself escape-shifts pages in and out, or a > multi-byte character set that retains ascii compatibility. (whatever > that means) Do we need any character set standard at SMTP level? The new RFC 822 could state that the default set is Latin-1, if the message doesn't have an encoding header. > Current ideas are Rynning's TEX-HEX and ISO 2022 (??). I would > welcome a summary of available encoding technology and ideas. > Implicit in these ideas is a determination on the primary type of use > this system will have, and whether is it optimized for > human-readability, or for efficiency of data transport. Specific > ideas on these trade-offs and encodings are solicited One encoding method I find interesting is ABE. It is designed to encode binary to printable characters that survive ASCII-EBCDIC translations. Its benefits over uuencode, xxencode and btoa is that it retains most of the printable characters the same in the encoded form. It is also quite compact and has crc-checking and other features to help detecting and correcting corrupted messages. This paragraph encoded with ABE looks like this: ;ABE ASCII-Binary-Encoding (by Brad Templeton) ;Use 'sort' and/or 'dabe' to decode T./z$$filecount=1 T.0O##S1000,1000,1000,ABE2 T.1N$$blocking=false T.2x$$uname=stdin T.3f$$os=unix T.4u"".sq3po3nm3lk3ih3/g/fe3dc3ba3TS3OI34230/3.Y2MU.R6/ T.5u""/.V.KJ.HF.9G.LN.PQ.53.1W.01/23/45/67/89/7X.rZ2jz. T.6b""0tA2BC.DE.FG3HI2JK3LM3NO2PQ3RS2TU/VW3XY3Z83AB3CD3 T.7X""1Ea2bc.de.fg.hi.jk2lm.no.pq.rs.tu.vw.xy.zu3vw3xy3 T.8m""2./8018238458678898AB8CD8EF8GH8IJ8KL8MN8OP8QR8ST8 T.9X""3UV8WX8YZ8ab8cd8ef8gh8ij8kl8mn8op8qr8st8uv8wx8yz8 T.Ao""4./D01D23D45D67D89DABDCDDEFDGHDIJDKLDMNDOPDQRDSTD T.BZ""5UVDWXDYZDabDcdDefDghDijDklDmnDopDqrDstDuvDwxDyzD T.CwOne.encoding.method.I.find.interesting.is.ABE1.It.is.designed.to. 
T.DLencode/binary.to.printable.characters.that.survive.ASCII3EBCDIC.t T.ELranslations1/Its.benefits.over.uuencode5.xxencode.and.btoa.is.tha T.FTt.it.retains.most.of/the.printable.characters.the.same.in.the.enc T.GAoded.form1.It.is.also.quite/compact.and.has.crc3checking.and.othe T.H9r.features.to.help.detecting.and/correcting.corrupted.messages1.T T.Iyhis.paragraph.encoded.with.ABE.looks/like.this7/ T.JT$$size=438 T.KR$$end_file=stdin T.LH$$filecrc32=2358437449 T.MD##E41438 ;End of ABE encoding Risto -- Risto Kankkunen kankkune@cs.Helsinki.FI (Internet) Department of Computer Science kankkunen@finuh (Bitnet) University of Helsinki, Finland ..!mcsun!uhecs!kankkune (UUCP) From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 11:06:06 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA25722; Fri, 1 Feb 91 10:31:09 EST Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA25706; Fri, 1 Feb 91 10:30:55 EST Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA17210; Fri, 1 Feb 91 16:19:29 +0100 Date: Fri, 1 Feb 91 16:19:27 +0100 From: Jan Michael Rynning To: Robert Ullmann Cc: IETF SMTP list Subject: Re: Character code sets and all that In-Reply-To: Your message of Thu, 31 Jan 91 21:12:16 EST Message-Id: Robert Ullmann asks: > (*about 10646: as far as I can tell, this isn't even DIS yet, and > can't therefore be gotten through the standard expensive channels ... > can someone supply me with a copy?) 10646 has been submitted for circulation to national body vote as a DIS. Your copy of the document is in the mail. 
From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 11:29:49 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA26099; Fri, 1 Feb 91 10:48:49 EST Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA26095; Fri, 1 Feb 91 10:48:47 EST Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 7560; Fri, 01 Feb 91 10:50:56 EST Received: from VM1.calc.ucl.ac.be by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 7559; Fri, 01 Feb 91 10:50:54 EST Received: by BUCLLN11 (Mailer R2.07) id 3883; Fri, 01 Feb 91 13:22:15 +0100 Date: Fri, 01 Feb 91 13:02:51 +0100 From: "Alain FONTAINE (Postmaster - NAD)" Message-Id: <910201.130251.+0100.af@sei.ucl.ac.be> Subject: Re: on mailing binaries To: Robert Ullmann , IETF SMTP list In-Reply-To: Your message of 29 Jan 91 15:18:39 EST I have had time to read the draft specification for a method for transporting 'binary objects' in mail messages. I do think that a standard method must be devised and documented, since there are cases when there is no other practical solution, and current 'well known' methods have a tendency to fail miserably when used outside a very restricted environment. I do have a few objections to the proposed text : 1- the method is said to be able to encode 'binary objects', with a flavor of universality. The text in fact only deals with encoding a single, undivided octet stream. On many systems widely used, even the simplest 'binary objects' are not a single octet stream, but rather an ordered collection of such streams, with boundaries imposed by magic. A standard method for 'binary object' encoding should, IM(not so)HO, be able to encode record oriented files. So its basic object of attention should be an ordered collection of one or more arbitrary octet streams. 2- while the text says that only 64 characters are used in the encoding, this is not strictly true. 
One should add '*' (not a big deal), but also all the characters allowed in the source system to name a file. 3- my experience is that users want some information about the object to be part of the deal (how many times did I hear the question : 'how can I make a FTP transfer, preserving the creation date of the file'...). Of course, different systems keep different amounts of information about the 'binary objects' they manage, but I still believe it would be worthwile to identify some key items that should be transported. In order to avoid unexpected extensions of the character set used in transmission, those items should be 'inside' the encoding... Thank you all for your kind attention... Alain FONTAINE +--------------------------------+ Universite Catholique de Louvain | If your mail software barks at | Service d'Etudes Informatiques | my address, you may try : | Batiment Pythagore | | Place des Sciences, 4 | FNTA80@BUCLLN11.BITNET | B-1348 Louvain-la-Neuve, BELGIUM +--------------------------------+ phone +32 (10) 47-2625 From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 12:30:30 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28613; Fri, 1 Feb 91 12:04:33 EST Received: from dkuug.dk by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28595; Fri, 1 Feb 91 12:04:00 EST Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8) id AA14054; Fri, 1 Feb 91 18:02:05 +0100 Date: Fri, 1 Feb 91 18:02:05 +0100 From: Keld J|rn Simonsen Message-Id: <9102011702.AA14054@dkuug.dk> To: PIRARD@vm1.ulg.ac.be, jmr@nada.kth.se, philipp@inf.enst.fr Subject: Re: Character code sets and all that Cc: ietf-smtp@dimacs.rutgers.edu X-Charset: ASCII X-Char-Esc: 29 Considering and they are all included in T.61 and quite some other ISO 6937-2 like character sets. 
Keld Simonsen From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 13:00:29 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28817; Fri, 1 Feb 91 12:12:26 EST Received: from cyklop.nada.kth.se by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28810; Fri, 1 Feb 91 12:12:13 EST Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA21936; Fri, 1 Feb 91 18:11:07 +0100 Date: Fri, 1 Feb 91 18:11:06 +0100 From: Jan Michael Rynning To: Mark Crispin Cc: Jan Michael Rynning , ietf-smtp@dimacs.rutgers.edu Subject: Re: Re: Character code sets and all that In-Reply-To: Your message of Thu, 31 Jan 1991 15:57:17 -0800 (PST) Message-Id: Mark Crispin writes: > In , Jan Michael Rynning writes: > >We had > >terminals which always sent data with parity, even or odd, and a TOPS-20 > >TELNET which passed all 8 bits through, without negotiating. I found > >out the day we enabled 8-bit input on our UNIX machines. I solved the > >problem by patching the TOPS-20 TELNET program, so that it followed the > >protocol specification. > > I find this hard to believe, since I wrote TOPS-20 TELNET. It was written > before the new 8-bit stuff, so it used the old scheme. That is, unless you > negotiate binary mode, it is strictly 7-bit. It carefully turns off the high > order bits on all keyboard input going to the network. For input from the > network, it turns off high order bits and then applies parity for the benefit > of terminals which garbage their output when parity is wrong. > > The only TELNET that didn't behave this way was at Stanford-SUMEX, where they > hacked it locally to pass 8-bits without negotiating binary mode over my > objections. I find it hard to believe that broken version made its way to > Sweden! We got some of our software from Stanford. TELNET may have been one of those programs. The funny thing, though, is that the TELNET.EXE which lets all 8 bits through without negotiation was created and last written by someone called "CRISPIN". 
(Vera.Stacken.KTH.SE [130.237.237.5] is still around, if you want to go more closely into the matter.) For the record, Mark's description of what his TOPS-20 TELNET does, is just what I think it should do. From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 13:30:31 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00148; Fri, 1 Feb 91 12:53:13 EST Received: from ucsd.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA00141; Fri, 1 Feb 91 12:53:08 EST Received: from tots.UUCP by ucsd.edu; id AA28785 sendmail 5.64/UCSD-2.1-sun via UUCP Fri, 1 Feb 91 09:41:55 -0800 From: tep@tots.logicon.com Received: from galt by tots.Logicon.COM (3.2/4.07) id AA03515; Fri, 1 Feb 91 09:22:30 PST Received: by galt.tots.Logicon.COM (3.2/4.02-client) id AA06276; Fri, 1 Feb 91 09:19:37 PST Date: Fri, 1 Feb 91 09:19:37 PST Message-Id: <9102011719.AA06276@galt.tots.Logicon.COM> To: ietf-smtp@tots.logicon.com In-Reply-To: Brian Kantor's message of Thu, 31 Jan 91 20:06:09 -0800 <9102010406.AA04784@ucsd.edu> Subject: Character code sets and all that Reply-To: tep@tots.logicon.com X-Organization: Logicon, Inc., San Diego, California Date: Thu, 31 Jan 91 20:06:09 -0800 From: brian@ucsd.EDU (Brian Kantor) I think it's clear that RFC1154 doesn't answer the problem of mixed media and character sets; trying to juggle angels on a pinhead to make it work is a waste of time. Let's pass it by and see what we can do to develope a more useful scheme. However, an RFC1154-like scheme on OCTETS instead of lines might be more useful, especially if it contained message part markers to allow robust re-synchronization. The Andrew folks are always remarking that "our system works", but everytime I ask about documentation, I get "we are writing RFCs that will be available Real Soon Now". Are these now available and I missed them? My belief is that multilingual, multialphabetic, multimedia mail is best solved by defining 1) a common EXCHANGE (not storage!) 
format 2) a common transport mechanism to move (1) from place to place 3) a common description of the encodings and intended presentation of the data 4) disparate user agents. I couldn't agree more. This will probably mean redefining SMTP, as for example the multi-part stuff I mentioned earlier this month. Can we not have MTAs send around complete messages, and leave the decomposition of messages to UTAs? I really like the multi-part SMTP example you showed, as it allows a large amount of flexibility, but if you look at this as a layering problem, I don't know how much the MTA should know about internal message structure. Sendmail (the Evil Header Munger) clearly knows too !#$%& much. Tom Perrine (tep) |Internet: tep@tots.Logicon.COM Logicon |UUCP: sun!suntan!tots!tep Tactical and Training Systems Division | San Diego CA |GENIE: T.PERRINE "Harried: with preschoolers" |+1 619 455 1330 From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 15:03:45 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA03623; Fri, 1 Feb 91 14:34:06 EST Received: from thumper.bellcore.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA03619; Fri, 1 Feb 91 14:34:02 EST Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id for ietf-smtp@dimacs.rutgers.edu; Fri, 1 Feb 91 14:33:58 EST Received: by greenbush.bellcore.com (4.12/4.7) id for ietf-smtp@dimacs.rutgers.edu; Fri, 1 Feb 91 14:36:36 est Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Fri, 1 Feb 1991 14:36:29 -0500 (EST) Resent-Message-Id: <0beQAxW0M2YtA7VPsA@thumper.bellcore.com> Resent-Date: Fri, 1 Feb 1991 14:36:29 -0500 (EST) Resent-From: Nathaniel Borenstein Resent-To: ietf-smtp@dimacs.rutgers.edu Message-Id: Date: Fri, 1 Feb 1991 14:36:00 -0500 (EST) From: Nathaniel Borenstein To: ietf-smtp@tots.logicon.com Subject: Re: Character code sets and all that In-Reply-To: <9102011719.AA06276@galt.tots.Logicon.COM> 
References: <9102011719.AA06276@galt.tots.Logicon.COM> Excerpts from internet.ietf-smtp: 1-Feb-91 Character code sets and all.. tep@tots.logicon.com (1872) > The Andrew folks are always remarking that "our system works", but > everytime I ask about documentation, I get "we are writing RFCs that > will be available Real Soon Now". Are these now available and I missed > them? Um, I'm not sure what you're talking about, but RFC 1049 (Content-type) was the only RFC we ever released. We never (NEVER) proposed the Andrew datastream as a standard (I could give you a zillion reasons NOT to use it), but I do argue that certain aspects of it are worth keeping -- notably the way it accomodates stupid mail gateways rather than insisting that they change. As for "documentation", the Andrew software comes with oceans of it, and there are dozens of technical papers published in the literature. What exactly do you want to know?
From owner-ietf-smtp@dimacs.rutgers.edu Fri Feb 1 20:00:31 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA13180; Fri, 1 Feb 91 19:53:42 EST Received: from ucsd.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA13171; Fri, 1 Feb 91 19:53:38 EST Received: from tots.UUCP by ucsd.edu; id AA14425 sendmail 5.64/UCSD-2.1-sun via UUCP Fri, 1 Feb 91 16:41:57 -0800 From: tep@tots.logicon.com Received: from galt by tots.Logicon.COM (3.2/4.07) id AA00318; Fri, 1 Feb 91 14:57:52 PST Received: by galt.tots.Logicon.COM (3.2/4.02-client) id AA06514; Fri, 1 Feb 91 14:54:58 PST Date: Fri, 1 Feb 91 14:54:58 PST Message-Id: <9102012254.AA06514@galt.tots.Logicon.COM> To: nsb@thumper.bellcore.com, ietf-smtp@tots.logicon.com Subject: My faux pas Reply-To: tep@tots.logicon.com X-Organization: Logicon, Inc., San Diego, California Oops, sorry about that.
On re-reading my original message, the tone is not at all what I had intended. The message stemmed from my mis-remembering some (very old) e-mail.

What the message should have been was:

I would like to hear more about existing art. I know that Andrew software mails all kinds of interesting multi-media mail all over the place. How do they do it? I was not aware that RFC 1049 was based on (describes?) the Andrew mechanism.

Tom Perrine (tep)                       |Internet: tep@tots.Logicon.COM
Logicon                                 |UUCP: sun!suntan!tots!tep
Tactical and Training Systems Division  |
San Diego CA                            |GENIE: T.PERRINE
"Harried: with preschoolers"            |+1 619 455 1330

From owner-ietf-smtp@dimacs.rutgers.edu Sat Feb 2 00:30:32 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA20320; Sat, 2 Feb 91 00:24:44 EST
Received: from dkuug.dk by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA20285; Sat, 2 Feb 91 00:24:12 EST
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8) id AA06716; Sat, 2 Feb 91 06:20:33 +0100
Date: Sat, 2 Feb 91 06:20:33 +0100
From: Keld J|rn Simonsen
Message-Id: <9102020520.AA06716@dkuug.dk>
To: KLENSIN@infoods.mit.edu
Subject: Re: ISO2022, Registrations, and ISO8859-n
Cc: Christopher.J.Tanner@nve.crl.aecl.ca, af@sei.ucl.ac.be, ietf-smtp@dimacs.rutgers.edu
X-Charset: ASCII
X-Char-Esc: 29

John wrote:

> Similarly, designating one of the registered sets as G1 and explicitly
> designating ASCII as G0 *still* does not make an 8859 part. First of
> all, there are several registered character sets that cannot be combined
> with ASCII to make an 8859 part (because 8859-n sections have to be
> voted on as "International Standards" and these things merely require
> someone to come in with the appropriate forms to register them).

Well, it is normally ECMA doing the production of ISO 8859 parts, and they are normally quick in getting the part registered in the 2022 registry (maintained by ECMA...). Which 8859 parts have not got an ECMA registration number, John?

> Second, there is no algorithm to map between registration number of
> designation sequence and an ISO8859 part number.

No, but you could have a list of them. I have such a machine-readable list, covered by C language routines, and with full encoding of almost all of the ECMA registry. So that can be done.

> So an SMTP extension standard would need one or more rules *in
> addition to* ISO2022 and the designation sequences in order to use the
> ISO2022 *framework* with ISO8859-n character sets. One such rule, and
> the one implied by Keld's note, would be
> "Use the designator for G1 of the registered character set that is
> identical to the GR part of an ISO8859 part to identify that ISO8859
> part. Designators of registered character sets that don't appear as GR
> in ISO8859 parts are prohibited".

Well, I would do it otherwise: I would recommend specifying the encoding in the header, and then allow for G0/G1/G2/G3 designators in the body. I have code for the header already running in sendmail for about a year now, and the G-designators are to be implemented real soon now...

> But that would be an *SMTP* rule, however reasonable and obvious, not
> an ISO2022 rule. To the best of my recollection (which is not very good
> today) ISO2022 has *no* mechanism for "designating" a complete eight bit
> character set (e.g., any of ISO8859-?) except by that type of
> circumvention, which is not part of the standard.

Well, does the "circumvention" matter that much? It is indeed the way specified in the ISO 8859 standard for doing this in a 2022 context, as noted in an earlier letter.
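
The "list of them" mentioned above could look roughly like the following. This is only a minimal sketch in Python, not Keld's C routines: the function names are hypothetical, and the table assumes the three-byte form ESC 2/13 F (ESC "-" F), which designates a 96-character set into G1. The registration numbers and final bytes are given to the best of my knowledge and should be checked against the ECMA registry before any real use.

```python
ESC = "\x1b"

# Assumed table: final byte of the G1 designator ESC '-' F
#   -> (ISO-IR registration number, ISO 8859 part number).
# Entries are illustrative and should be verified against the registry.
G1_96_SETS = {
    "A": (100, 1),  # Latin alphabet No. 1
    "B": (101, 2),  # Latin alphabet No. 2
    "C": (109, 3),  # Latin alphabet No. 3
    "D": (110, 4),  # Latin alphabet No. 4
    "L": (144, 5),  # Latin/Cyrillic
    "G": (127, 6),  # Latin/Arabic
    "F": (126, 7),  # Latin/Greek
    "H": (138, 8),  # Latin/Hebrew
    "M": (148, 9),  # Latin alphabet No. 5
}


def part_for_designator(seq):
    """Return the ISO 8859 part number for a G1 designator, or None."""
    if len(seq) == 3 and seq[0] == ESC and seq[1] == "-":
        entry = G1_96_SETS.get(seq[2])
        if entry is not None:
            return entry[1]
    return None


def designator_for_part(part):
    """Return the G1 designator for an ISO 8859 part number, or None."""
    for final, (_registration, known_part) in G1_96_SETS.items():
        if known_part == part:
            return ESC + "-" + final
    return None
```

A plain table like this is enough precisely because, as John notes, there is no algorithm: the mapping has to be enumerated, and a designator that is not in the list simply yields no 8859 part.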
Keld From owner-ietf-smtp@dimacs.rutgers.edu Sat Feb 2 01:00:32 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21339; Sat, 2 Feb 91 00:49:12 EST Received: from dkuug.dk by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21335; Sat, 2 Feb 91 00:48:56 EST Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8) id AA07322; Sat, 2 Feb 91 06:47:35 +0100 Date: Sat, 2 Feb 91 06:47:35 +0100 From: Keld J|rn Simonsen Message-Id: <9102020547.AA07322@dkuug.dk> To: gvaudre@nri.reston.va.us, ietf-smtp@dimacs.rutgers.edu Subject: Re: The Next Step X-Charset: ASCII X-Char-Esc: 29 > 2) We need to define a new standard character set to replace US ASCII. > To answer John Klensin, it is of my own belief that using SMTP to > shift character sets in and out is inappropriate. That is much > more a message format level function. Now, I'm not opposed to > using a character set that itself escape-shifts pages in and out, or a > multi-byte character set that retains ascii compatibility. (whatever > that means) I think this new character set should only be used when negotiated (you wrote that above too). ASCII should still be the default. I would go for 10646 as the new character set. > a) My personal preference is to have a character set that is not limited to > western characters (latin-1), however, if someone makes a good > argument, I may be led to believe that multi-byte character sets > can( and should?) be encoded in the standard 7 or 8 bit character set. > > b) With the 8 bit systems, a standard mechanism should be defined > in SMTP transport to convert into a 7 bit representation without data > loss. This is important. Information loss will prevent bits > systems from ever being used for efficient transport, or > multi-media, and encoded data because 7 bit systems will continue to > exist. 
I have such a system with mnemonic names for about 1300 letters and other characters and 24000 ideographic characters, covering about 100 character sets including almost all of the ECMA registry and all the vendor character sets I know of.

> I have not heard much discussion on how to do this. From
> an "old" UA point of view, most of the conversions I've heard
> of that cause no info loss will be totally unintelligible. At
> least I can guess at missing letters in text with info loss.

My scheme retains all information with old UAs and MTAs. The scheme is quite readable with old (non-modified) UAs.

> Current ideas are Rynning's TEX-HEX and ISO 2022 (??). I would
> welcome a summary of available encoding technology and ideas.
> Implicit in these ideas is a determination on the primary type of use
> this system will have, and whether it is optimized for
> human-readability, or for efficiency of data transport. Specific
> ideas on these trade-offs and encodings are solicited

My scheme has several design criteria:

1. Human readability (mnemonic).
2. Efficient conversion (most mnemonics are two characters, giving the ability to do conversion via an internal code and a 20 kbyte table).
3. Overall applicability (the mnemonics are done in ISO 646, which can be considered the greatest common subset of the ECMA registry).
4. Efficiency of data transport (ASCII is just one byte, others 3 bytes; when negotiated, the primary character set is the one agreed upon, say 8859-1, which is just one byte).
5. Easy maintenance (I maintain tables at the backbone and my uucp customers do not have to change anything; they get things delivered in their internal character set, IBM CP and all).
6. Transparency (old MTAs and UAs need not be changed).
7. Coexistence (old and new software can exchange messages, and the messages can, within the limits of the presentation, be given as good exposure as possible. If the hardware is able to present a character, this is done; if not, a uniform display method is used).
8. Generality (the namings are used for communications, but also for programming languages. The naming scheme is currently included in the recent draft of POSIX.2 shell and utilities, and is proposed for the ISO C language and other ISO language specifications).

Keld Simonsen
Postmaster Danish Internet backbone

From owner-ietf-smtp@dimacs.rutgers.edu Sat Feb 2 01:30:31 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21758; Sat, 2 Feb 91 01:09:22 EST
Received: from dkuug.dk by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21694; Sat, 2 Feb 91 01:08:45 EST
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8) id AA07764; Sat, 2 Feb 91 07:04:38 +0100
Date: Sat, 2 Feb 91 07:04:38 +0100
From: Keld J|rn Simonsen
Message-Id: <9102020604.AA07764@dkuug.dk>
To: PIRARD@vm1.ulg.ac.be, af@sei.ucl.ac.be
Subject: Re: Re: Character code sets and all that
Cc: Christopher.J.Tanner@nve.crl.aecl.ca, ietf-smtp@dimacs.rutgers.edu
X-Charset: ASCII
X-Char-Esc: 29

> The problem is that many computers have had great success because they
> offer 8-bit character sets, but that these codes are incompatible.
> Most of the characters are common, but at different code points.
> Having each know each other's code to exchange text (and know which is to
> translate to the other's) has been recognized and called the N*N problem.
> By evidence, a common 8-bit code is needed for data exchange.

I think we need to stay with 7 bit for compatibility with old MTA/UAs. When negotiated in SMTP, a single other character set should be possible, namely ISO 10646.

>
> There are important things to decide before using such a code:
> - each translation to a proprietary code must be clearly defined by someone
> that should be the manufacturer, but it appears they fail to do so.

I have defined tables for about 100 character sets.
> - it must be defined for the 256 code points, invertible and common to all > the protocols. > This is to assure the round trip integrity of the text data. I have defined code points for about 25000 characters, in almost all the above mentioned 100 character sets. They are invertible. > One example. > 1) The RFC says that a mail relay must use an invertible translation when it > stores in another code mail to be relayed. And btw, by extension, this > applies to the user who forwards or replies. > 2) BITNET is a huge relay system. ASCII can come in then go out with EBCDIC > on the way. about 20 EBCDICs are encoded in my work. > 3) As a consequence, all mail gateways must use the same invertible > translation on 7-bit today, 8-bit I hope tomorrow. It need not be the same, but equivalent.. Say if the gateways are capable of doing transmission of 8-bit codes they need not do it in 7-bit, as long as they know what they are doing. > 4) A problem is that translation from ASCII to EBCDIC is loosely > defined. It seems that now more and more gateways aligned on a widely > recognized one, so widely that it's the one IBM themselves ship with > their TCP/IP which is the software acting as mail gateway in many sites. > I hear of less and less complaints about corrupted encoded data. As said, I have 20 EBCDICs encoded. > 5) IBM has recognized ISO 8859/1. They use codes containing the characters > repertoire of version 1, but with different code points, on their > mainframes and their PCs. On the PC, it's CP 850. CP 850 includes about 32 more characters than 8859-1. But 8859-1 is a true subset. > On the mainframes, these codes are called "Country extended code pageS". > Yes indeed, things get complicated because these codes containing all > the ISO-1 characters are different for compatibility with previous > extensions of US EBCDIC, they say (10 of them, with just a few unnoticed > differences affecting ASCII translation!). 
> > The same indetermination occurs with 5 other PC codes and the Mac (1). > For these, the character set is only "close to" ISO-1. An arbitrary choice > has to be made for what is out of this subset. I have the official 5 IBM CPs and Mac and Roman8 and DEC MCS encoded. > > An unexpected conclusion is thus that it would be vital and extremely useful > that the RFCs contain a repertoire of translation tables! Be they the > manufacturers' or not, but to be used by anyone or any protocol. Well, I have such tables. They could be used for an RFC. keld From owner-ietf-smtp@dimacs.rutgers.edu Sat Feb 2 04:30:37 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24873; Sat, 2 Feb 91 04:10:44 EST Received: from CBROWN.CLAREMONT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24869; Sat, 2 Feb 91 04:10:38 EST Date: Sat, 2 Feb 1991 01:10 PST From: "Ned Freed, Postmaster" Subject: Re: re RFC1154 and such To: Ariel@relay.prime.com Cc: ietf-smtp@dimacs.rutgers.edu Message-Id: X-Envelope-To: ietf-smtp@dimacs.rutgers.edu X-Vms-To: IN%"Ariel@relay.prime.com" X-Vms-Cc: IN%"ietf-smtp@dimacs.rutgers.edu" Robert Ullmann writes: > Several people complain that RFC1154 line counts will cause faults > when lines get added/wrapped/lost. Do any of you have any EXPERIENCE > with a line count failure? Huh? :-) (I don't mean "Have you ever > seen lines wrapped by a mailer." I mean "Have you ever seen it > _cause_ an actual failure.") Usenet news does it all day, every day. > Prime handles about 150,000 messages/day (not counting news, with > ~10K of them external. Weeell, I have to say yes. It happens to me, rather frequently. I develop and maintain a popular mail system called PMDF that's used on lots of VMS machines all over the place. In particular, PMDF is often used on BITNET. BITNET has this funny 80 character per line rule, and it wraps lines that are too long. Now, one of the things I post to our support mailing list is patches to fix various problems. 
I don't want to get into the gory details of the patches I post and their format, but suffice it to say that the lines are sometimes longer than 80 characters. These patches also depend on line count information. When this happens I have a handy-dandy utility to encapsulate the long lines to get them past this restriction. No big deal.

When I'm at home, like now, I use an 80 column terminal. I can see the long lines, I say "aha", and I do the encapsulation. Or I run another little gadget of mine (if the patch is too long to scan manually) that tells me how long the lines are.

But at work I use an X Window display. The line lengths of the mail windows are variable. I normally leave them set at 80, but they sometimes come unglued. This means that a visual scan will produce an incorrect result. I then don't encapsulate, the lines wrap, and people have immense trouble applying the patches. Thousands of sites. Hundreds of people. And I feel somewhat dumb.

After the first couple of times it happened I semi-automated the procedure, so now (hopefully) it won't happen again. I've since been bitten by weird tab expansions (yes, some mailers expand tabs, and not always into multiples of 8 -- I've seen multiples of 10, and fixed 4 spaces, and a couple of others) that caused line wraps that I didn't anticipate. Another hack to the procedure. Hopefully _that_ won't happen again either. I now await the next attack on my extremely limited use of line counts.

I also see lots of line wrap trauma in USENET news. People posting stuff used to assume that wrapping would not occur. The problem is that lots of this stuff gets gatewayed into other environments (BITNET among others) where line wrapping does happen, and their postings (usually code) are mangled. This is not, specifically, line count related, since the use of line counts is usually gone by the time the wrapping occurs.
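
The failure described above can be reproduced in a few lines. The following is a minimal sketch in Python under stated assumptions: `wrap_lines` stands in for a gateway that hard-wraps at 80 columns, the marker string is hypothetical, and the per-part line counts are only modelled as a list of integers rather than in any real header syntax.

```python
def wrap_lines(lines, width=80):
    """Mimic a gateway that hard-wraps any line longer than `width`."""
    out = []
    for line in lines:
        while len(line) > width:
            out.append(line[:width])
            line = line[width:]
        out.append(line)
    return out


def split_by_counts(lines, counts):
    """Split a message body into parts using per-part line counts."""
    parts, pos = [], 0
    for n in counts:
        parts.append(lines[pos:pos + n])
        pos += n
    return parts


def split_by_marker(lines, marker):
    """Split a message body into parts at explicit marker lines."""
    parts, current = [], []
    for line in lines:
        if line == marker:
            parts.append(current)
            current = []
        else:
            current.append(line)
    parts.append(current)
    return parts


part1 = ["short line", "x" * 100]   # second line exceeds 80 columns
part2 = ["patch body"]
counts = [len(part1), len(part2)]   # counts computed before transport
MARKER = "--part-boundary--"        # hypothetical marker, shorter than 80

# Line counts: the wrapped tail of part 1 is mistaken for part 2.
by_counts = split_by_counts(wrap_lines(part1 + part2), counts)

# Marker lines: the boundary is still found after wrapping.
by_marker = split_by_marker(wrap_lines(part1 + [MARKER] + part2), MARKER)
```

With counts, the 20-character remainder of the wrapped line is delivered as the second part and the real patch body falls off the end. With a marker, the boundary survives, although the long line inside the first part is still broken in two, so the content itself needs its own protection.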
The result has been a proliferation of encapsulation utilities (I have about 10 on my machine alone) to deal with this problem. If the problem didn't exist, why did people write all these things? Of course, I don't see RFC1154 related line count failures, but that's because I'm not so foolish as to deploy more line-count-based software in anything I write. If RFC1154 is standardized as-is I will implement and live with it, but the measures I'll have to take to deal with BITNET will be pretty extreme. Other implementers won't do this, and you'll start seeing line count failures constantly. We'll then get into the delights of "soft counts" and trying to find mangled boundaries between things. I have better things to do with my time, to be honest. > I have yet to see ONE case (in 3 years) where the lines were > misplaced so as to mess up a count, that did not (also) mangle the > message to the point of un-usability anyway. (I know a certain > X.25 net in Italy that used to drop bytes, but get the CRC correct > anyway! Ouch. :-) Sorry, all this says is your experience is limited, probably in terms of heterogeneity rather than the actual number of machines. My experience is precisely the opposite, limited in terms of machines but way too much experience with network esoterica. Also, if your CRC check is sensitive to line wrapping, then this is no criteria at all! Of course line wrapping will render the message useless then! The point is that _both_ the boundary information _and_ the contents should be wrap-resistant, if possible. > The only real common "failure" is that 8859-1 text gets stripped to > 7-bit ASCII on some paths. Almost always by sendmail, which runs a > protocol that only vaguely resembles SMTP anyway ... :-) Amen. A little sendmail-bashing now and then does make a body feel better, doesn't it? I indulge occasionally myself ;-) Conclusion: I have no problem with RFC1154 except for this use of line counts. 
Why not replace them with a more reasonable mechanism? You now have evidence that the world _does_ have trouble with line counts, even if you don't. That one change would convert an extremely dangerous specification into a useful one, in my opinion. Ned From owner-ietf-smtp@dimacs.rutgers.edu Sat Feb 2 13:31:49 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA01988; Sat, 2 Feb 91 13:18:38 EST Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA01984; Sat, 2 Feb 91 13:18:34 EST Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA02652; Sat, 2 Feb 91 10:17:57 -0800 Date: Sat, 2 Feb 1991 10:08:16 -0800 (PST) From: Mark Crispin Subject: Re: Re: Character code sets and all that To: Jan Michael Rynning Cc: Mark Crispin , Jan Michael Rynning , ietf-smtp@dimacs.rutgers.edu In-Reply-To: Message-Id: If you got a version of TELNET written by "CRISPIN", that confirms that it was the broken Stanford-SUMEX TELNET, because that was the only system where I had that user name (I'm "MRC" everywhere else). I said that the broken implementation of 8-bits was put into their TELNET over my objections. I didn't say that I never edited or compiled their version for them. This is a general lesson to us all; broken software tends to propagate all over the Internet in spite of best efforts to prevent it!! 
From owner-ietf-smtp@dimacs.rutgers.edu Mon Feb 4 11:40:33 1991
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24528; Mon, 4 Feb 91 11:08:58 EST
Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24524; Mon, 4 Feb 91 11:08:55 EST
Message-Id: <9102041608.AA24524@dimacs.rutgers.edu>
Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 1251; Mon, 04 Feb 91 11:11:03 EST
Received: from BLIULG11 by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 1248; Mon, 04 Feb 91 11:11:02 EST
Received: from BLIULG11 by BLIULG11 (Mailer R2.07) with BSMTP id 0409; Mon, 04 Feb 91 17:06:33 +0100
Received: from vm1.ulg.ac.be by vm1.ulg.ac.be (IBM VM SMTP R1.2.2MX) with BSMTP id 0184; Mon, 04 Feb 91 17:06:26 +01
Date: Mon, 04 Feb 91 14:28:08 +0100
From: Andr'e PIRARD
Subject: Re: The Next Step
To: Greg Vaudreuil , IETF Internet Mail Extensions WG
In-Reply-To: Message of Thu, 31 Jan 91 15:41:45 -0500 from

On Thu, 31 Jan 91 15:41:45 -0500 you said:
>...
>Again I'm looking for specific ideas,
>
> 1) Is a decision to change the SMTP specification to use 8 bit
> w/ no line length changes acceptable?

As a question about the MTA:
- most data processing is 8-bit wide. I think (correct me) that most MTAs do 8-bit transport already, or are just as capable of doing so as their hosts are of IP transport.
- it would be a pity to force all UAs to use "strange" encodings of a plain 8-bit character code in order to make up for the inability of just a few MTAs.
- BITNET gateways translating to and from EBCDIC would need to understand an 8-to-7-bit encoding; plain 8-bit is just a matter of an immediate change to their translation table. (This is a problem of MTAs usurping the UAs' function that will need to be considered for codes wider than 8-bit.)
- it would be sad for a pair of (already) 8-bit consenting UAs to have to wait for all MTAs on the way before their data exchange can be effective.
- most of the problem may be either avoiding "deficient" MTAs as relays, or upgrading them. Are they robust enough to just garble data instead of failing?

I find genuine 8-bit text transport vital.

As a question about UAs:
- they are more concerned about robustness, but again I think few will fail.
- *understanding* the 8-bit code is up to them, but often just a matter of simple byte-to-byte translation. And apparent garbage will hasten the desire to do so.

My general feeling about the problem is that enforcing strong standards to allow 8-bit in an already de-facto-permissive environment would feel like working backwards, from a situation where it is almost possible to achieve 8-bit mail to a chicken-and-egg one where UA writers would wait for MTA writers and vice versa.

> 2) A summary(s) of available character sets, as well as an
> evaluation of how (if at all) they handle non-western characters.

I think the best is ISO8859/1 today, with preparation for ISO 10646 "syntax". As Dan Oscarsson said, 8859/1 is a subset of 10646 (the 1-byte subset whose "1-2-4-syntax" is much like Russian dolls). Were it not for UNICODE, it would be the only "evolutionary" conclusion. 8859/1 is criticized, by some for lacking more international characters, by others for lacking graphic characters, etc. But many use it anyway. A question is: are ISO protocols going to use ISO codes? Note that ISO 8859/1 is also an ANSI and ECMA standard. Sorry, I lost a file with a basic description of ISO 10646, but I am forwarding to this list an instructive letter from ECMA to UNICODE.

> 3) An encoding mechanism to convert from 7bit to 8 bit systems
> with no data loss.

Sending image data is difficult with the present transport specifications. Moreover, network rules sometimes require files to be split and reassembled. Finally, archiving would be welcome, and all is to be gained from compression.
In short, standardizing at level 6 the various techniques used today, so that reassembling, decoding, decompressing, and dearchiving files can be a standard automated function of a UA. Encoding to 7 bits may be a choice here. Problems of code translation by those MTUA has been discussed. Andr'e PIRARD SEGI, Univ. de Li`ege B26 - Sart Tilman B-4000 Li`ege 1 (Belgium) pirard@vm1.ulg.ac.be or PIRARD%BLIULG11.BITNET@CUNYVM.CUNY.EDU From owner-ietf-smtp@dimacs.rutgers.edu Mon Feb 4 12:10:32 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24686; Mon, 4 Feb 91 11:15:03 EST Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA24679; Mon, 4 Feb 91 11:14:59 EST Resent-Message-Id: <9102041614.AA24679@dimacs.rutgers.edu> Message-Id: <9102041614.AA24679@dimacs.rutgers.edu> Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 1366; Mon, 04 Feb 91 11:17:08 EST Received: from BLIULG11 by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 1361; Mon, 04 Feb 91 11:17:05 EST Received: from BLIULG11 by BLIULG11 (Mailer R2.07) with BSMTP id 0430; Mon, 04 Feb 91 17:13:42 +0100 Received: from vm1.ulg.ac.be by vm1.ulg.ac.be (IBM VM SMTP R1.2.2MX) with BSMTP id 0186; Mon, 04 Feb 91 17:13:40 +01 Resent-Date: Mon, 04 Feb 91 17:10:58 +0100 Resent-From: Andr'e PIRARD Resent-To: IETF Internet Mail Extensions WG Received: from Bearn.ac.be by BLIULG11 (Mailer R2.07) with BSMTP id 0604; Fri, 01 Feb 91 08:22:02 +0100 Received: by BEARN (Mailer R2.07) id 4420; Fri, 01 Feb 91 08:20:42 +0100 Date: Thu, 31 Jan 91 08:21:32 PST Reply-To: Multi-byte Code Issues Sender: Multi-byte Code Issues From: Mike Ksar Subject: ECMA letter to Unicode X-To: unicore@Eng.Sun.COM X-Cc: ISO10646@jhuvm.BITNET, Unicode@Sun.COM To: Andr'e Pirard The letter my other mail refers to... 
----------------------------Original message---------------------------- > Below is a message that has been received from ECMA TC1 addressed to Asmus which describes the position of ECMA TC1 on Unicode 1.0. Mike > 29th January 1991 > > > > > Sir, > > I have received the document entitled UNICODE 1.0. It was > discussed at the meeting of TC1, the ECMA coding committee. > The views of ECMA in matters of multiple-byte coding are as > follows. > > As a matter of principle ECMA believes strongly that a coded > character set for world-wide multi-lingual applications must > be developed by the appropriate, world-wide recognized > standardization organizations, viz. ISO and IEC. ECMA, as an > A-liaison organization of them, contributes to, participates > actively in, and supports, their work. > > The task of ISO/IEC/JTC1/SC2 and its WG2 is to develop a > universal coded character set which will include "all" > scripts of the world and to provide the coding scheme > necessary to achieve this aim. Whilst WG2 tried first to > achieve this with a 2-byte coding, it was rapidly discovered > that this will not suffice, thus the present structure of > ISO/IEC/DIS 10646. The aim of Unicode is obviously to limit > itself to a 16-bit code and to represent with it as many > graphic characters as possible. > > The two approaches are fundamentally different in aims and > means. Because of this basic difference of approach, the > following aspects inherent to UNICODE do not meet the strict > criteria established for a world-wide, universal coded > character set. > > i) Defined repertoire > > The repertoire, that is the number of characters which > can be represented in coded form by means of the bit > combinations of the code, is undefined. 
Indeed, the > use of "floating" accents and, in general, the > facility to combine the images of two or more graphic > characters into one single graphic symbol representing > a character not included in the basic coded set yields > a practically infinite or, at best, undefined > repertoire. > > ii) Conformance > > As a consequence of i) it is generally impossible to > define the requirements for conformance. Because of > the possibility of duplicate coding (see iv) below) > the same set of data could be coded in different ways > and these different codings could all satisfy the same > conformance clause for CC-data-elements, which would > be completely against the well established principle > of unique coding. Moreover, the absence of a > defined, finite repertoire makes it generally > impossible to determine the conformance requirements > for a receiving character-imaging device. > > iii) Fixed-length coding > > UNICODE is not a true 16-bit code, since some accented > characters may require 32 or 48 bits for their coded > representation, depending on the number of associated > "non-spacing" diacritical marks. > > It is well known that handling of strings of coded > characters with a different length of coded > representations causes problems, in particular for > programming languages. > > iv) Duplicate coding > > Alternative coded representations are available for > many of the accented characters, since some of them > are coded as single characters, and can also be > represented as a pair of characters using a "non > spacing" diacritical mark. This causes much difficulty > in string-search operations in text processing, and > for key-matching in data bases. 
> > v) Ideographic characters > > The position of ECMA and of most, if not all, European > National Standards Institutes is that proposals for > the coded representation of ideographic characters > must be the subject of review and approval by the > National Standards Institute of the countries directly > concerned and not imposed by a private consortium. > > A central technique allowing to handle ideographic > characters in Unicode is possible only by unification > of the Han characters. This unification of Han > characters is difficult due to the open nature of > ideographic characters, as their exact number is not > known, new ones can be invented over time. Asian > countries are planning to form a joint research group > to study this matter with academic, cultural and legal > considerations. Only a proposal agreed by them should > be included in an International Standard. > > ISO 10646 uses a code structure allowing the inclusion > of Chinese, Japanese and Korean ideographic characters > as distinct characters. > > A conversion from ISO 10646 to Unicode would cause an > information loss as three distinct characters from ISO > 10646 would be mapped on one single character in > Unicode. Ideographic characters must be displayed with > the appropriate font for each country (by user > preference/demand and by regulations), thus some kind > of local information must be carried. Local > information is also necessary in order to process > characters since character attributes are different. > > For all these reasons we are not supporting a set of > "unified" Han characters outside the private-use > planes of ISO 10646 as long as not supported by all > relevant Asian countries. > > vi) Use of the control functions areas > > Because of its limitation to a 16-bit code table, > Unicode assigns graphic characters in the areas > corresponding to the C0 and C1 sets of control > functions in ISO 2022. 
This will cause considerable > difficulties with many existing communication systems > and products which assume the code structure of ISO > 2022. The migration of 8-bit systems to multiple-byte > code will, thus, be impaired. > > ISO 6429 is the repertoire of control functions > adopted by ISO. It specifies them not only in terms of > exact definitions but it also allocates a precise > coding. The corresponding bit combinations must be > retained when these control functions will be used in > the multiple-byte environment of ISO 10646. > > ECMA TC1 are preparing a revision of their Standard > ECMA-48 (on which ISO 6429 is based) in which specific > control functions for bi-directional texts and for > text communication will be included. It is essential > that for these particularly sensitive applications no > additional problems arise due to the coding of these > control functions. > > vii) Character naming > > ISO has established a methodology for the generation > of unique names of characters world-wide. This is > needed not only for coherence in the coding work, it > has also been strongly required by other disciplines > such as programming languages. The present scheme has > been discussed and agreed between JTC1/SC2, SC21 and > SC22. The adoption of alternative names will only > cause unnecessary confusion. > > viii) Presentation forms > > Again, because of its inherent limitations as a 16-bit > code, only a minute fraction of the presentation forms > required can be included. In particular, the > requirements for Arabic presentation forms cannot be > satisfied by Unicode. The necessary number of such > presentation forms has been established by the ECMA > Arabic Task Group in co-operation with recognized > experts from Arabic countries and with the National > Standards Institutes of these countries. Further input > was also received from China and the United Kingdom. > It is out of question to reduce this number. 
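[Archive note: points iii) and iv) of the list above (fixed-length coding and duplicate coding) can be illustrated with a short sketch. It uses Python's standard `unicodedata` module and canonical normalization, both of which postdate this letter; they are assumed here purely for illustration and are not part of the ECMA text. The helper names `same_text` and `utf16_bits` are hypothetical.]

```python
import unicodedata

# The same accented letter, coded two ways:
precomposed = "\u00e9"    # single code point, LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"     # "e" followed by a non-spacing COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def utf16_bits(s: str) -> int:
    """Number of bits the string occupies in a 16-bit (UTF-16) coding."""
    return len(s.encode("utf-16-be")) * 8


if __name__ == "__main__":
    # A naive string search or key match sees two different strings.
    print("raw comparison equal:", precomposed == combining)
    # Normalizing first makes the two codings compare equal.
    print("normalized comparison equal:", same_text(precomposed, combining))
    # One base letter plus one non-spacing mark needs two 16-bit units.
    print("bits:", utf16_bits(precomposed), "vs", utf16_bits(combining))
```

[The raw comparison fails because the two codings differ unit by unit, which is the search and key-matching difficulty the letter describes; a comparison that normalizes both sides first is one way a receiver might work around it.]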
> > This latter example, in addition to that of the ideographic > characters, illustrates the need to develop international > standards in the international, recognized standardization > organizations, viz. ISO and IEC and not in a private group with a > participation practically limited to North America. > > The conclusions of the discussion and review by the members > of ECMA TC1 lead to the very firm opinion that : > > - the approach of ISO 10646 is the only right one for a > world-wide universal coded character set, > > - the present structure of ISO 10646 offers the possibility > to allocate planes for private use, thus other coding > schemes like Unicode and/or a private unified set of > ideographic characters could be allocated to such planes. > > ECMA TC1 will continue to participate in and strongly support > the efforts of ISO/IEC/JTC1/SC2 and its WG2 toward the issue > of ISO 10646 and will oppose alternative proposals. ECMA will > contribute to the work for further complements to the first > issue of this International Standard. > > > > Yours faithfully, > > > > > D. Hekimi > Secretary General > > From owner-ietf-smtp@dimacs.rutgers.edu Mon Feb 4 13:40:31 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28506; Mon, 4 Feb 91 13:10:15 EST Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA28502; Mon, 4 Feb 91 13:10:12 EST Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA14569; Mon, 4 Feb 91 10:10:08 -0800 Date: Mon, 4 Feb 1991 9:49:44 -0800 (PST) From: Mark Crispin Sender: Mark Crispin Subject: world character sets To: IETF Internet Mail Extensions WG Message-Id: Friends - There is a further obstacle in the way of a single system for all the world's character sets having to do with East Asian languages. 
It is not merely an issue of overlap between Chinese, Korean, and Japanese or the inclusion of the phonetic characters used in Korean and Japanese into some superset of the Chinese characters.

There are two major and quite incompatible standards for representing Chinese: GB (the national standard of mainland China) and BIG5 (the semi-official standard of Taiwan). GB is a 14-bit encoding that originated as a modified form of the Japanese standard JIS encoding. BIG5 is a 15-bit encoding and was intended to address a number of problems that GB does not, including the representation of classical Chinese.

These two systems are totally incompatible. Although there are programs to do a halfway job of converting between the two, a true one-for-one conversion is impossible. Nor, due to the ridiculous attitudes of the two governments in question, is it possible to achieve a single standard that can be agreed to by all sides.

I have no idea if the Republic of Korea and North Korea use a compatible representation of Korean or not. Perhaps they do, since North Korea apparently buys mostly Japanese-made PCs which would use ROK standards.

In my judgement, any attempt to achieve a global, universal, character set at this time is doomed to failure after bogging down a lot of people for a long time. Although this goal is desirable and will undoubtedly be attained in the future, this future is outside of the projected lifespan of SMTP/RFC-822 based mailing vs. ISO mailing (or so the Party Line goes...). Nor do I feel it is feasible to attempt a transition to such a system in the expected lifetime of SMTP/RFC-822 based mail.

For this reason, I urge the other members of this group to narrow down the proposed extensions to the immediate problem at hand, and not be swayed by proposals for an all-encompassing future. That's what ISO is supposed to be.

My recommendations:

1) a new SMTP facility to identify 8-bit ISO Latin text.
2) if the receiver rejects 8-bit data, send as 7-bit using SI/SO shifts. I am aware of the phase problem, but the Japanese have gotten this to work in their e-mail. The burden of implementation is on the "smart" 8-bit software. 3) Define some cookie in the header to tag 8-bit data that was transmogrified to 7-bit, to assist a smart 8-bit receiver receiving mail from a 7-bit sender. The cookie should provide some sort of sanity-check capability, such as SI/SO shifting within the cookie as a well-defined string. 4) Reform of RFC-1154 to be a separate issue, and absolutely not tied to the 8-bit question. 5) Binary to be a separate issue, perhaps associated with RFC-1154 reform, but absolutely not tied to the 8-bit question. Regards, -- Mark -- From owner-ietf-smtp@dimacs.rutgers.edu Mon Feb 4 15:09:16 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA02377; Mon, 4 Feb 91 14:57:42 EST Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA02368; Mon, 4 Feb 91 14:57:35 EST Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA10477; Mon, 4 Feb 91 20:57:03 +0100 (MET) Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA16922; Mon, 4 Feb 91 20:56:32 +0100 Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA02029; Mon, 4 Feb 91 20:56:31 +0100 Date: Mon, 4 Feb 91 20:56:31 +0100 From: philipp@inf.enst.fr (Philippe-Andre Prindeville) Message-Id: <9102041956.AA02029@ulysse.enst.fr> To: MRC@cac.washington.edu Subject: Re: world character sets Cc: Friends - Oh, that's taking liberties... ;-) There is a further obstacle in the way of a single system for all the world's character sets having to do ... Hmm, just one? That's an improvement. In my judgement, any attempt to achieve a global, universal, character set at this time is doomed to failure after bogging down a lot of people for a long time. 
Although this goal is desirable and will undoubtedly be attained in the future, this future is outside of the projected lifespan of SMTP/RFC-822 based mailing vs. ISO mailing (or so the Party Line goes...). Nor do I feel it is feasible to attempt a transition to such a system in the expected lifetime of SMTP/RFC-822 based mail.

Thanks, Mark, as always, for being the voice of doom. I propose that any standard we adopt will probably outlive its usefulness (ASCII, X.25, and FORTRAN certainly have), and will pose an inertial barrier at a later date. It should be as complete as possible, if it is to serve well during its lifetime.

For this reason, I urge the other members of this group to narrow down the proposed extensions to the immediate problem at hand, and not be swayed by proposals for an all-encompassing future. That's what ISO is supposed to be.

I view SMTP as being a transport for arbitrary [7-bit, and possibly later 8-bit, text]. As such, it does not concern itself with its contents. It is merely an envelope for delivery. Headers are a convenience for addressing, manipulation, and end-to-end interpretation. Because headers are for this end-to-end purpose, agreements on an interpretation of such messages should be normalized. Contents are completely beyond the scope of a group occupying itself *solely* with transport (ie. MTA) functionality.

Perhaps Greg can further delimit the charter of this group: if it is purely SMTP, as the name implies, then the goal is quite simple: to remove the 7-bit and 1000-character line limitations, and to propose transport negotiation options and header indications. Conversely, if it is involved with the *interpretation and labelling of contents*, then this is a separate matter and should be dealt with as such. Indeed, the two efforts should probably not be inter-dependent.

My recommendations: 1) a new SMTP facility to identify 8-bit ISO Latin text.

That is too restrictive and grossly inadequate (as well as chauvinistic).
2) if the receiver rejects 8-bit data, send as 7-bit using SI/SO shifts. I am aware of the phase problem, but the Japanese have gotten this to work in their e-mail. The burden of implementation is on the "smart" 8-bit software. Some mailers do not like control characters. Further, this has serious device dependent pendants. 3) Define some cookie in the header to tag 8-bit data that was transmogrified to 7-bit, to assist a smart 8-bit receiver receiving mail from a 7-bit sender. The cookie should provide some sort of sanity-check capability, such as SI/SO shifting within the cookie as a well-defined string. Would this be 8bit data or 8bit text? If it is text, 8bits is not enough. Even X windows offers both 8 and 16 bit font capabilities. A person writing mathematics need well over 300 symbols. 4) Reform of RFC-1154 to be a separate issue, and absolutely not tied to the 8-bit question. Well, there was more to it than that. 1154 was very much wrapped up in the issue of maintaining data integrity for Mail Privacy. You don't want your text being manipulated if that will change the checksum. So, a provision was needed to include a cleartext (that the transport could munge as it would) and a ciphertext header, that could be authenticated. 5) Binary to be a separate issue, perhaps associated with RFC-1154 reform, but absolutely not tied to the 8-bit question. It seems you are trying to treat Latin 1 as (a) the universal solution to all problems worth considering and (b) as a flavour of arbitrary 8bit data. Neither of these premises is correct. 
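[Archive note: recommendation 2 quoted above (send 8-bit text over a 7-bit path using SI/SO shifts) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the scheme used by any particular mailer: the function names `to_7bit` and `from_7bit` are hypothetical, only bytes 0xA1-0xFE are accepted in the upper half (their 7-bit images are printable), and the input is assumed not to contain SO or SI already.]

```python
SO = 0x0E  # Shift Out: following bytes stand for the upper (8-bit) half
SI = 0x0F  # Shift In: back to plain 7-bit ASCII


def to_7bit(data: bytes) -> bytes:
    """Encode 8-bit text as a 7-bit stream with SO/SI shifts."""
    out = bytearray()
    shifted = False
    for b in data:
        if b in (SO, SI):
            raise ValueError("input already contains SO or SI")
        if b >= 0x80 and not 0xA1 <= b <= 0xFE:
            # Stripping the high bit would yield a control code, space or DEL.
            raise ValueError(f"byte 0x{b:02X} has no printable 7-bit image")
        high = b >= 0x80
        if high != shifted:
            out.append(SO if high else SI)
            shifted = high
        out.append(b & 0x7F)
    if shifted:
        out.append(SI)  # never leave the stream in the shifted state
    return bytes(out)


def from_7bit(data: bytes) -> bytes:
    """Reverse to_7bit(): restore the high bit on shifted bytes."""
    out = bytearray()
    shifted = False
    for b in data:
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        else:
            out.append(b | 0x80 if shifted else b)
    return bytes(out)


if __name__ == "__main__":
    text = "Caf\u00e9 cr\u00e8me\n".encode("latin-1")
    wire = to_7bit(text)
    print(wire)
    print(from_7bit(wire) == text)
```

[Because every ASCII byte, including the line terminator, is preceded by a Shift In whenever the stream is shifted, each line starts unshifted; that limits the damage of the "phase problem" to a single line. The sketch does not address the objection raised in the reply that some mailers reject control characters.]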
-Philip From owner-ietf-smtp@dimacs.rutgers.edu Mon Feb 4 15:39:16 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA03110; Mon, 4 Feb 91 15:19:25 EST Received: from akbar.cac.washington.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA03106; Mon, 4 Feb 91 15:19:23 EST Received: from tomobiki-cho.cac.washington.edu by akbar.cac.washington.edu (5.65/UW-NDC Revision: 2.21 ) id AA20954; Mon, 4 Feb 91 12:19:08 -0800 Date: Mon, 4 Feb 1991 12:04:15 -0800 (PST) From: Mark Crispin Sender: Mark Crispin Subject: Re: world character sets To: philipp@inf.enst.fr Cc: IETF Internet Mail Extensions WG In-Reply-To: <9102041956.AA02029@ulysse.enst.fr> Message-Id: Purportedly, we will all be making the transition from a TCP/IP SMTP/RFC-822 based solution to a purely ISO solution in this decade. If this is in fact the case, then it makes little sense to engage in a major redesign of SMTP or RFC-822 based mailing at this point. We should assume, no matter what the results of our efforts, that the bulk of email using this technology will be the old 7-bits using ASCII. I cannot envision my management approving programmer time to design and deploy a major redesign of SMTP/RFC-822 when "X.400 is around the corner." We've adopted RFC-1154 for multi-part mail because it's there, and we're willing to track whatever the current theory is there. But I don't see ourselves going around and changing hundreds of SMTP agents to support a fancy, multi-byte character set, version of SMTP. Indeed, I'm convinced that 7-bit SMTP agents will be with us for the remainder of the lifetime of SMTP; and I'm not convinced that it is unreasonable to design an 8-bit extension so that interoperability continues to be possible. Indeed, as things stand now 14-bit Japanese transports without any difficulty over our extant 7-bit SMTP software; few, if any, changes are required to user agents and no changes are required for mailers. 
I don't believe there is any disagreement about the separation of role between SMTP and RFC-822. I believe the only confusion is the simultaneous mention of 8-bit mailing along with modifications to RFC-1154. Re: the former. Isn't the present push for 8-bit mailing to allow users of ISO Latin to exchange mail? If so, I urge that we confine our efforts towards solving that problem. Re: the latter. Would it be better to split this working group into two groups, one of which will handle a revision/replacement for RFC-1154 as necessary? I for one would like to see extensions to RFC-1154's functionality as well as the use of some mechanism other than line counts for delimiting message body segments. -- Mark -- From owner-ietf-smtp@dimacs.rutgers.edu Mon Feb 4 16:09:16 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA04206; Mon, 4 Feb 91 15:48:18 EST Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA04196; Mon, 4 Feb 91 15:48:13 EST Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA12684; Mon, 4 Feb 91 21:47:33 +0100 (MET) Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA17052; Mon, 4 Feb 91 21:46:53 +0100 Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA02372; Mon, 4 Feb 91 21:46:52 +0100 Date: Mon, 4 Feb 91 21:46:51 +0100 From: philipp@inf.enst.fr (Philippe-Andre Prindeville) Message-Id: <9102042046.AA02372@ulysse.enst.fr> To: MRC@cac.washington.edu Subject: Re: world character sets Cc: Purportedly, we will all be making the transition from a TCP/IP SMTP/RFC-822 based solution to a purely ISO solution in this decade. (Don't bother being ironical with me. I too will believe it when I see it. And the first person to ask me to give up any of the functionality if slowly scraped together over many years, will get a very cold look from me. Whether he be ISO or an "Internet Good Ol' Boy".) 
If this is in fact the case, then it makes little sense to engage in a major redesign of SMTP or RFC-822 based mailing at this point.

It isn't a redesign of SMTP that I want. It is an agreement for the encoding of NRCs (or preferably, a subset of all the known ones) on an end-to-end basis. I don't care if SMTP uses Baudot to encode my messages, as long as what I send is properly delivered to the remote UA *and* an agreement exists for us to interpret that message. Multimedia and symbol encoding aren't a transport issue, they are a presentation issue.

We should assume, no matter what the results of our efforts, that the bulk of email using this technology will be the old 7-bits using ASCII.

Yes. But one can encode 16 bit UNICODE or the 14 bit Japanese system (or even 10646) into 2 or 3 ASCII bytes.

I cannot envision my management approving programmer time to design and deploy a major redesign of SMTP/RFC-822 when "X.400 is around the corner." We've adopted RFC-1154 for multi-part mail because it's there, and we're willing to track whatever the current theory is there. But I don't see ourselves going around and changing hundreds of SMTP agents to support a fancy, multi-byte character set, version of SMTP.

Mark. Relax. It isn't a big deal. I am not (how many times do I have to repeat it) endorsing an MTA (ie. SMTP agent) change. It is the MESSAGE BODY CONTENTS AS TRANSMITTED IN 7-BIT ASCII that I wish us to concern ourselves with. A way of encoding (UA-to-UA) an extensive and hopefully adequate character dictionary.

I demonstrated to Keld yesterday that using the standard Bezerkley mailer, one can write a script that does a table lookup and maps escaped tuples of ISO 10646 characters (actually, using his filter) into whatever display output character set you are using (for me, running X, that is Latin 1 or Latin 5). 1 line of script, 1 line of .mailrc, and 2 pages of C was all it took. And a table of character sets, but that's a given.
Simple to install, no mail agents disturbed, etc. Only hic is that it's not standardized. And that's what I want us to do... Indeed, I'm convinced that 7-bit SMTP agents will be with us for the remainder of the lifetime of SMTP; and I'm not convinced that it is unreasonable to design an 8-bit extension so that interoperability continues to be possible. For once, we are agreed. Actually, I think we agreed once before, in 1986, something about SUPDUP, but I don't remember exactly... Indeed, as things stand now 14-bit Japanese transports without any difficulty over our extant 7-bit SMTP software; few, if any, changes are required to user agents and no changes are required for mailers. Well, you see how easy it is? We gijeen (sp?) just need to get ourselves as together as the Japanese are. And they don't even use X.400... So what is our excuse? I don't believe there is any disagreement about the separation of role between SMTP and RFC-822. I believe the only confusion is the simultaneous mention of 8-bit mailing along with modifications to RFC-1154. Yes. We should partition ourselves, as I've suggested. Oh, Chair?... If not, the same thing will happen to us that happened to PPP. Re: the former. Isn't the present push for 8-bit mailing to allow users of ISO Latin to exchange mail? If so, I urge that we confine our efforts towards solving that problem. So what would that solve? An ISO Latin1 user would be forced to resort to using ASCII anyway when exchanging mail with an ISO Latin2 user (or the G0 set, which is the same, I know). That's progress? I think 8bit is for data, or more efficient n-octet encoding schemes (ie. n-into-8, rather than n-into-7). Re: the latter. Would it be better to split this working group into two groups, one of which will handle a revision/replacement for RFC-1154 as necessary? I for one would like to see extensions to RFC-1154's functionality as well as the use of some mechanism other than line counts for delimiting message body segments. Yes. 
For exchanging messages with multiple alphabets in a single line that is difficult (eg. $$ lim sum hard sub x for x from 0 to inf $$). -Philip From owner-ietf-smtp@dimacs.rutgers.edu Mon Feb 4 16:39:15 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA05038; Mon, 4 Feb 91 16:14:24 EST Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA05034; Mon, 4 Feb 91 16:14:22 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA07904; Mon, 4 Feb 91 16:14:13 EST Message-Id: <9102042114.AA07904@rutgers.edu> Received: (from user ARIEL) by Relay.Prime.COM; 04 Feb 91 15:57:19 EST Subject: Re: world character sets To: IETF SMTP list From: Robert Ullmann Comment: Created using PRIMAILPLUS Version 1.0 Alpha 5d Date: Mon, 04 Feb 91 15:57:16 EST > From: philipp@inf.enst.fr (Philippe-Andre Prindeville) > Well, there was more to it than that. 1154 was very much wrapped up > in the issue of maintaining data integrity for Mail Privacy. Really? That's news to me ... ! > -Philip Rob From owner-ietf-smtp@dimacs.rutgers.edu Mon Feb 4 18:09:16 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA08553; Mon, 4 Feb 91 18:04:00 EST Received: from corton.inria.fr by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA08549; Mon, 4 Feb 91 18:03:55 EST Received: from [192.44.64.233] by corton.inria.fr (5.65+/90.0.9) via Fnet-EUnet id AA19643; Tue, 5 Feb 91 00:03:37 +0100 (MET) Received: from ulysse.enst.fr (inf) by enst.enst.fr (4.1/SMI-4.0) id AA04297; Tue, 5 Feb 91 00:03:05 +0100 Received: from helios.enst.fr by ulysse.enst.fr (4.1/SMI-4.0-MHS-6.0) id AA03373; Tue, 5 Feb 91 00:03:03 +0100 Date: Tue, 5 Feb 91 00:03:03 +0100 From: philipp@inf.enst.fr (Philippe-Andre Prindeville) Message-Id: <9102042303.AA03373@ulysse.enst.fr> To: ARIEL@relay.prime.com Subject: Re: world character sets Cc: See section 4.8. And RFC 1115... Writing too many RFCs these days, having a hard time keeping them straight? 
;-) -Philip From owner-ietf-smtp@dimacs.rutgers.edu Mon Feb 4 23:10:34 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17603; Mon, 4 Feb 91 22:49:27 EST Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17599; Mon, 4 Feb 91 22:49:25 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA01792; Mon, 4 Feb 91 22:49:16 EST Message-Id: <9102050349.AA01792@rutgers.edu> Received: (from user DRB) by Relay.Prime.COM; 04 Feb 91 22:52:57 EST To: IETF SMTP list From: David Robinson Organization: Prime Computer, Inc. Systems Integration Subject: In defense of RFC-1154 Date: 04 Feb 91 22:52:57 EST I have been informed (by my coauthor) that it is time I said something about the whys and wherefors of RFC-1154. I will try to gore as few oxen as possible but, in fairness, will try to gore evenly around. Some history and a few confessions. RFC-1154 grew out of a requirement to mail file system objects, including directories of directories. (Some non-SMTP mail package in the company could do that. In order to get everyone to buy in to SMTP, we had to duplicate the functionality.) About the same time, we started looking at X.400. We quickly discovered that everything that made SMTP so easy to implement, so extensible, and so reliable was missing from X.400. In fact, the only thing that seemed to be at all interesting was the ability to handle multi-part, multi- structured messages. So, first confession [and Ox #1, I fear] ... one of the goals behind RFC-1154 was to extend RFC-822 so neither we nor anyone else would ever need to use X.400. Second confession. I don't think line counts are very elegant either. I will, however, explain why they are better than any other options. [Ox #2] Quite frankly, you do have to put something in the header. Otherwise, how can you tell where the boundaries of the message-parts are? 
(If you put "boundary markers" in the text, how do you know that I didn't happen to put them in my text message.) So, once you have to put something in the header, your choices are line counts, character counts, or some sort of indication that the message contains "boundary markers". We ruled out character counts for 2 reasons. To begin with, some systems do seem to drop or add trailing spaces. Also, the probability of a random transit queue failure dropping/adding a character is signicantly larger than that of dropping/adding a line; there are more of them! But, truth is, it was simplicity (read "laziness") more than anything else. One of the charms of SMTP is the ease of debugging. Have you ever tried to count characters to figure out whether you computed the encoding count correctly? Counting line is much easier! We never really seriously considered "boundary markers". As soon as you introduce them, messages which started out as text begin to get encoded and less readable. And one of our goals was not to have to muck in the message contents of text messages. So all that was left was line counts. Lines are well defined entities in RFC-821. One may argue that mailers that strip the 8th bit are simply rigorously following the RFC. (Although I think it an unproductive interpretation - [Ox #3].) But mailers that do not handle 1000-character lines and/or that add/drop lines are, in fact, broken. They have been broken for over 8 years. Don't justify broken mailers. Fix them! [Ox #4] Now, what about the contents of the separating lines. [Ox #5] There is a great simplicity to separating all the body parts from each other with "apparently blank" lines. You are already doing it between the header (Part 0) and body. Why not all the way through? To be perfectly honest, of course, the user interface (MUA) which I wrote does not bother to check (or to display, for that matter) the contents of the separating lines. 
(Which may prove either the irrelevance of the contents or the sloppiness of my code.) To my mind, the real work that needs to be done next is to define and publish additional encoding types. We will be doing some work on the file system object encoding and will try to get something out this spring. I would like to see some other people out there work on EDI, PostScript, voice, video, FAX, etc. -David [The opinions expressed above do not represent those of Prime Computer, Inc., etc. ... ] From owner-ietf-smtp@dimacs.rutgers.edu Tue Feb 5 19:12:36 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17328; Tue, 5 Feb 91 18:49:30 EST Received: from CBROWN.CLAREMONT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17324; Tue, 5 Feb 91 18:49:12 EST Date: Tue, 5 Feb 1991 15:48 PST From: "Ned Freed, Postmaster" Subject: Re: In defense of RFC-1154 To: DRB@relay.prime.com Cc: ietf-smtp@dimacs.rutgers.edu Message-Id: <89541858B8C000B8@HMCVAX.CLAREMONT.EDU> X-Envelope-To: ietf-smtp@dimacs.rutgers.edu X-Vms-To: IN%"DRB@relay.prime.com" X-Vms-Cc: IN%"ietf-smtp@dimacs.rutgers.edu" > I have been informed (by my coauthor) that it is time I said something > about the whys and wherefors of RFC-1154. I will try to gore as few oxen > as possible but, in fairness, will try to gore evenly around. > Some history and a few confessions. > RFC-1154 grew out of a requirement to mail file system objects, including > directories of directories. (Some non-SMTP mail package in the company > could do that. In order to get everyone to buy in to SMTP, we had to > duplicate the functionality.) > About the same time, we started looking at X.400. We quickly discovered > that everything that made SMTP so easy to implement, so extensible, and > so reliable was missing from X.400. In fact, the only thing that seemed > to be at all interesting was the ability to handle multi-part, multi- > structured messages. > So, first confession [and Ox #1, I fear] ... 
one of the goals behind > RFC-1154 was to extend RFC-822 so neither we nor anyone else would ever > need to use X.400. No problem with any of this. > Second confession. I don't think line counts are very elegant either. I > will, however, explain why they are better than any other options. [Ox #2] > Quite frankly, you do have to put something in the header. Otherwise, > how can you tell where the boundaries of the message-parts are? (If you > put "boundary markers" in the text, how do you know that I didn't happen > to put them in my text message.) Sure, you need something in the header. I don't think anyone is against this. The "boundary marker" problem you mention is solved by picking a boundary that is unlikely to be used for any other purpose, and you develop a method for encoding the boundary when you do find it in the actual message text. This sounds a lot more complex than it is; it is quite simple to implement such schemes. In many cases it is simpler to implement than line counts, as a matter of fact. > So, once you have to put something in the header, your choices are line > counts, character counts, or some sort of indication that the message > contains "boundary markers". Actually, the choices are a lot broader than this. You could count paragraphs, or even words (whatever a "word" is -- I don't propose to get into fine semantics here -- I just want to make it clear that there are lots more choices). All approaches, however, boil down to choosing an "in-band" or an "out-band" technique. An "in-band" technique inserts material into the message stream itself, and if it's any good, modifies the stream in such a way that the material it inserted can be unambiguously recognized. It must also be possible to recover the message stream intact, of course, so any changes made must be undoable. In contrast, an "out-band" technique creates the concept of pure message material, which is not altered, and some framing material. 
In order to distinguish the two, you need an indication of how "long" each chunk of pure message material is. This enables you to find the frames. "Out-band" techniques are without doubt the methods of choice whenever they are feasible. The reason for this is very simple; you don't have to modify the message proper when you use an "out-band" method. But there are two sets of circumstances under which "out-band" techniques must be rejected unconditionally. The first is where initial synchronization is a problem. This doesn't apply to SMTP (look at SLIP if you want to see a situation where it does apply). The second is when the message stream undergoes uncontrollable transformations in transit. Like it or not, this does apply to e-mail, and thus "out-band" techniques are not acceptable. > We ruled out character counts for 2 reasons. To begin with, some systems > do seem to drop or add trailing spaces. Also, the probability of a > random transit queue failure dropping/adding a character is signicantly > larger than that of dropping/adding a line; there are more of them! True enough. > But, truth is, it was simplicity (read "laziness") more than anything > else. One of the charms of SMTP is the ease of debugging. Have you ever > tried to count characters to figure out whether you computed the encoding > count correctly? Counting line is much easier! Actually, neither is especially easy. > We never really seriously considered "boundary markers". As soon as you > introduce them, messages which started out as text begin to get encoded > and less readable. And one of our goals was not to have to muck in the > message contents of text messages. Never considering them was a mistake. The amount of unreadability a good boundary marker scheme introduces is tiny, virtually nonexistent. See RFC934 for the defacto standard boundary marker specification. Do you seriously think this encoding affects readability to any great degree? > So all that was left was line counts. 
Lines are well defined entities in > RFC-821. One may argue that mailers that strip the 8th bit are simply > rigorously following the RFC. (Although I think it an unproductive > interpretation - [Ox #3].) > But mailers that do not handle 1000-character > lines and/or that add/drop lines are, in fact, broken. They have been > broken for over 8 years. Don't justify broken mailers. Fix them! [Ox #4] Sorry, the specifications don't support you on this. I find nothing in RFC821, RFC822, or RFC1123 that says you cannot wrap lines anywhere, anytime. The fact that you're explcitly *allowed* to play these sorts of games with headers might be construed to mean that you can do similar things to message text. Sure, you're supposed to support 1000 character lines. Even this is not mandatory, but what if your "support" means wrapping them onto multiple 80 character output lines? In addition, as others have noted, it is not a question of fixing broken mailers. It is a question of fixing broken operating systems and networks. These sorts of things are a lot harder to fix, virtually impossible, in fact. Yes, I agree with you that they are broken, but so are a lot of sendmail implementations out there, and I haven't had a lot of luck getting either segment of the population to mend their ways. > Now, what about the contents of the separating lines. [Ox #5] There is > a great simplicity to separating all the body parts from each other with > "apparently blank" lines. You are already doing it between the header > (Part 0) and body. Why not all the way through? > To be perfectly honest, of course, the user interface (MUA) which I wrote > does not bother to check (or to display, for that matter) the contents of > the separating lines. (Which may prove either the irrelevance of the > contents or the sloppiness of my code.) > To my mind, the real work that needs to be done next is to define and > publish additional encoding types. 
We will be doing some work on the > file system object encoding and will try to get something out this > spring. I would like to see some other people out there work on EDI, > PostScript, voice, video, FAX, etc. Perhaps in your mind this is true. My opinion is that the use of line counts in RFC1154 is a serious flaw that must be rectified before moving on to other RFC1154 enhancements. If it weren't for this glitch you'd have a lot more RFC1154-conformant software out there now (this is the only reason that my MTA software, PMDF, doesn't conform to RFC1154 at present). As I've said before, if the Internet chooses RFC1154 unmodified, I'll bring my MTA software in compliance, and I'll deal with the line wrapping problems too (most software won't, I'm sure). However, during this design discussion I'm going to continue to oppose this aspect of RFC1154, especially since there's such an easy alternative (RFC934). Ned From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 09:39:28 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA14201; Thu, 7 Feb 91 09:22:34 EST Received: from hydra.Helsinki.FI by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA14197; Thu, 7 Feb 91 09:22:24 EST Received: from poros.Helsinki.FI by hydra.Helsinki.FI (4.1/SMI-4.1/32) id AA28888; Thu, 7 Feb 91 16:21:59 +0200 Date: Thu, 7 Feb 91 16:21:59 +0200 From: kankkune@cs.helsinki.fi (Risto Kankkunen) Message-Id: <9102071421.AA28888@hydra.Helsinki.FI> In-Reply-To: Philippe-Andre Prindeville's message as of Feb 4, 20:56 X-Mailer: Mail User's Shell (7.2.0 10/31/90) To: IETF Internet Mail Extensions WG Subject: Let's concentrate on SMTP first Philippe-Andre Prindeville: "Re: world character sets" (Feb 4, 20:56): > I view SMTP as being a transport for arbitrary [7-bit, and possibly > later 8-bit, text]. As such, it does not concern itself with its > contents. It is merely an envelope for delivery. I agree. 
I'd like to see the SMTP extensions thought out before we proceed to extend the message formats. Not that these are much interconnected, but the SMTP extensions were the first goal mentioned, and should be very minimal. I'd like the 7-bit and line length restrictions removed. This would allow some sites to use binary transmission to each other. This isn't needed to send binary information (you can encode it to ASCII), but I'd like to see this silly restriction removed, as the transport service SMTP uses doesn't have it. Because these extensions would be negotiated, there isn't any problem talking to old mailers. So, we can specify these extensions to the protocol and let the users (programmers, sys admins) decide if they want to use them.

The changes to the protocol would be minimal, as I understand it. The DATA-command would behave the same for binary or 8-bit-text transmission (period doubling after CRLF). We only need some way to negotiate if the receiver can process 8-bit bytes and/or unbounded-lines. According to the response the sender would use the DATA-section appropriately. There have already been some suggestions about this. My proposals were:

    FEAT                 ; what transmission methods do you know?
    250-BINARY           ; I can receive binary (8-bit/no-line-length-limit)
    250 8-BIT-TEXT       ; or I can receive 8-bit text (line-length limited)

or

    EXTN                 ; what extensions do you support?
    250-8-BITS           ; I can process 8-bit bytes
    250 UNLIMITED-LINE   ; and I can process lines of unlimited length

I think something like this should suffice. I don't think we need to specify any character sets or something like that in the SMTP protocol. The extensions to allow structured mail, extended or multiple character sets and multimedia should be done using message headers. This is the right place for them, and makes it possible to send to old 7-bit mailers, too.
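A rough sketch of how a sending mailer might act on such a negotiation, written in Python purely for illustration. FEAT and EXTN and their keywords are only the proposals made in this message, not part of any published SMTP specification, and the names `parse_reply` and `choose_mode` are made up for the example. The only existing convention it relies on is the multi-line reply format ("250-" on continuation lines, "250 " on the last line).

```python
def parse_reply(lines):
    """Parse one (possibly multi-line) SMTP reply into (code, [texts]).

    Continuation lines look like "250-TEXT"; the last line looks like "250 TEXT".
    """
    code = None
    texts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if len(line) < 3 or not line[:3].isdigit():
            raise ValueError("malformed reply line: %r" % line)
        sep = line[3:4]  # "-" on continuation lines, " " (or nothing) on the last
        if sep not in ("-", " ", ""):
            raise ValueError("malformed reply line: %r" % line)
        this_code = int(line[:3])
        if code is None:
            code = this_code
        elif this_code != code:
            raise ValueError("reply code changed inside one reply")
        text = line[4:].strip()
        if text:
            texts.append(text.upper())
        if sep != "-":
            break  # last line of the reply
    return code, texts


def choose_mode(code, keywords):
    """Decide how to use the DATA section (hypothetical keywords from the proposal)."""
    if code != 250:
        # An old mailer rejects the unknown command: stay with plain 7-bit.
        return "7bit"
    kw = set(keywords)
    if "BINARY" in kw or {"8-BITS", "UNLIMITED-LINE"} <= kw:
        return "binary"
    if "8-BIT-TEXT" in kw or "8-BITS" in kw:
        return "8bit-text"
    return "7bit"


if __name__ == "__main__":
    code, kws = parse_reply(["250-BINARY\r\n", "250 8-BIT-TEXT\r\n"])
    print(code, kws, choose_mode(code, kws))
```

An old mailer that answers the unknown command with an error reply simply ends up in the "7bit" branch, which is the fallback behaviour the negotiation is meant to preserve.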
Risto -- Risto Kankkunen kankkune@cs.Helsinki.FI (Internet) Department of Computer Science kankkunen@finuh (Bitnet) University of Helsinki, Finland ..!mcsun!uhecs!kankkune (UUCP) From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 10:09:32 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA15013; Thu, 7 Feb 91 09:59:03 EST Received: from thumper.bellcore.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA15009; Thu, 7 Feb 91 09:59:01 EST Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id for ietf-smtp@dimacs.rutgers.edu; Thu, 7 Feb 91 09:57:44 EST Received: by greenbush.bellcore.com (4.12/4.7) id for ietf-smtp@dimacs.rutgers.edu; Thu, 7 Feb 91 09:59:08 est Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Thu, 7 Feb 1991 09:58:51 -0500 (EST) Message-Id: <4bgKgfq0M2Yt8Boy9x@thumper.bellcore.com> Date: Thu, 7 Feb 1991 09:58:51 -0500 (EST) From: Nathaniel Borenstein To: DRB@relay.prime.com Subject: Re: In defense of RFC-1154 Cc: ietf-smtp@dimacs.rutgers.edu In-Reply-To: <89541858B8C000B8@HMCVAX.CLAREMONT.EDU> References: <89541858B8C000B8@HMCVAX.CLAREMONT.EDU> I agree with Ned Freed 100%, but would like to drive one more nail into the coffin: > We never really seriously considered "boundary markers". As soon as you > introduce them, messages which started out as text begin to get encoded > and less readable. And one of our goals was not to have to muck in the > message contents of text messages. Well, here's a straw man for you. Imagine that there was a new standard RFC 1049 content-type called "multipart". It would consist of one or more of the following, in sequence: \begin(foo) ... a message fragment of type "foo" ... \end(foo) Within the content type, a few simple rules would apply: All "\" characters are escaped by themselves ("\\"). All lines are printable ASCII. All non-printable ASCII characters are represented by escape sequences, e.g. 
\003 (or choose a more compact but less readable format if you prefer, e.g. using base 64 instead of base 8). All newlines are considered real unless the last character on the line was a "\", in which case the newline is understood to have been inserted just to keep the line short. (Remember that lines that really end with "\" have to end with "\\" anyway, so you can tell the difference based on the parity-of-backslashes, basically.) Obviously this would need to be formalized, but I'm sure it could be, and the result would be EXTREMELY readable for any text that was inherently human-readable to begin with, would not break any gateways, would allow easy encoding of arbitrary (=binary) media types in 7 bit printable ASCII, and would render RFC 1154 entirely unnecessary. Why on Earth shouldn't we take an approach like this one? -- Nathaniel From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 11:09:29 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA16331; Thu, 7 Feb 91 10:58:14 EST Received: from hydra.Helsinki.FI by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA16318; Thu, 7 Feb 91 10:58:04 EST Received: from poros.Helsinki.FI by hydra.Helsinki.FI (4.1/SMI-4.1/32) id AA02060; Thu, 7 Feb 91 17:57:56 +0200 Date: Thu, 7 Feb 91 17:57:56 +0200 From: kankkune@cs.helsinki.fi (Risto Kankkunen) Message-Id: <9102071557.AA02060@hydra.Helsinki.FI> In-Reply-To: Nathaniel Borenstein's message as of Feb 7, 9:58 X-Mailer: Mail User's Shell (7.2.0 10/31/90) To: ietf-smtp@dimacs.rutgers.edu Subject: Re: In defense of RFC-1154 Nathaniel Borenstein: "Re: In defense of RFC-1154" (Feb 7, 9:58): > Well, here's a straw man for you. Imagine that there was a new standard > RFC 1049 content-type called "multipart". It would consist of one or > more of the following, in sequence: > > \begin(foo) > ... a message fragment of type "foo" ... > \end(foo) I like this approach very much and it can be made quite robust to gateway-munging.
Maybe we shouldn't try to extend RFC 1154 as it basically describes the line-count method, which many people have found too unreliable. I think it would be better to refine RFC 1049 and specify some new content types like Nathaniel suggests. Risto -- Risto Kankkunen kankkune@cs.Helsinki.FI (Internet) Department of Computer Science kankkunen@finuh (Bitnet) University of Helsinki, Finland ..!mcsun!uhecs!kankkune (UUCP) From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 12:09:58 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17683; Thu, 7 Feb 91 11:51:28 EST Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17541; Thu, 7 Feb 91 11:47:16 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA13050; Thu, 7 Feb 91 11:41:56 EST Message-Id: <9102071641.AA13050@rutgers.edu> Received: (from user ARIEL) by Relay.Prime.COM; 07 Feb 91 11:43:55 EST To: IETF SMTP list From: Robert Ullmann Subject: Re: Ned Freed comments Date: 07 Feb 91 11:43:55 EST Hi, Sorry, but I just couldn't let this one go by: > (Ned Freed) > > (David Robinson) > > But mailers that do not handle 1000-character > > lines and/or that add/drop lines are, in fact, broken. They have been > > broken for over 8 years. Don't justify broken mailers. Fix them! [Ox #4] > Sorry, the specifications don't support you on this. from rfc821 QUOTE: " 4.5.3. SIZES There are several objects that have required minimum maximum sizes. That is, every implementation must be able to receive objects of at least these sizes, but must not send objects larger than these sizes. [...] text line The maximum total length of a text line including the <CRLF> is 1000 characters (but not counting the leading dot duplicated for transparency)." Note: it says min max; with emphasis added: "EVERY implementation MUST be able to receive objects of at least these sizes ..."
From the previous section, QUOTE: " In some systems it may be necessary to transform the data as it is received and stored. This may be necessary for hosts that use a different character set than ASCII as their local character set, or that store data in records rather than strings. If such transforms are necessary, they must be reversible -- especially if such transforms are applied to mail being relayed." I.e you MUST be able to receive lines of 1000 chars, and you MUST relay them in the same form. (the transform MUST be reversible). If you are wrapping lines (at less than 1000; more isn't defined), you ain't doing SMTP. If you can't handle 1000 character lines, your mailer is BROKEN, and has been since August 1982. Best Regards, Rob Ullmann postmaster@relay.prime.com From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 12:39:29 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA18223; Thu, 7 Feb 91 12:09:24 EST Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA18021; Thu, 7 Feb 91 12:04:19 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA13768; Thu, 7 Feb 91 11:56:19 EST Message-Id: <9102071656.AA13768@rutgers.edu> Received: (from user ARIEL) by Relay.Prime.COM; 07 Feb 91 11:59:23 EST To: IETF SMTP list From: Robert Ullmann Subject: re: Nathaniels comments Date: 07 Feb 91 11:59:23 EST Hi, > (Nathaniel Borenstein) > \begin(foo) > ... a message fragment of type "foo" ... > \end(foo) > Within the content type, a few simple rules would apply: All "\" > characters are escaped by themselves ("\\"). All lines are printable > ASCII. All non-printable ASCII characters are represented by escape Perhaps you'd care to comment on what happen when something is included (i.e. recursively encapsulated) 5 or 6 times? This is _not_ an uncommon experience. 
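To make the question concrete, here is a small Python sketch of one reading of the escaping rule quoted above: backslashes are doubled and non-printable characters are written as three-digit octal escapes. The names `escape`, `unescape` and `nest` are hypothetical, the line-continuation rule is left out, and characters are assumed to fit in 8 bits. It illustrates the straw-man proposal only and is not a specification.

```python
def escape(text):
    """One round of encapsulation under the assumed rule."""
    out = []
    for ch in text:
        if ord(ch) > 255:
            raise ValueError("sketch assumes 8-bit characters")
        if ch == "\\":
            out.append("\\\\")                 # a backslash is escaped by itself
        elif ch == "\n" or 32 <= ord(ch) < 127:
            out.append(ch)                     # printable ASCII passes through
        else:
            out.append("\\%03o" % ord(ch))     # e.g. 0xF6 becomes \366
    return "".join(out)


def unescape(text):
    """Undo exactly one round of escape()."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
        elif text[i + 1:i + 2] == "\\":
            out.append("\\")
            i += 2
        else:
            digits = text[i + 1:i + 4]
            if len(digits) != 3 or any(d not in "01234567" for d in digits):
                raise ValueError("bad escape at offset %d" % i)
            out.append(chr(int(digits, 8)))
            i += 4
    return "".join(out)


def nest(text, rounds):
    """Encapsulate the same text repeatedly, as with forwarded-in-forwarded mail."""
    for _ in range(rounds):
        text = escape(text)
    return text


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, nest("\u00f6", n).count("\\"))
```

Under this reading the transform is reversible one round at a time, but the run of backslashes in front of an escaped character doubles with every further round of encapsulation. The exact figures depend on how the rounds are counted.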
[for the benefit of those who don't want to figure it out, the character v (o-diaersis, you probably see 'v' because dimacs will clear the high bit), will appear as \\366 on the first round, and \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\366 on the sixth ... Imagine a short text in, say, Swedish, after it was copy-prepended a few times ... ] ---- We put a LOT of thought into this, and it is hard to beat the idea that there is NO transform applied to the text, just an out-of-band count of the "session data units" (read: "lines" :-) used in the transport protocol. ---- BTW: what is the Content-Type/Encoding keyword used by an "Andrew" object? Can you send us one? (Or does your mailer "know" that dimacs.rutgers.edu can't handle it? :-) Rob From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 13:09:29 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA19367; Thu, 7 Feb 91 12:42:00 EST Received: from CBROWN.CLAREMONT.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA19359; Thu, 7 Feb 91 12:41:55 EST Date: Thu, 7 Feb 1991 09:41 PST From: "Ned Freed, Postmaster" Subject: Re: Ned Freed comments To: Ariel@relay.prime.com Cc: ietf-smtp@dimacs.rutgers.edu Message-Id: X-Envelope-To: ietf-smtp@dimacs.rutgers.edu X-Vms-To: IN%"Ariel@relay.prime.com" X-Vms-Cc: IN%"ietf-smtp@dimacs.rutgers.edu" I stand corrected. Looks like a huge number of mailers in the world don't conform to standard. They haven't conformed since 1982. They likely will never conform, since it is an operating system limitation, not a mailer limitation. Now, if you care to fight to get them fixed, be my guest. I'll be happy to provide you with the addresses of the various developers and maintainers as needed. I wish you the very best of luck. If you have a million times the luck I've had, you won't get anywhere. They'll simply claim conformance to RFC822 but not RFC821, or adopt some similar position, and that'll be that. 
And mail will continue to be ruined by these mailers, when a simple change to the specification would have fixed the problem. Ned From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 13:15:26 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA19290; Thu, 7 Feb 91 12:38:54 EST Received: from rutvm1.rutgers.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA19283; Thu, 7 Feb 91 12:38:51 EST Received: from RUTVM1.RUTGERS.EDU by RutVM1.Rutgers.Edu (IBM VM SMTP R1.2.1MX) with BSMTP id 2816; Thu, 07 Feb 91 12:40:52 EST Received: from DBNGMD21.BITNET by RUTVM1.RUTGERS.EDU (Mailer R2.07) with BSMTP id 2815; Thu, 07 Feb 91 12:40:51 EST Message-Id: <"91-02-07-18:38:34.20*GRZ027"> Date: Thu, 07 Feb 91 18:38 To: ietf-smtp@dimacs.rutgers.edu From: Peter Sylvester +33 1 69823973 Subject: Re: In defense of RFC-1154 > > Nathaniel Borenstein: "Re: In defense of RFC-1154" (Feb 7, 9:58): > > Well, here's a straw man for you. Imagine that there was a new standard > > RFC 1049 content-type called "multipart". It would consist of one or > > more of the following, in sequence: > > > > \begin(foo) > > ... a message fragment of type "foo" ... > > \end(foo) > > I like this approach very much and it can be made quite robust to > gateway-munging. Maybe we shouldn't try to extend RFC 1154 as it > basically describes the line-count method, which many people have found > too unreliable. I think it would be better to refine RFC 1049 and > specify some new content types like Nathaniel suggests. > > Risto BEGIN(sentence) It's a solution that fits perfectly into the brain circuits of a computer science person who has heard a little bit about structured programming. But some e-mail users are supposed to be real humans.
BEGIN(disclaimer) No harm is intended :-) END(disclaimer) END(sentenc) *** unexpected token (include some of your favorite latex error messages here) BEGINNNE(Feuer) Text and structure are two different things, and they should exist at two conceptually different layers like ASN.1 at one level and an X400 body type on another level. Does someone argue that the benefit of RFC822 and SMTP is that it is human readable? It isn't, you just have a good tool, a text editor and a terminal that show you some dots on a screen that look like what you have learned at school. IMHO one should concentrate on writing some tools that allow one to manipulate structured objects, and multitype/part objects in a nice way, and then use X400 either directly or at least as a guideline for packing these things into one mail. If I write a snail mail letter to you and say: Attached please find the handbook for our new software, then you do not expect that the handbook is photocopied or printed after the signature of that letter and all together on a piece of endless paper. ENDE(Flamme) RFC1049 says: BEGIN(cite) 2. The mail transport service negotiates with the receiving system as to its capabilities. If the receiving system cannot support the specified content type, the mail is transformed into conventional ASCII before transmission. END(cite) BEGINCOMMENT The problem is that there are data that cannot be transformed into ASCII without losing essential information. I want to attach a bitmap of a nice picture or a fax to a letter.
and my face should not end as a :-( or 8-) ENDRFC Peter Sylvester -- EARN Office Paris From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 13:39:29 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA20458; Thu, 7 Feb 91 13:15:42 EST Received: from RUTGERS.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA20367; Thu, 7 Feb 91 13:11:59 EST Received: from Relay.Prime.COM by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA17481; Thu, 7 Feb 91 13:04:18 EST Message-Id: <9102071804.AA17481@rutgers.edu> Received: (from user DRB) by Relay.Prime.COM; 07 Feb 91 13:07:35 EST To: IETF SMTP list From: David Robinson Organization: Prime Computer, Inc. Systems Integration Subject: Further comments on RFC-1154 line-counts Date: 07 Feb 91 13:07:35 EST I will repeat and expand. Line-counts are an extremely reliable mechanism provided your mailer is in compliance with an 8-year old standard (and most are in compliance.) Perhaps Jon Postel or Dave Crocker would like to comment, but I find it very difficult to believe that the intention of RFCs 821 and 822 was to permit/encourage intermediate mailers to arbitrarily fold, spindle, and/or mutilate messages. Indeed, there are many notes to the effect that the only changes to messages should be header lines added to the front of the message. The major mailer that has trouble and folds lines is BITNET. BITNET is NOT an SMTP or RFC-822 compliant mail network. And besides, the BITNET is moving off RSCS and onto TCP/IP as fast as possible. On the subject of putting "markers" in the text, I can only repeat that one of the goals of RFC-1154 was not to muck up otherwise human-readable message text, to wit Nathaniel's sample message... \begin(text) \\begin(foo) ... a message fragment of type "foo" ... \\end(foo) Within the content type, a few simple rules would apply: All "\\" characters are escaped by themselves ("\\\\"). All lines are printable ASCII. All non-printable ASCII characters are represented by escape sequences, e.g.
\\003 (or choose a more compact but less readable format if you prefer, e.g. using base 64 instead of base 8). All newlines are considered real unless the last character on the line was a "\\", in which case the newline is understood to have been inserted just to keep the line short. (Remember that lines that really end with "\\" have to end with "\\\\" anyway, so you can tell the difference based on the parity-of-backslashes, basically.) Obviously this would need to be formalized, but I'm sure it could be, and the result would be EXTREMELY readable for any text that was inherently human-readable to begin with, would not break any gateways, would allow easy encoding of arbitrary (=binary) media types in 7 bit printable ASCII, and would render RFC 1154 entirely unnecessary. Why on Earth shouldn't we take an approach like this one? -- Nathaniel \end(text) From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 14:09:28 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21836; Thu, 7 Feb 91 13:55:31 EST Received: from thumper.bellcore.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21831; Thu, 7 Feb 91 13:55:27 EST Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id for ietf-smtp@dimacs.rutgers.edu; Thu, 7 Feb 91 13:55:20 EST Received: by greenbush.bellcore.com (4.12/4.7) id for ietf-smtp@dimacs.rutgers.edu; Thu, 7 Feb 91 13:58:00 est Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Thu, 7 Feb 1991 13:57:57 -0500 (EST) Message-Id: Date: Thu, 7 Feb 1991 13:57:57 -0500 (EST) From: Nathaniel Borenstein To: ietf-smtp@dimacs.rutgers.edu Subject: Re: In defense of RFC-1154 In-Reply-To: <"91-02-07-18:38:34.20*GRZ027"> References: <"91-02-07-18:38:34.20*GRZ027"> I don't understand the problem here at all. If you're sending text, none of this applies anyway. If you're sending something richer than text, of COURSE it has to have some sort of structure.
My proposal simply adds a VERY SMALL amount of meta-structure for compound messages built out of several types of object. How on Earth could you do this without any structure? From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 14:39:29 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21713; Thu, 7 Feb 91 13:51:18 EST Received: from hydra.Helsinki.FI by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA21704; Thu, 7 Feb 91 13:51:08 EST Received: from skyros.Helsinki.FI by hydra.Helsinki.FI (4.1/SMI-4.1/32) id AA06393; Thu, 7 Feb 91 20:51:01 +0200 Date: Thu, 7 Feb 91 20:51:01 +0200 From: kankkune@cs.helsinki.fi (Risto Kankkunen) Message-Id: <9102071851.AA06393@hydra.Helsinki.FI> In-Reply-To: Robert Ullmann's message as of Feb 7, 11:59 X-Mailer: Mail User's Shell (7.2.0 10/31/90) To: IETF SMTP list Subject: re: Nathaniels comments Robert Ullmann: > > Within the content type, a few simple rules would apply: All "\" > > characters are escaped by themselves ("\\"). All lines are printable > > ASCII. All non-printable ASCII characters are represented by escape > > Perhaps you'd care to comment on what happen when something is > included (i.e. recursively encapsulated) 5 or 6 times? This is > _not_ an uncommon experience. Maybe you can give us an example. As I understand it, RFC 1154 doesn't even have a way to recursively include body parts, only to catenate. How can you manage with RFC 1154 then? And you don't have to take Nathaniel's suggestion as the only alternative. You could double only the \begin and \end lines, for example. If a line begins with \begin, you double it to \begin\begin. > We put a LOT of thought into this, and it is hard to beat the > idea that there is NO transform applied to the text, just an > out-of-band count of the "session data units" (read: "lines" :-) > used in the transport protocol. David Robinson (04 Feb 91): > We never really seriously considered "boundary markers". 
Maybe this isn't the same we, or you can put a lot of thought into the matter without considering it? Risto -- Risto Kankkunen kankkune@cs.Helsinki.FI (Internet) Department of Computer Science kankkunen@finuh (Bitnet) University of Helsinki, Finland ..!mcsun!uhecs!kankkune (UUCP) From owner-ietf-smtp@dimacs.rutgers.edu Thu Feb 7 14:53:32 1991 Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA22403; Thu, 7 Feb 91 14:08:17 EST Received: from hydra.Helsinki.FI by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) id AA22385; Thu, 7 Feb 91 14:08:04 EST Received: from skyros.Helsinki.FI by hydra.Helsinki.FI (4.1/SMI-4.1/32) id AA06711; Thu, 7 Feb 91 21:07:51 +0200 Date: Thu, 7 Feb 91 21:07:51 +0200 From: kankkune@cs.helsinki.fi (Risto Kankkunen) Message-Id: <9102071907.AA06711@hydra.Helsinki.FI> In-Reply-To: Robert Ullmann's message as of Feb 7, 11:43 X-Mailer: Mail User's Shell (7.2.0 10/31/90) To: ietf-smtp@dimacs.rutgers.edu Subject: Re: Ned Freed comments Robert Ullmann: > Sorry, but I just couldn't let this one go by: > > > (Ned Freed) > > > (David Robinson) > > > But mailers that do not handle 1000-character > > > lines and/or that add/drop lines are, in fact, broken. They have been > > > broken for over 8 years. Don't justify broken mailers. Fix them! [Ox #4] > > > Sorry, the specifications don't support you on this. > > If you are wrapping lines (at less than 1000; more isn't defined), > you ain't doing SMTP. Yes. > If you can't handle 1000 character lines, your mailer is BROKEN, and > has been since August 1982. Does a mailer necessarily have to speak SMTP to transfer RFC822 messages? I would imagine that some other transfer protocol is used inside BITNET. A