STRICT EMAIL COMPLIANCE


From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Fri Dec 20 2002 - 15:32:01 CST


We have decided, in our straw poll, to

d) Include explicit requirements on how to transform headers (for
   posted-and-mailed and moderators, not necessarily for other gateways)
   so as to be fully compliant with the email standards (that would
   include headers in body parts of multiparts, and so recursively).

So my first move has been to write a new section

8.8.1.1. Gatewaying into email

This will replace a great chunk of text currently in 8.8.1 (virtually restoring
8.8.1 to how it was originally written).

After that, there will be many consequential changes, MUSTs in places where
there are presently none, and cross references to be changed to the new section.
However, I think we need to be agreed on the new section first. Here it is.

8.8.1.1. Gatewaying into email

   Although headers containing non-ASCII characters may well be conveyed
   intact by many (if not most) current mail transport agents, that
   ability is not a requirement of some transport protocols, notably of
   SMTP [RFC 2821]. Likewise, although many mail user agents may
   currently display (or be configurable to display) such headers
   correctly, or at least adequately, messages containing such headers
   are not compliant with the current Email standards, notably with [RFC
   2822]. Note that non-ASCII body part headers [RFC 2046] (including
   non-ASCII headers of a message/rfc822) are equally at variance with
   the current Email standards.

   If, at some future time, the Email standards should be updated so as
   to allow such headers, it would then become possible to transport
   Netnews articles containing them over Email without further ado.
   Until such a time, however, if a Netnews article is to be gatewayed
   into Email with the intention that it be received and accepted by any
   arbitrarily chosen destination, and if it contains any UTF8-xtra-char
   in any of its headers or body part headers, then it MUST first be
   transformed so as to conform to [RFC 2822] and/or [RFC 2046]. In
   particular, articles emailed to moderators (8.2.2) MUST be so
   transformed.
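
   As a rough illustration of the condition above, the following Python
   sketch decides whether an article needs transforming before it is
   gatewayed. The Part class and both function names are hypothetical,
   and treating every code point above U+007F as a UTF8-xtra-char is an
   assumption about the draft's grammar, not something the draft states
   in these words.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """Hypothetical minimal model of an article or one of its body parts."""
    headers: list[tuple[str, str]]              # (name, value) pairs
    subparts: list["Part"] = field(default_factory=list)


def has_non_ascii(text: str) -> bool:
    # Assumption: any code point above U+007F counts as a UTF8-xtra-char.
    return any(ord(ch) > 0x7F for ch in text)


def needs_transformation(part: Part) -> bool:
    """True if this part, or any nested body part, has a non-ASCII header."""
    if any(has_non_ascii(value) for _name, value in part.headers):
        return True
    # Recurse into multiparts and encapsulated messages.
    return any(needs_transformation(sub) for sub in part.subparts)


inner = Part(headers=[("Subject", "Grüße")])
outer = Part(headers=[("Subject", "plain text")], subparts=[inner])
print(needs_transformation(outer))
```

   The recursion matters because an all-ASCII top-level header does not
   excuse a non-ASCII header buried in a nested body part.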

        NOTE: It is not precluded that a gatewayer who knows, or is able
        to control, the capabilities of the particular sites for which
        an article is destined and of the transport paths leading to
        those sites, may choose to send the article without
        transformation, or at least without transformation of any
        contained body part headers.

   The surest way to transport an article containing non-ASCII headers
   through Email is by encapsulation as an application/news-transmission
   (6.21.6.1). However, this method is not currently available for
   sending to moderators, for the reason explained in section 8.2.2 step 12.
   Until this method is considered safe to use, therefore,
   transformation of those headers will be necessary. This can be
   accomplished in the following steps:

   1. If the header is unstructured, or is an experimental header
      (4.2.5.1), any word(s) which is delimited by FWS or by the
      start/end of the header-content is encoded according to [RFC
      2047].

   2. If the header is structured, any word(s) which is contained
      within a comment and is delimited by FWS or by the "(" or ")"
      delimiting that comment is encoded according to [RFC 2047], and
      likewise any word(s) which is contained within a phrase and is
      delimited by FWS or by the start/end of the header-content.

   3. If the header contains a (MIME-style) parameter with a non-ASCII
      value, the whole parameter is encoded according to [RFC 2231].

   4. If the header is a Newsgroups-header or a Followup-To-header (or
      any other header that contains a newsgroup-name), each newsgroup-
      name is encoded according to section 5.5.2. Even if it is not
      decoded at the far end, it is preferable to display that encoded
      form than to display nothing at all. Note, however, that such
      encoded newsgroup-names MUST be restored to their canonical form
      before reinjection into any Netnews system.

   5. If the header is not one defined by this standard or by any Email
      standard known to the gateway (so that it cannot be determined
      whether it is unstructured, or otherwise where comments and
      phrases occur within it), then it is not possible to encode it
      according to a strict interpretation of [RFC 2047]. Nevertheless,
      it is preferable to attempt an encoding than to discard that
      header or to allow the gatewaying to fail. It is therefore
      suggested that, outside of regions contained within properly
      matched DQUOTEs, <...> or [...], any word(s) contained within
      properly nested "(" and ")" be treated as being within a comment
      and any other word(s) be treated as being within a phrase.

      Likewise, following any ";", anything of the syntactic form of a
      parameter should be treated as such.
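
   Steps 1 and 3 above might be sketched in Python roughly as follows.
   This is only an illustration under stated assumptions: the function
   names are hypothetical, the charset is assumed to be UTF-8, words are
   split on single spaces only, and neither header folding nor the
   75-character limit on encoded-words is handled. Adjacent non-ASCII
   words are merged into a single encoded-word, since RFC 2047 decoders
   ignore whitespace between adjacent encoded-words.

```python
import base64
from email.utils import encode_rfc2231


def _is_ascii(text: str) -> bool:
    return all(ord(ch) < 0x80 for ch in text)


def encode_unstructured(value: str) -> str:
    """Step 1 sketch: encode runs of non-ASCII words in an unstructured header."""
    out, run = [], []

    def flush() -> None:
        # Emit the pending run of non-ASCII words as one encoded-word.
        if run:
            raw = " ".join(run).encode("utf-8")
            out.append("=?UTF-8?B?" + base64.b64encode(raw).decode("ascii") + "?=")
            run.clear()

    for word in value.split(" "):
        if word and not _is_ascii(word):
            run.append(word)
        else:
            flush()
            out.append(word)        # ASCII words pass through unchanged
    flush()
    return " ".join(out)


def encode_parameter(name: str, value: str) -> str:
    """Step 3 sketch: use the RFC 2231 extended form only when needed."""
    if _is_ascii(value):
        # Simplification: real code would quote values containing tspecials.
        return f"{name}={value}"
    return f"{name}*={encode_rfc2231(value, charset='UTF-8')}"


print(encode_unstructured("Re: Grüße aus Köln"))
print(encode_parameter("filename", "naïve.txt"))
```

   Steps 2 and 5 would additionally need a parser that locates comments
   and phrases within the header, which this sketch does not attempt.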

   In all cases, there are additional restrictions imposed by [RFC 2047]
   regarding the size, placement and contents of encoded-words which
   MUST be observed. Moreover, these transformations MUST be applied
   both within the header of the article and within any body part
   headers (including the headers of any message/rfc822). It is
   generally preferable for encodings to use the charset UTF-8, although
   it might be wise first to confirm that that is indeed the charset
   which had been used (see 4.4.1).
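
   One of the [RFC 2047] size restrictions is that an encoded-word may
   be at most 75 characters long, and each encoded-word has to contain
   whole characters. A possible way to respect both, sketched in Python
   with hypothetical names and assuming UTF-8 with the "B" encoding, is
   to cut the text into pieces of at most 45 raw bytes on character
   boundaries:

```python
import base64

MAX_ENCODED_WORD = 75                  # RFC 2047 limit for one encoded-word
PREFIX, SUFFIX = "=?UTF-8?B?", "?="
# Base64 turns 3 raw bytes into 4 characters, so with 12 characters of
# framing the payload may carry at most (75 - 12) // 4 * 3 = 45 raw bytes.
MAX_RAW_BYTES = (MAX_ENCODED_WORD - len(PREFIX) - len(SUFFIX)) // 4 * 3


def encode_long_text(text: str) -> list[str]:
    """Split text into encoded-words of at most 75 characters each,
    never cutting a multi-byte UTF-8 character in half."""
    chunks, chunk, size = [], "", 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if size + n > MAX_RAW_BYTES:
            chunks.append(chunk)       # current chunk is full, start a new one
            chunk, size = "", 0
        chunk += ch
        size += n
    if chunk:
        chunks.append(chunk)
    return [
        PREFIX + base64.b64encode(c.encode("utf-8")).decode("ascii") + SUFFIX
        for c in chunks
    ]


for word in encode_long_text("ü" * 100):
    print(len(word), word)
```

   The resulting encoded-words would then be separated by folding white
   space in the header; a decoder discards white space between adjacent
   encoded-words, so the original text is recovered by concatenation.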

Because I may have misunderstood some features of RFC 2047/2231, I have
also posted that text to the ietf-822 mailing list, so that the RFC
2047 experts there can check it.

For comparison, here is the chunk of the present 8.8.1 that is to go:

   It is not the purpose of this standard to set requirements to be
   followed by implementors of outgoing gateways. Those implementors are
   in the best position to know the capabilities of the systems to which
   the article is to be sent, the purposes for which it is being sent,
   and the extent to which those purposes will be vitiated if the
   content of some header is mutilated en route, or fails to display
   correctly upon arrival; this is a matter for their judgement.
   Nevertheless, it is useful to draw attention to a few transformations
   which such implementors might find useful.

    o Transporting headers containing non-ASCII characters without first
      encoding them is contrary to the current Email standards [RFC
      2821] and [RFC 2822]. This applies both to the top-level headers
      of the email, and also to headers contained within any embedded
      message or multipart Content-Types (and so recursively). However,
      it is well known that most mail transport agents will in fact
      convey these characters intact, especially for non-top-level
      headers in the case of transports which support the 8BITMIME
      extension, and it is to be expected that the prevalence of this
      ability will increase in the future (and may even be compliant
      with future versions of the Email standards). Moreover, many mail
      user agents will also display such characters correctly, or at
      least adequately. Therefore, some implementors of gateways may
      consider it an acceptable risk not to transform these headers in
      any way, especially in the case of the lower-level ones.

        NOTE: It is not the purpose of this standard either to condemn
        or to condone behaviours which may be non-compliant with other
        standards. That is a matter for those implementors.

    o Where an implementor considers the risk too high for the top-level
      headers, encapsulating the whole article as a message/rfc822
      (6.21.2.2) may make it less likely to be mutilated during
      transport, especially where 8BITMIME is supported. Alternatively,
      encapsulating as an application/news-transmission (6.21.6.1) will
      guarantee correct transmission in all cases and is the method of
      choice where the intent is to gateway it back into Netnews later
      on.

    o To ensure full compliance with the Email standards it is necessary
      to encode unstructured headers, phrases, comments and parameters
      containing UTF8-xtra-chars according to [RFC 2047] or [RFC 2231],
      as set out in section 4.4.1. It is preferable to encode using the
      charset UTF-8, although it might be wise first to confirm that
      that is indeed the charset which had been used (see 4.4.1).

    o In the case of newsgroup-names, as found in Newsgroups-headers,
      Followup-To-headers and some Control-headers, [RFC 2047] is not
      applicable (even though some mail reading agents might
      nevertheless display it correctly). Therefore, it is necessary to
      use the encoding described in section 5.5.2. Even if it is not
      decoded at the far end, it is preferable to display such an
      encoded form than to display nothing at all. Note, however, that
      such encoded newsgroup-names MUST be restored to their canonical
      form before reinjection into any Netnews system.

[We need a paragraph on MIME-style parameters here]

Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5



