Some syntactic bits

From: Charles Lindsey (chl@clerew.man.ac.uk)
Date: Thu May 08 2003 - 14:47:34 CDT


Whilst looking into splitting Section 5 for USEAGE, I spotted a few
syntactic issues that still required attention. And I also thought it
would be better to move some of the special syntax that distinguishes us
from RFC 2822 out of the particular headers involved and into section
2.4.2 (since most of those differences affect more than one header, and
it seemed better to have all such syntactic oddities in one place). And
it seemed easier to do all this before the USEAGE split.

The actual changes to the syntax fixed at this time are:

msg-id now prohibits ">" anywhere within it (this is essential to be
consistent with the NNTPEXT draft, and with current practice).

I allow "From: Joe Q. Public <joe@public.example>" on a "MUST accept,
SHOULD NOT generate yet" basis, to bring us in line with RFC 2822 (Bruce
raised this some while back, and the idea seemed to be agreed).

So here is the new 2.4.2. Most of the new text has been moved here from
its original places, so the overall size is much as it was.

2.4.2. Syntax adapted from Email and MIME

   Much of the syntax of Netnews Articles is based on the corresponding
   syntax defined in [RFC 2822]. Therefore, wherever in this standard
   the syntax is stated to be taken from [RFC 2822], it is to be
   understood, unless explicitly stated to the contrary, as the syntax
   defined by [RFC 2822], but NOT including any syntax defined in
   section 4 ("Obsolete syntax") of [RFC 2822]. Software compliant with
   this standard MUST NOT generate any of the syntactic forms defined in
   that Obsolete Syntax, although it MAY accept such syntactic forms.

   Likewise, certain syntax from the MIME specifications [RFC 2045] et
   seq is also considered to have been incorporated into this standard
   (see 6.21).

   However, there are some differences arising from some special
   requirements of Netnews, and the following syntactic rules therefore
   supersede the corresponding rules given in [RFC 2822].

        NOTE: Netnews parsers historically have been much less
        permissive than Email parsers, and this is reflected in the
        modifications referred to, and in some further specific rules.

   In contradistinction to [RFC 2822], an unstructured header (e.g. a
   Subject-header) MUST contain at least one non-whitespace character
   (see also remarks about empty headers in 4.2.6).
 
      unstructured = 1*( [FWS] ( utext / encoded-word ) ) [FWS]
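
[A minimal sketch of that requirement in Python. The name has_content is
made up for illustration, and the sketch tests only the "at least one
non-whitespace character" rule, not the full unstructured grammar.]

```python
# Hypothetical helper, not taken from any existing library.
WHITESPACE = " \t\r\n"  # SP, HTAB, and the CR/LF used in folding


def has_content(header_body: str) -> bool:
    """Return True if the body holds at least one non-whitespace character."""
    return any(ch not in WHITESPACE for ch in header_body)
```

[An agent might use such a check to reject a "Subject:" header whose
content consists only of spaces and tabs.]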

   Extended-phrases (known somewhat confusingly in [RFC 2822] as
   obs-phrases) are introduced to allow headers such as
      From: Joe Q. Public <joe@public.example>
   without the necessity of using a quoted-string. They MUST be accepted
   by compliant software, but they SHOULD NOT be generated until
   software capable of accepting them has become widely deployed.

      phrase = 1*( [CFWS] encoded-word [CFWS] / word ) /
                        extended-phrase
      extended-phrase = ( [CFWS] encoded-word [CFWS] / word )
                           *( [CFWS] encoded-word [CFWS] / word /
                              [CFWS] "." [CFWS] )
[ [RFC 2822] had
      obs-phrase = word *( word / "." / CFWS )
Please can Pete check that what I have is equivalent?]
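
[A rough sketch, in Python, of how a generating agent might decide
whether a display name relies on the extended form. It is deliberately
simplified: it assumes the name consists only of atoms and "."
separated by SP or HTAB, and it ignores comments, quoted-strings and
encoded-words. The name classify_display_name is hypothetical.]

```python
import re

# atext from RFC 2822, written as the inside of a character class.
ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
# One token: optional leading SP/HTAB, then an atom or a single ".".
TOKEN = re.compile(rf"[ \t]*(?:(?P<atom>[{ATEXT}]+)|(?P<dot>\.))")


def classify_display_name(name: str) -> str:
    """Return "phrase", "extended-phrase" or "invalid" (simplified)."""
    name = name.strip(" \t")
    kinds = []
    pos = 0
    while pos < len(name):
        m = TOKEN.match(name, pos)
        if m is None:
            return "invalid"  # something outside this simplified model
        kinds.append("dot" if m.group("dot") else "atom")
        pos = m.end()
    if not kinds or kinds[0] == "dot":
        return "invalid"  # must begin with a word, not with "."
    return "extended-phrase" if "dot" in kinds else "phrase"
```

[A posting agent following the SHOULD NOT could put the name into a
quoted-string whenever the result is "extended-phrase", while a reading
agent accepts either form.]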

   Within a date-time, two of the obs-zones from [RFC 2822] are retained
   because of current widespread usage.

      zone = (( "+" / "-" ) 4DIGIT) / "UT" / "GMT"

   The forms "UT" and "GMT" (indicating universal time) are to be
   regarded as obsolete synonyms for "+0000". They MUST be accepted, and
   passed on unchanged, by all agents, but they MUST NOT be generated as
   part of new articles by posting and injecting agents.
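
[A small Python sketch of the zone rule, with hypothetical function
names. ABNF string literals are case-insensitive, so the sketch matches
"UT" and "GMT" without regard to case. It checks syntax only and does
not verify that the four digits form a sensible offset.]

```python
import re

# zone = (( "+" / "-" ) 4DIGIT) / "UT" / "GMT"
ZONE_RE = re.compile(r"[+-][0-9]{4}|UT|GMT", re.IGNORECASE)


def is_acceptable_zone(zone: str) -> bool:
    """True for every zone that agents MUST accept and pass on unchanged."""
    return ZONE_RE.fullmatch(zone) is not None


def may_generate_zone(zone: str) -> bool:
    """True only for the numeric form, the one allowed in new articles."""
    return is_acceptable_zone(zone) and zone[0] in "+-"


def offset_minutes(zone: str) -> int:
    """Offset from universal time in minutes; UT and GMT count as +0000."""
    if not is_acceptable_zone(zone):
        raise ValueError(f"not a valid zone: {zone!r}")
    if zone[0] not in "+-":
        return 0  # obsolete synonyms for +0000
    sign = -1 if zone[0] == "-" else 1
    return sign * (int(zone[1:3]) * 60 + int(zone[3:5]))
```

[Note that a relaying agent would use only is_acceptable_zone and leave
the header untouched; the conversion is for interpreting the time, not
for rewriting it.]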

   Msg-ids are redefined to be a "normalized" subset of those defined by
   [RFC 2822]. No string of characters is quoted unless strictly
   necessary (a quoted string must contain at least one mqspecial), and
   no single character is prefixed by a "\" in the form of a quoted-pair
   unless strictly necessary. Moreover, neither ">" nor WSP can occur
   inside a msg-id, whether quoted or not. Thus, whereas under
   [RFC 2822]
      <abcd@example.com>
      <"abcd"@example.com>
      <"ab\cd"@example.com>
   would be considered semantically equivalent, only the first of them
   is syntactically permitted by this standard, and hence a simple
   comparison of octets will always suffice to determine the identity of
   two msg-ids.

      msg-id = "<" id-left "@" id-right ">"
      id-left = dot-atom-text / no-fold-quote
      id-right = dot-atom-text / no-fold-literal
      no-fold-quote = DQUOTE
                           *( mqtext / "\\" / "\" DQUOTE )
                           mqspecial
                           *( mqtext / "\\" / "\" DQUOTE )
                           DQUOTE
      mqtext = NO-WS-CTL / ; all of <text> except
                        %d33 / ; SP, HTAB, "\", ">"
                        %d35-61 / ; and DQUOTE
                        %d63-91 /
                        %d93-126
      mqspecial = "(" / ")" / ; same as specials except
                        "<" / ; "\" and DQUOTE quoted
                        "[" / "]" / ; and ">" omitted
                        ":" / ";" /
                        "@" / "\\" /
                        "," / "." /
                        "\" DQUOTE
      no-fold-literal = "[" *( mdtext / "\[" / "\]" / "\\" ) "]"
      mdtext = NO-WS-CTL / ; Non white space controls
                        %d33-61 / ; The rest of the US-ASCII
                        %d63-90 / ; characters not including
                        %d94-126 ; ">", "[", "]", or "\"
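
[The msg-id grammar above can be transcribed into a regular expression.
What follows is a sketch in Python, not a reference implementation: the
names are hypothetical, dot-atom-text and NO-WS-CTL are taken from
[RFC 2822], and the input is assumed to be a str holding US-ASCII
characters only.]

```python
import re

NO_WS_CTL = r"\x01-\x08\x0b\x0c\x0e-\x1f\x7f"
ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
DOT_ATOM_TEXT = rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*"

# mqtext: text without SP, HTAB, DQUOTE (34), ">" (62) and "\" (92).
MQTEXT = NO_WS_CTL + r"\x21\x23-\x3d\x3f-\x5b\x5d-\x7e"
# One unit inside a no-fold-quote: mqtext, "\\" or "\" DQUOTE.
ITEM = rf'(?:[{MQTEXT}]|\\\\|\\")'
MQSPECIAL = r'(?:[()<\[\]:;@,.]|\\\\|\\")'
# At least one mqspecial must occur somewhere between the quotes.
NO_FOLD_QUOTE = rf'"{ITEM}*{MQSPECIAL}{ITEM}*"'

# mdtext: excludes ">" (62), "[" (91), "\" (92) and "]" (93).
MDTEXT = NO_WS_CTL + r"\x21-\x3d\x3f-\x5a\x5e-\x7e"
NO_FOLD_LITERAL = rf"\[(?:[{MDTEXT}]|\\\[|\\\]|\\\\)*\]"

MSG_ID_RE = re.compile(
    rf"<(?:{DOT_ATOM_TEXT}|{NO_FOLD_QUOTE})"
    rf"@(?:{DOT_ATOM_TEXT}|{NO_FOLD_LITERAL})>"
)


def is_valid_msg_id(msg_id: str) -> bool:
    """True if the whole string matches the msg-id grammar above."""
    return MSG_ID_RE.fullmatch(msg_id) is not None


def same_msg_id(a: str, b: str) -> bool:
    """For valid msg-ids, identity is a plain character comparison."""
    return a == b
```

[Of the three example forms given earlier, only <abcd@example.com>
passes is_valid_msg_id; the quoted variants fail because the quoting is
not strictly necessary.]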

Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5



