Draft section_5.02.03


From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Thu Aug 12 1999 - 13:06:37 CDT


I am now back in normal operation, and have made some progress on
section_6. But in the meantime, to keep you happy, here is section_5.
Most of the changes arise directly from discussions on this list. I
still need to know, however, whether we will permit path-identities to
include '[' and ']'. I no longer encourage their use around IP addresses
(Russ asked for that) but does that mean we should forbid the practice?

There follows the new section, and then the diffs. It will appear on the
landfield site presently.

5. Mandatory Headers

   An article MUST have one, and only one, of each of the following
   headers: Date, From, Message-ID, Subject, Newsgroups, Path.

   Note also that there are situations, discussed in the relevant
   parts of section 6, where References, Sender, or Approved headers
   are mandatory. In control articles, specific values are required
   for certain headers.

   For the overall syntax of headers, see section 4.1. In the
   discussions of the individual headers, the content of each is
   specified using the syntax notation. The convention used is that
   the content of, for example, the Subject header is defined as
   <Subject-content>.

   A proto-article (see 7.1.1) may lack some of these mandatory
   headers, but they MUST then be supplied by the injecting agent.

5.1. Date

   The Date header contains the date and time that the article was
   prepared by the poster ready for transmission and SHOULD express
   the poster's local time. The content syntax makes use of syntax
   defined in [MESSFOR].

      Date-content = date-time

        NOTE: It is a useful convention to follow the date-time with a
        comment containing the time zone in human-readable form. The
        use of folding in a date-time is deprecated, even though
        permitted by [MESSFOR].

5.1.1. Examples

      Date: Fri, 2 Apr 1999 20:20:51 -0500 (EST)
      Date: 26 May 1999 16:13 +0000
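
   As an illustration only, both example forms can be read with the
   Python standard library. This is a sketch: the function name
   parse_date_content is hypothetical, and it assumes that the mail
   date parser in email.utils is an acceptable stand-in for the
   date-time syntax of [MESSFOR].

```python
from email.utils import parsedate_to_datetime


def parse_date_content(date_content: str):
    """Parse a Date-content string into a datetime object.

    A trailing comment such as "(EST)" is ignored by the parser; the
    numeric offset is what determines the resulting UTC offset.
    """
    return parsedate_to_datetime(date_content)


if __name__ == "__main__":
    for example in ("Fri, 2 Apr 1999 20:20:51 -0500 (EST)",
                    "26 May 1999 16:13 +0000"):
        print(parse_date_content(example).isoformat())
```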

5.2. From

   The From header contains the electronic address(es), and possibly
   the full name, of the article's author(s). The content syntax makes
   use of syntax defined in [MESSFOR], subject to the following
   revised definition of local-part.

      From-content = mailbox-list
      addr-spec = local-part "@" domain
      local-part = dot-atom / strict-quoted-string

        NOTE: This syntax ensures that the local-part of an addr-spec
        is restricted to pure US-ASCII (and is thus in strict
        compliance with [MESSFOR]), whilst allowing any UTF-8
        character to be used in a preceding quoted-string containing
        the author's full name. If some future extension to the Mail
        protocols should relax this restriction, one would expect the
        Netnews protocols to follow.

   Any mailbox in the From-content field that does not belong to the
   poster(s) of the article MUST end in the top level domain of
   ".invalid" [RFC 2606] unless the owner(s) of those mailboxes have
   authorized the poster(s) of the article to use those mailboxes.

5.2.1. Examples

      From: John Smith <jsmith@site.example>
      From: "John Smith" <jsmith@site.example>, dave@isp.example
      From: "John D. Smith" <jsmith@site.example>, andrew@isp.example,
         fred@site2.example
      From: Jan Jones <jan@please_setup_your_system_correctly.invalid>
      From: Jan Jones <joe@anonymous.invalid>
      From: dave@isp.example (Dave Smith)

        NOTE: the last example shows a now deprecated convention of
        putting an author's full name in a comment following the
        <mailbox>, rather than in a <phrase> at the start of that
        mailbox. Observe that the quotes around the "John D. Smith"
        example were required, on account of the '.' character, and
        they would also have been required had any UTF8-xtra-char been
        present.
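
   The ".invalid" rule lends itself to a mechanical check. The sketch
   below is illustrative only: mailbox_domains is a hypothetical
   helper, and it relies on the mail address parser in the Python
   standard library, which may not match the From-content syntax in
   every corner case. It reports, for each mailbox, whether the
   address ends in the ".invalid" top level domain; whether a mailbox
   actually belongs to the poster cannot be determined by software.

```python
from email.utils import getaddresses


def mailbox_domains(from_content: str):
    """Return (address, ends_in_invalid) pairs for each mailbox found."""
    result = []
    for _display_name, address in getaddresses([from_content]):
        # Domain names are case-insensitive, so compare in lower case.
        domain = address.rpartition("@")[2].lower()
        is_invalid = domain == "invalid" or domain.endswith(".invalid")
        result.append((address, is_invalid))
    return result


if __name__ == "__main__":
    print(mailbox_domains("Jan Jones <joe@anonymous.invalid>"))
    print(mailbox_domains('"John Smith" <jsmith@site.example>, dave@isp.example'))
```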

5.3. Message-ID

   The Message-ID header contains the article's message identifier, a
   unique identifier distinguishing the article from every other
   article. The content syntax makes use of syntax defined in
   [MESSFOR], subject to the following revised definition of no-fold-
   quote.

      Message-ID-content = msg-id
      id-left = dot-atom-text / no-fold-quote
      no-fold-quote = DQUOTE *( strict-qtext / strict-quoted-pair )

           NOTE: This syntax ensures that a msg-id is restricted to
           pure US-ASCII (and is thus in strict compliance with
           [MESSFOR]).

   Following the provisions of [MESSFOR], an agent generating an
   article's message identifier MUST ensure that it is unique and that
   it is NEVER reused. Moreover, even though commonly derived from the
   domain name of the originating site (and domain names are case-
   insensitive), a message identifier MUST NOT be altered in any way
   during transport, or when copied (as into a References header), and
   thus a simple (case-sensitive) comparison of octets will always
   suffice to recognise that same message identifier wherever it
   subsequently reappears.

        NOTE: some old software may treat message identifiers that
        differ only in case within their id-right part as equivalent,
        and implementors of agents that generate message identifiers
        should be aware of this.
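
   The difference between the required comparison and the legacy
   behaviour mentioned in the NOTE can be sketched as follows. Both
   function names are hypothetical, and the second function is only a
   model of what such old software might do, not a description of any
   particular implementation.

```python
def same_message_id(a: bytes, b: bytes) -> bool:
    """Octet-for-octet (case-sensitive) comparison of message identifiers."""
    return a == b


def legacy_same_message_id(a: bytes, b: bytes) -> bool:
    """Model of old software that ignores case within the id-right part."""
    def fold_id_right(msg_id: bytes) -> bytes:
        left, at, right = msg_id.rpartition(b"@")
        return left + at + right.lower()
    return fold_id_right(a) == fold_id_right(b)


if __name__ == "__main__":
    first = b"<abc123@Site.Example>"
    second = b"<abc123@site.example>"
    print(same_message_id(first, second), legacy_same_message_id(first, second))
```

   A generating agent that never emits two identifiers differing only
   in the case of id-right is safe under either comparison.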

5.4. Subject

   The Subject header contains a short string identifying the topic of
   the message. This is an inheritable header (see ...) to be copied
   into the Subject header of any followup, in which case the new
   header-content SHOULD then start with the string "Re: " (a "back
   reference") followed by the contents of the pure-subject of the
   precursor. Any leading "Re: " in the pure-subject MUST be stripped.

      Subject-content = [ back-reference ] pure-subject
      nbtext = qtext / "\" / DQUOTE
                                    ; all of <text> except SP and HTAB
      pure-subject = 1*( [FWS] nbtext )
      back-reference = %x52.65.3A.20
                                    ; which is a case-sensitive "Re: "

   The pure-subject MUST NOT begin with "Re: ".

        NOTE: The given syntax differs from that prescribed in
        [MESSFOR] insofar as it does not permit a header content to be
        completely empty, or to consist of WSP only (see remarks in
        ... concerning undesirable headers).

   Followup agents MAY remove instances of non-standard back-reference
   (such as "Re(2): ", "Re:", "RE: ", or "Sv: ") from the Subject-
   content when composing the subject of a followup and add a correct
   back-reference in front of the result.

        NOTE: that would be "SHOULD remove instances" except that we
        cannot find a sufficiently robust and simple algorithm to do
        the necessary natural language processing.

   Followup agents MUST NOT use any other string except "Re: " as a
   back reference. Specifically, a translation of "Re: " into a local
   language or usage MUST NOT be used.

        NOTE: "Re" is an abbreviation for the Latin "In re", meaning
        "in the matter of", and not an abbreviation of "Reference" as
        is sometimes erroneously supposed.

   Agents SHOULD NOT depend on nor enforce the use of back references
   by followup agents. For compatibility with legacy news software the
   Subject-content of a control message MAY start with the string
   "cmsg "; non-control messages MUST NOT start with the string
   "cmsg ".

5.4.1. Examples

   In the following examples, please note that only "Re: " is mandated
   by this standard. "was: " is a convention used by many English-
   speaking posters to signal a change in subject matter. Software
   should be able to deduce this information from References.

      Subject: Film at 11.
      Subject: Re: Film at 11
      Subject: Godwin's law considered harmful (was: Film at 11)
      Subject: Godwin's law (was: Film at 11)
      Subject: Re: Godwin's law (was: Film at 11)
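
   The construction of a followup Subject described in 5.4 can be
   sketched as below. This is a minimal illustration under stated
   assumptions: followup_subject is a hypothetical name, only the
   exact case-sensitive string "Re: " is recognised, and the optional
   removal of non-standard back-references (such as "RE: " or
   "Sv: ") is deliberately not attempted.

```python
BACK_REFERENCE = "Re: "  # case-sensitive, exactly as in the syntax


def followup_subject(precursor_subject: str) -> str:
    """Build the Subject-content of a followup from its precursor."""
    pure_subject = precursor_subject
    # Strip every leading "Re: " so the pure-subject does not begin with one.
    while pure_subject.startswith(BACK_REFERENCE):
        pure_subject = pure_subject[len(BACK_REFERENCE):]
    return BACK_REFERENCE + pure_subject


if __name__ == "__main__":
    print(followup_subject("Film at 11."))
    print(followup_subject("Re: Godwin's law (was: Film at 11)"))
```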

5.5. Newsgroups

   The Newsgroups header's content specifies which newsgroup(s) the
   article is posted to. It is an inheritable header (see ...) which
   SHOULD be copied into the Newsgroups header of any followup, unless
   a Followup-To header is present to prescribe otherwise.

      Newsgroups-content = newsgroup-name
                            *( *FWS ng-delim *FWS newsgroup-name ) *FWS
      newsgroup-name = component *( "." component )
      component = component-start
                            *( component-start / component-other )
      component-start = Un-lowercase / Un-digit
      Un-lowercase = <Unicode Letter, Lowercase> /
                            <Unicode Letter, Other>
      Un-digit = <Unicode Number, Decimal Digit> /
                            <Unicode Number, Other>
      component-other = "+" / "-" / "_"
      ng-delim = ","
   where the <Unicode ...> items are as described in [UNICODE].

   The inclusion of folding white space within a newsgroup-name is a
   newly introduced feature in this standard. It MUST be accepted by
   all conforming implementations (relaying agents, serving agents and
   reading agents). Posting agents should be aware that such postings
   may be rejected by overly-critical old-style relaying agents. When
   a sufficient number of relaying agents are in conformance, posting
   agents SHOULD generate such whitespace in the form of <CRLF WS> so
   as to keep the length of lines in the relevant headers (notably
   Newsgroups and Followup-To) to no more than 79 characters (or
   other agreed policy limit - see 4.6). Before such critical mass
   occurs, injecting agents MAY reformat such headers by removing
   whitespace inserted by the posting agent, but relaying agents MUST
   NOT do so.
[That is Dan's revision of the Year 2000 text. I have proposed a
revised text in section 3 to match it.]
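
   A posting agent might generate the folding described above along
   the following lines. This is a sketch only: fold_newsgroups is a
   hypothetical helper, the limit is counted in characters (an
   assumption; a policy might count octets instead), and a single
   space is used as the white space after each CRLF.

```python
def fold_newsgroups(names, limit=79):
    """Return a Newsgroups header folded after commas to respect `limit`."""
    prefix = "Newsgroups: "
    lines = []
    current = prefix
    for index, name in enumerate(names):
        # Every name except the last is followed by the "," delimiter.
        piece = name + ("," if index < len(names) - 1 else "")
        # Start a continuation line when this piece would overflow, unless
        # the current line holds nothing but the header name itself.
        if len(current) + len(piece) > limit and current != prefix:
            lines.append(current)
            current = " " + piece
        else:
            current += piece
    lines.append(current)
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(fold_newsgroups(["comp.lang.c"] * 10))
```

   A name that is itself longer than the limit still produces an
   over-long line; the sketch does not attempt to split within a
   newsgroup-name.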

   A newsgroup-name consists of one or more components. Components MAY
   contain non-ASCII letters, but these MUST be encoded in UTF-8 and
   not according to [RFC 2047]. A component MUST contain at least one
   letter (and MUST, according to the syntax, begin with a letter or
   digit). Components SHOULD begin with a letter. Composite
   characters (made by overlaying one character with another) and
   format characters, as allowed in certain parts of Unicode and
   needed by certain languages, must use whatever canonical
   conventions apply to those parts of Unicode (such conventions are
   not defined in this Standard). The use of "_" in a component is
   deprecated. Serving agents MAY refuse to accept newsgroups using
   that component.

        NOTE: Components composed entirely of digits would cause
        problems for the commonly used implementation technique of
        using the component as the name of a directory, whilst also
        using sequential numbers to distinguish the articles within a
        group. Components containing other non-permitted characters
        could cause problems when newsgroup-names appear in URLs [RFC
        1738] (for example an '@' character would prevent
        distinguishing between newsgroup-names and message
        identifiers).

        NOTE: According to the syntax, uppercase letters cannot occur
        in newsgroup-names, but this standard imposes no requirement
        on software to check this condition, since it would be
        unreasonable to expect it to do so in parts of Unicode for
        which it was not configured (in general, a table lookup is
        required). Rather, it is the responsibility of those creating
        new newsgroups (...) not to violate it. It is, moreover, to be
        expected that a newsgroup created in violation of this
        condition will not be propagated particularly well.
[And insert some more on this subject when we come to the newgroup
control message.]
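
   For those who do wish to check a proposed name, the syntax and the
   "at least one letter" rule can be sketched as below. This is an
   illustration under stated assumptions: the function names are
   hypothetical, and Python's unicodedata module reflects a later
   version of Unicode than the one cited in [UNICODE], so category
   assignments may differ for some characters. No normalisation of
   composite characters is attempted.

```python
import unicodedata

LETTER_CATEGORIES = {"Ll", "Lo"}   # Letter, Lowercase / Letter, Other
DIGIT_CATEGORIES = {"Nd", "No"}    # Number, Decimal Digit / Number, Other
COMPONENT_OTHER = {"+", "-", "_"}


def is_component_start(ch: str) -> bool:
    """True if ch may begin a component (Un-lowercase or Un-digit)."""
    return unicodedata.category(ch) in LETTER_CATEGORIES | DIGIT_CATEGORIES


def is_valid_component(component: str) -> bool:
    """Check one component against the syntax and the letter requirement."""
    if not component or not is_component_start(component[0]):
        return False
    if not all(is_component_start(c) or c in COMPONENT_OTHER for c in component):
        return False
    # A component MUST contain at least one letter.
    return any(unicodedata.category(c) in LETTER_CATEGORIES for c in component)


def is_valid_newsgroup_name(name: str) -> bool:
    """Check a newsgroup-name: one or more components separated by '.'."""
    return all(is_valid_component(c) for c in name.split("."))


if __name__ == "__main__":
    for candidate in ("comp.lang.c++", "Comp.lang.c", "alt.2600"):
        print(candidate, is_valid_newsgroup_name(candidate))
```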

   Whilst there is no longer any technical reason to limit the length
   of a component (formerly, it was limited to 14 characters) nor to
   limit the total length of a newsgroup-name, it should be noted that
   these names are also used in the newsgroups line (6.6.1.2) where an
   overall limit applies, and moreover excessively long names can be
   exceedingly inconvenient in practical use. Agencies responsible
   for individual hierarchies SHOULD therefore, as a matter of policy,
   set reasonable limits for the length of a component and of a
   newsgroup-name. In the absence of such explicit policies, the
   default figures are 30 characters and 71 characters respectively.
[Observe that I have restored "20" to "30", since that seems to
express the intersection of our agreements on this matter.]
[If the checkpolicies proposal is included in the Standard, there
should be a reference to it here.]

        NOTE: The newsgroup-name as encoded in UTF-8 should be
        regarded as the canonical form. Reading agents may convert it
        to whatever character set they are able to display (see 4.5.2)
        and serving agents may possibly need to convert it to some
        form more suitable as a filename. Simple algorithms for both
        kinds of conversion are readily available. Observe that the
        syntax does not allow comments within the Newsgroups header;
        this is to simplify processing by relaying and serving agents
        which have a requirement to process this header extremely
        rapidly.

   Posters SHOULD use only the names of existing newsgroups in the
   Newsgroups header. However, it is legitimate to cross-post to
   newsgroup(s) which do not exist on the posting agent's host,
   provided that at least one of the newsgroups DOES exist there, and
   followup agents MUST accept this (posting agents MAY accept it, but
   SHOULD at least alert the poster to the situation and request
   confirmation). Relaying agents MUST NOT rewrite Newsgroups headers
   in any way, even if some or all of the newsgroups do not exist on
   the relaying agent's host.

   The Newsgroups header is intended for use in Netnews articles
   rather than in mail messages. It MAY be used in a mail message to
   indicate that it is a copy also posted to the listed newsgroups,
   but it SHOULD NOT be used in a mail-only reply to a Netnews article
   (thus the "inheritable" property of this header applies only to
   followups to a newsgroup, and not to followups to the poster).
   Moreover, if a newsgroup-name contains any non-ASCII character, it
   MAY be encoded using the mechanism defined in [RFC 2047] when sent
   by mail but, if it is subsequently returned to the Netnews
   environment, it MUST then be re-encoded into UTF-8.
[We discussed the conflicting interpretations of the Newsgroup header
in mail. What I have proposed will make 50% of users happy and annoy
the other 50%, but that is better than confusing 100%. I do not expect
my "SHOULD NOT" to be universally observed for some considerable
time.]

5.5.1. Forbidden newsgroup names

   The following forms of newsgroup-name MUST NOT be used except for
   the specific purposes indicated:

     o Newsgroup-names having only one component. These are reserved
       for newsgroups whose propagation is restricted to a single host
       or local network, and for pseudo-newsgroups such as "poster"
       (which has special meaning in the Followup-To header - see
       section 6.1), "junk" (often used by serving agents) and
       "control" (likewise)
     o Any newsgroup-name beginning with "control." (used as pseudo-
       newsgroups by many serving agents)
     o Any newsgroup-name containing the component "ctl" (likewise)
     o "to" or any newsgroup-name beginning with "to." (reserved for
       the ihave/sendme protocol described in section 7.2, and for
       test messages sent on an essentially point-to-point basis)
     o Any newsgroup-name containing the component "all" (because this
       is used as a wildcard in some implementations)

   A newsgroup-name SHOULD NOT appear more than once in the Newsgroups
   header. The order of newsgroup names in the Newsgroups header is
   not significant, except for determining which moderator to send the
   article to if one of the groups is moderated (see 7.1.2).
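
   The reserved forms listed in this subsection can be detected
   mechanically. The following is a minimal sketch; reserved_reason
   is a hypothetical name, and the returned strings are merely
   descriptive labels chosen for the illustration.

```python
def reserved_reason(newsgroup_name: str):
    """Return a short reason if the name is a reserved form, else None."""
    components = newsgroup_name.split(".")
    if len(components) == 1:
        return "single component (local or pseudo-newsgroup)"
    if components[0] == "control":
        return 'begins with "control."'
    if components[0] == "to":
        return 'begins with "to."'
    if "ctl" in components:
        return 'contains the component "ctl"'
    if "all" in components:
        return 'contains the component "all"'
    return None


if __name__ == "__main__":
    for candidate in ("poster", "control.cancel", "to.somesite", "comp.lang.c"):
        print(candidate, "->", reserved_reason(candidate))
```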

5.6. Path

   The Path header shows the route taken by a message since its entry
   into the USENET system. It is a variant header (see 4. ...), each
   agent that processes an article being required to add one (or more)
   entries to it. This is primarily to enable relaying agents to avoid
   sending articles to sites already known to have them, in particular
   the site they came from, and additionally to permit tracing the
   route articles take in moving over the network, and for gathering
   USENET statistics. Finally the presence of a "%" delimiter in the
   Path header can be used to identify an article injected in
   conformance with this standard.

5.6.1. Format

      Path-content = *( path-identity [FWS] delimiter [FWS] )
                               tail-entry *FWS
      path-identity = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
      delimiter = "/" / "?" / "%" / "," / "!"
      tail-entry = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )

        NOTE: A Path-content will inevitably contain at least one
        path-identity, except possibly in the case of a proto-article
        that has not yet been injected onto the network.

        NOTE: Observe that the syntax does not allow comments within
        the Path header; this is to simplify processing by relaying
        and injecting agents which have a requirement to process this
        header extremely rapidly.
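
   The Path-content syntax can be split into its entries with a single
   regular expression, as sketched below. This is illustrative only:
   parse_path_content is a hypothetical name, and any run of white
   space around a delimiter is accepted, which is slightly more
   lenient than FWS as defined in the syntax.

```python
import re

# path-identity and tail-entry share the same character set.
IDENTITY = re.compile(r"[A-Za-z0-9.:_-]+")
# A delimiter, together with any white space on either side of it.
DELIMITER_SPLIT = re.compile(r"\s*([/?%,!])\s*")


def parse_path_content(path_content: str):
    """Return ([(path_identity, delimiter), ...], tail_entry)."""
    parts = DELIMITER_SPLIT.split(path_content.strip())
    # Even positions hold names; odd positions hold the captured delimiters.
    for name in parts[0::2]:
        if not IDENTITY.fullmatch(name):
            raise ValueError(f"not a valid path-identity or tail-entry: {name!r}")
    entries = list(zip(parts[0:-1:2], parts[1::2]))
    return entries, parts[-1]


if __name__ == "__main__":
    entries, tail = parse_path_content("foo.isp.example/\r\n bar.isp.example!x")
    print(entries, tail)
```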

   A relaying agent SHOULD NOT pass an article to another relaying
   agent whose path-identity (or some known alias thereof) already
   appears in the Path-content. The comparison MAY be either case
   sensitive or case insensitive, and therefore relaying agents MUST
   NOT generate a name which differs from that of another site only in
   terms of case.

   A relaying agent MAY decline to accept an article if its own path-
   identity is already present in the Path-content or if the Path-
   content contains some path-identity whose articles the relaying
   agent does not want, as a matter of local policy.

        NOTE: This last facility is sometimes used to detect and
        decline control messages (notably cancel messages) which have
        been deliberately seeded with a path-identity to be "aliased
        out" by sites not wishing to act upon them.

5.6.2. Adding a path-identity to the Path header

   When an injecting, relaying or serving agent receives an article,
   it MUST prepend its own path-identity followed by a delimiter to
   the beginning of the Path-content. In addition, it SHOULD then add
   CRLF and WSP if it would otherwise result in a line longer than 79
   characters.
[It now seems established that none of the major servers and relayers
has any problem with folding the Path line, and that none of them
barfs on the new delimiters (the worst that can happen being that an
article is offered to a site that already has it).]

   The path-identity added MUST be unique to that agent. To this end
   it SHOULD be one of:

   1. A fully qualified domain name (FQDN) associated (by the Internet
      DNS service [RFC 1034]) with an A record, which SHOULD identify
      the actual machine prepending this path-identity. Ideally, this
      FQDN should also be "mailable" in the sense that it enables the
      construction of a valid E-mail address of the form
      "usenet@<FQDN>" or "news@<FQDN>" [RFC 2142] whereby the
      administrators of that agent may be reached.

   2. A fully qualified domain name (FQDN) associated (by the Internet
      DNS service) with an MX record which MUST enable the
      construction of a valid E-mail address of the form
      "usenet@<FQDN>" or "news@<FQDN>" whereby the administrators of
      that agent may be reached.

   3. A name registered previously in the UUCP maps database (found in
      the newsgroup comp.mail.maps), containing no '.' character.

   4. An encoding of an IP address - <dotted-quad> [RFC 820] or
      <ipv6-numeric> [RFC 2373] (the requirement to be able to use an
      <ipv6-numeric> is the reason for including ':' as an allowed
      character within a path-identity).
[Possibility of [...] around the IP address removed at Russ's request.
Actually the syntax did not permit '[' and ']' in a path-identity, but
I could easily make it do so, and I doubt any harm would ensue.]

   5. A '.' followed by an arbitrary name not in the UUCP maps
      database, but believed to be unique and registered at least with
      all sites immediately downstream from the given site.

   Of the above options, nos. 1 to 3 are much to be preferred, unless
   there are strong technical reasons dictating otherwise. In
   particular, the injecting agent's path-identity MUST, as a special
   case, be an FQDN mailable in the sense defined under option 1, or
   with an associated MX record as in option 2, and it MUST be
   followed by the special delimiter '%' which serves to separate the
   pre-injection and post-injection regions of the Path-content. See
   the Duties of an Injection Agent (section 7.1). In the case of a
   relaying or serving agent, the delimiter is chosen as follows.

   When an agent (other than an injecting agent) receives an article,
   it MUST establish the identity of the source and compare it with
   the leftmost path-identity of the Path-content. If it matches, a
   '/' should be used as the delimiter when prepending the agent's own
   path-identity. If it does not match then the agent should prepend
   two entries to the Path-content; firstly the true established
   path-identity of the source followed by a "?" delimiter, and then,
   to the left of that, the agent's own path-identity followed by a
   '/' delimiter as usual. This prepending of two entries SHOULD NOT
   be done if the provided and established identities match.
[I have upgraded that "MUST" from the previous "SHOULD (and eventually
MUST)". I can see no benefit in not being firm from the start. Of
course, it will not be implemented on day 1. Contrariwise, I have
demoted that "SHOULD NOT" from a "MUST NOT" since nothing actually
breaks if you do it, though it is clearly a lazy behaviour and
potentially doubles the length of the Path line.]
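
   The choice between one and two prepended entries, for a relaying or
   serving agent, can be sketched as follows. This is an illustration
   under stated assumptions: prepend_path_identity and its parameters
   are hypothetical, the comparison is made case-insensitively (the
   text permits either), and folding of the resulting line is not
   shown.

```python
import re

LEADING_IDENTITY = re.compile(r"[A-Za-z0-9.:_-]+")


def prepend_path_identity(path_content, own_identity, established_source,
                          aliases=()):
    """Prepend this agent's entry (and the true source, if needed)."""
    match = LEADING_IDENTITY.match(path_content)
    claimed = match.group(0) if match else ""
    known = {established_source.lower(), *(alias.lower() for alias in aliases)}
    if claimed.lower() in known:
        # The leftmost path-identity is verified: a single entry suffices.
        return f"{own_identity}/{path_content}"
    # Otherwise record the established source, marked with '?', and then
    # this agent's own path-identity to the left of it.
    return f"{own_identity}/{established_source}?{path_content}"


if __name__ == "__main__":
    print(prepend_path_identity("10.123.12.2/old.site.example!x",
                                ".foo-server", "bar.isp.example"))
```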

   Any method of establishing the identity of the source may be used
   (but see ... below), with the consideration that, in the event of
   problems, the agent concerned may be called upon to justify it.

        NOTE: The use of the '%' delimiter marks the position of the
        injecting agent in the chain. In normal circumstances there
        should therefore be only one '%' delimiter present, and
        injecting agents MAY choose to reject proto-articles with a
        '%' already in them. If, for whatever reason, more than one
        '%' is found, then the path-identity in front of the leftmost
        '%' is to be regarded as the true injecting agent.
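
   Locating the injecting agent according to the NOTE above can be
   sketched as below. The function name injecting_agent is
   hypothetical, and the sketch assumes a syntactically valid
   Path-content.

```python
import re


def injecting_agent(path_content: str):
    """Return the path-identity in front of the leftmost '%', or None."""
    head, percent, _rest = path_content.partition("%")
    if not percent:
        return None  # no '%' present: not injected per this standard
    # The wanted identity is whatever follows the last delimiter in `head`.
    return re.split(r"[/?,!]", head)[-1].strip()


if __name__ == "__main__":
    print(injecting_agent("barbaz/baz.isp.example%dialup123.baz.isp.example!x"))
```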

5.6.3. The tail-entry

   For historical reasons, the tail-entry (i.e. the rightmost entry in
   the Path-content) is regarded as a "user name", and therefore MUST
   NOT be interpreted as a site through which the article has already
   passed. Moreover, the Path-content is not an E-mail address and
   MUST NOT be used to contact the poster. Posting and/or injecting
   agents MAY place any string here. When it is not an actual user
   name, the string "not-for-mail" is often used, but in fact a simple
   "x" would be sufficient.

   Often this field will be the only entry in the region (known as the
   pre-injection region) after the '%', although there may be entries
   corresponding to machines traversed between the posting agent and
   the injecting agent proper. In particular, injecting agents that
   receive articles from many sources SHOULD include the identity of
   the source machine connecting to do the injection, and possibly
   other information enabling them to establish the circumstances of
   the injection (provided it does not conflict with any genuine site
   identifier). The '!' delimiter may be used freely within the pre-
   injection region, although '/' and '?' are also appropriate if used
   correctly.
[If/when we invent some form of Trace or NNTP-Posting-Host header, we
may want to revisit that paragraph.]

5.6.4. Delimiter Summary

   The various delimiters are summarized below. The name immediately
   to the left of the delimiter is always that of the machine which
   added the delimiter.

   '/' The name immediately to the right is known to be the identity
       of the machine from which the article was received (either
       because the entry was made by that machine and we have verified
       it, or because we have added it ourselves).

   '?' The name immediately to the right is the claimed identity of
       the machine from which the article was received, but we were
       unable to verify it (and have prepended our own view of where
       it came from, and then a '/').

   '%' Everything to the right is the pre-injection region followed by
       the tail-entry. The name on the left is the FQDN of the
       injecting agent. The presence of two '%'s in a path indicates a
       double-injection (see ...).

   '!' The name immediately to the right is unverified. The presence
       of a '!' to the left of the '%' indicates that the identity to
       the left is that of an old-style system not conformant with
       this standard.

   ',' Reserved for future use, treat as '/'.

   Other
       Old software may possibly use other delimiters, which should be
       treated as '!'. But note in particular that ':', '-' and '_'
       are components of names, not delimiters, and FWS on its own
       MUST NOT be used as the sole delimiter.
[I just removed '[' and ']' from that list, but could be persuaded to
put them back so long as the syntax gets fixed at the same time.]

        NOTE: Old Netnews relaying and injecting programs almost all
        delimit Path entries with the "!" delimiter, and these entries
        are not verified. As such, the presence of "%" as a delimiter
        will indicate that the article was injected by software
        conforming to this standard, and the presence of "!" as a
        delimiter to the left of a '%' will indicate that the message
        passed through systems developed prior to this standard. It is
        anticipated that relaying agents will reject articles in the
        old style once this new standard has been widely adopted.

5.6.5. Suggested Verification Methods

   The following approaches for common transports are suggested in
   order to meet a site's verification obligations. They are not
   required, but following them should avoid the necessity for
   wasteful double-entry Path additions.

   If the incoming article arrives through some TCP/IP protocol such
   as NNTP, the IP address of the source will be known, and will
   likely already have been checked against a list of known FQDNs or
   IP addresses that the receiving site has agreed to peer with (this
   will have involved a DNS lookup of a known FQDN, following CNAME
   chains as required, to find an A record containing that source IP).

   1. Where the path-identity is an FQDN (or even an arbitrary name
      starting with a '.') it is now a simple matter to check that it
      is the proper FQDN for the source, or some known registered
      alias thereof. Alternatively, where the FQDN in the path-
      identity has an associated A record, an immediate DNS lookup as
      above can be used to verify it.

   2. Where the path-identity is an encoding of an IP address which
      does not immediately match the known IP address of the source, a
      reverse-DNS (in-addr.arpa PTR record) lookup may be done on the
      provided address, followed by a regular DNS "A" record lookup on
      the returned name. There may be A records for several IP
      addresses, of which one should match the path-identity and
      another should match the source.

   3. If the path-identity fails to match any known alias for the
      source (requiring the insertion of an extra path-identity for
      the true source followed by a '?'), simply doing a reverse DNS
      (PTR) lookup on the source IP address is not sufficient to
      generate the true FQDN. The returned name must be mapped back to
      A records to assure it matches the source's IP address.
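
   The PTR lookup followed by a confirming A record lookup, as
   described in item 3, can be sketched as below. This is illustrative
   only: confirmed_fqdn and the two lookup helpers are hypothetical
   names, the default lookups use the Python socket module and so
   cover IPv4 only, and the lookups are passed as parameters so that
   the logic can be exercised without network access.

```python
import socket


def _ptr_lookup(ip_address: str) -> str:
    """Reverse (PTR) lookup: return the primary host name for an address."""
    return socket.gethostbyaddr(ip_address)[0]


def _a_lookup(host_name: str):
    """Forward (A record) lookup: return the list of IPv4 addresses."""
    return socket.gethostbyname_ex(host_name)[2]


def confirmed_fqdn(source_ip, ptr_lookup=_ptr_lookup, a_lookup=_a_lookup):
    """Return the name for source_ip only if it maps back to that address."""
    try:
        name = ptr_lookup(source_ip)
        addresses = a_lookup(name)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return None
    return name if source_ip in addresses else None
```

   A result of None means that the reverse lookup alone could not be
   substantiated, in which case some other means of establishing the
   identity of the source is needed.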

   If the incoming article arrives through some other protocol, such
   as UUCP, that protocol MUST include a means of verifying the source
   site. In UUCP implementations, commonly each incoming connection
   has a unique login name and password, and that login name (or some
   alias registered for it) would be expected as the path-identity.
[The above description may still contain more detail than we would
wish. My aim so far was to retain everything in Brad's original, but
expressed in a more palatable manner. We can now decide how much of it
we want to keep.]

5.6.6. Example

      Path: foo.isp.example/
         .foo-server/bar.isp.example?10.123.12.2/old.site.example!
         barbaz/baz.isp.example%dialup123.baz.isp.example!x

        NOTE: That article was injected into the news stream by
        baz.isp.example (complaints may be addressed to
        usenet@baz.isp.example). The injector has taken care to record
        that it got it from dialup123.baz.isp.example. "x" is the
        default tail entry, though sometimes a real userid is put
        there.

        The article was relayed, perhaps by UUCP, to the machine known
        in the UUCP maps database as "barbaz".

        Barbaz relayed it to old.site.example, which does not yet
        conform to this standard (hence the '!' delimiter). So one
        cannot be sure that it really came from barbaz.

        Old.site.example relayed it to a site claiming to have the IP
        address [10.123.12.2], and claiming (by using the '/'
        delimiter) to have verified that it came from
        old.site.example.

        [10.123.12.2] relayed it to ".foo-server" which, not being
        convinced that it truly came from [10.123.12.2], did a reverse
        lookup on the actual source and concluded it was known as
        bar.isp.example (that is not to say that [10.123.12.2] was not
        a correct IP address for bar.isp.example, but simply that that
        connection could not be substantiated by .foo-server).
        Observe that .foo-server has now added two entries to the
        Path.

        ".foo-server" is a locally significant name (observe the
        presence of the '.') within the complex site of many machines
        run by foo.isp.example, so the latter should have no problem
        recognizing .foo-server and using a '/' delimiter. Presumably
        foo.isp.example then delivered the article to its direct
        clients.

        It appears that foo.isp.example and old.site.example decided
        to fold the line, on the grounds that it seemed to be getting
        a little too long.

   [MESSFOR] P. Resnick, "Internet Message Format Standard", draft-
        ietf-drums-msg-fmt-07.txt, March 1998.

   [RFC 1034] P. Mockapetris, "Domain Names - Concepts and
        Facilities", RFC 1034, November 1987.

   [RFC 1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
        Resource Locators (URL)", RFC 1738, December 1994.

   [RFC 2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions)
        Part Three: Message Header Extensions for Non-ASCII Text", RFC
        2047, November 1996.

   [RFC 2142] D. Crocker, "Mailbox Names for Common Services, Roles
        and Functions", RFC 2142, May 1997.

   [RFC 2373] R. Hinden and S. Deering, "IP Version 6 Addressing
        Architecture", RFC 2373, July 1998.

   [RFC 2606] D. Eastlake and A. Panitz, "Reserved Top Level DNS
        Names", RFC 2606, June 1999.

   [RFC 820] J. Postel and J. Vernon, "Assigned Numbers", RFC 820,
        January 1983.

   [UNICODE] The Unicode Consortium, "The Unicode Standard - Version
        2.0", Addison-Wesley, 1996.

chl% diff -C 2 section_5.02.02 section_5.02.03
*** section_5.02.02 Fri Jul 2 19:50:56 1999
--- section_5.02.03 Thu Aug 12 18:57:05 1999
***************
*** 16,22 ****
     <Subject-content>.
  
! A proto-article (see 7.1.1) may lack some of these mandatory
! headers, but they MUST then be supplied by the injecting
! agent.
  
  5.1. Date
--- 16,21 ----
     <Subject-content>.
  
! A proto-article (see 7.1.1) may lack some of these mandatory
! headers, but they MUST then be supplied by the injecting agent.
  
  5.1. Date
***************
*** 24,29 ****
     The Date header contains the date and time that the article was
     prepared by the poster ready for transmission and SHOULD express
! the poster's local time. The content syntax is as defined in
! [MESSFOR].
  
        Date-content = date-time
--- 23,28 ----
     The Date header contains the date and time that the article was
     prepared by the poster ready for transmission and SHOULD express
! the poster's local time. The content syntax makes use of syntax
! defined in [MESSFOR].
  
        Date-content = date-time
***************
*** 42,48 ****
  
     The From header contains the electronic address(es), and possibly
! the full name, of the article's author(s). The content syntax is as
! defined in [MESSFOR], subject to the following revised definition
! of local-part.
  
        From-content = mailbox-list
--- 41,47 ----
  
     The From header contains the electronic address(es), and possibly
! the full name, of the article's author(s). The content syntax makes
! use of syntax defined in [MESSFOR], subject to the following
! revised definition of local-part.
  
        From-content = mailbox-list
***************
*** 58,65 ****
          Netnews protocols to follow.
  
! All mailboxes in the From-content field MUST either belong to the
! posters(s) of the article (or the poster(s) are authorized by the
! owners to use the mailboxes) or end in the top level domain of
! ".invalid" [RFC 2606].
  
  5.2.1. Examples:
--- 57,64 ----
          Netnews protocols to follow.
  
! Any mailbox in the From-content field that does not belong to the
! poster(s) of the article MUST end in the top level domain of
! ".invalid" [RFC 2606] unless the owner(s) of those mailboxes have
! authorized the poster(s) of the article to use those mailboxes.
  
  5.2.1. Examples:
***************
*** 85,90 ****
     The Message-ID header contains the article's message identifier, a
     unique identifier distinguishing the article from every other
! article. The content syntax is as defined in [MESSFOR], subject to
! the following revised definition of no-fold-quote.
  
        Message-ID-content = msg-id
--- 84,90 ----
     The Message-ID header contains the article's message identifier, a
     unique identifier distinguishing the article from every other
! article. The content syntax makes use of syntax defined in
! [MESSFOR], subject to the following revised definition of no-fold-
! quote.
  
        Message-ID-content = msg-id
***************
*** 93,97 ****
  
             NOTE: This syntax ensures that a msg-id is restricted to
! pure US-ASCII.
  
     Following the provisions of [MESSFOR], an agent generating an
--- 93,98 ----
  
             NOTE: This syntax ensures that a msg-id is restricted to
! pure US-ASCII (and is thus in strict compliance with
! [MESSFOR]).
  
     Following the provisions of [MESSFOR], an agent generating an
***************
*** 105,108 ****
--- 106,114 ----
     subsequently reappears.
  
+ NOTE: some old software may treat message identifiers that
+ differ only in case within their id-right part as equivalent,
+ and implementors of agents that generate message identifiers
+ should be aware of this.
+
  5.4. Subject
  
***************
*** 142,146 ****
  
          NOTE: "Re" is an abbreviation for the Latin "In re", meaning
! "in the matter of", and not an abbreviation of "Reference" is
          is sometimes erroneously supposed.
  
--- 148,152 ----
  
          NOTE: "Re" is an abbreviation for the Latin "In re", meaning
! "in the matter of", and not an abbreviation of "Reference" as
          is sometimes erroneously supposed.
  
***************
*** 179,184 ****
        Un-lowercase = <Unicode Letter, Lowercase> /
                              <Unicode Letter, Other>
-       Un-uppercase = <Unicode Letter, Uppercase> /
-                             <Unicode Letter, Titlecase>
        Un-digit = <Unicode Number, Decimal Digit> /
                              <Unicode Number, Other>
--- 185,188 ----
***************
*** 190,204 ****
     newly introduced feature in this standard. It MUST be accepted by
     all conforming implementations (relaying agents, serving agents and
! reading agents). Posting agents should be aware that, except for
! experimental posting to 'test' newsgroups or within cooperating
! subnets, such postings may be rejected by overly-critical old-style
! relaying agents. When a sufficient number of relaying agents are in
! conformance, posting agents SHOULD generate such whitespace in the
! form of <CRLF WS> so as to keep the length of lines in the relevant
! headers (notably Newsgroups and Followup-To) to no more than than
! 79 characters (or other agreed policy limit - see 4.6). Before such
! critical mass occurs, injecting agents MAY reformat such headers by
! removing whitespace inserted by the posting agent, but relaying
! agents MUST NOT do so.
  [That is Dan's revision of the Year 2000 text. I have proposed a
  revised text in section 3 to match it.]
--- 194,207 ----
     newly introduced feature in this standard. It MUST be accepted by
     all conforming implementations (relaying agents, serving agents and
! reading agents). Posting agents should be aware that such postings
! may be rejected by overly-critical old-style relaying agents. When
! a sufficient number of relaying agents are in conformance, posting
! agents SHOULD generate such whitespace in the form of <CRLF WS> so
! as to keep the length of lines in the relevant headers (notably
! Newsgroups and Followup-To) to no more than than 79 characters (or
! other agreed policy limit - see 4.6). Before such critical mass
! occurs, injecting agents MAY reformat such headers by removing
! whitespace inserted by the posting agent, but relaying agents MUST
! NOT do so.
  [That is Dan's revision of the Year 2000 text. I have proposed a
  revised text in section 3 to match it.]
***************
*** 248,255 ****
     set reasonable limits for the length of a component and of a
     newsgroup-name. In the absence of such explicit policies, the
! default figures are 20 characters and 72 characters respectively.
! [Observe that I have reduced that "20" from "30", on the grounds that
! a particular hierarchy can always decide to up the limit, but no
! hierarchy is ever likely to reduce it.]
  [If the checkpolicies proposal is included in the Standard, there
  should be a reference to it here.]
--- 251,257 ----
     set reasonable limits for the length of a component and of a
     newsgroup-name. In the absence of such explicit policies, the
! default figures are 30 characters and 71 characters respectively.
! [Observe that I have restored "20" to "30", since that seems to
! express the intersection of our agreements on this matter.]
  [If the checkpolicies proposal is included in the Standard, there
  should be a reference to it here.]
***************
*** 267,272 ****
  
     Posters SHOULD use only the names of existing newsgroups in the
! Newsgroups header, because newsgroups are not created simply by
! being posted to. However, it is legitimate to cross-post to
     newsgroup(s) which do not exist on the posting agent's host,
     provided that at least one of the newsgroups DOES exist there, and
--- 269,273 ----
  
     Posters SHOULD use only the names of existing newsgroups in the
! Newsgroups header. However, it is legitimate to cross-post to
     newsgroup(s) which do not exist on the posting agent's host,
     provided that at least one of the newsgroups DOES exist there, and
***************
*** 295,299 ****
  5.5.1. Forbidden newsgroup names
  
! The following newsgroup-names MUST NOT be used:
  
       o Newsgroup-names having only one component. These are reserved
--- 296,301 ----
  5.5.1. Forbidden newsgroup names
  
! The following forms of newsgroup-name MUST NOT be used except for
! the specific purposes indicated:
  
       o Newsgroup-names having only one component. These are reserved
***************
*** 307,312 ****
       o Any newsgroup-name containing the component "ctl" (likewise)
       o "to" or any newsgroup-name beginning with "to." (reserved for
! test messages sent on an essentially point-to-point basis (see
! also the ihave/sendme protocol described in section 7.2)
       o Any newsgroup-name containing the component "all" (because this
         is used as a wildcard in some implementations)
--- 309,314 ----
       o Any newsgroup-name containing the component "ctl" (likewise)
       o "to" or any newsgroup-name beginning with "to." (reserved for
! the ihave/sendme protocol described in section 7.2, and for
! test messages sent on an essentially point-to-point basis)
       o Any newsgroup-name containing the component "all" (because this
         is used as a wildcard in some implementations)
***************
*** 333,341 ****
  
        Path-content = *( path-identity [FWS] delimiter [FWS] )
!                      tail-entry
        path-identity = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
!       delimiter = "/" / "@" / "%" / "," / "!"
        tail-entry = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
  
          NOTE: Observe that the syntax does not allow comments within
          the Path header; this is to simplify processing by relaying
--- 335,347 ----
  
        Path-content = *( path-identity [FWS] delimiter [FWS] )
!                      tail-entry *FWS
        path-identity = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
!       delimiter = "/" / "?" / "%" / "," / "!"
        tail-entry = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
  
+         NOTE: A Path-content will inevitably contain at least one
+         path-identity, except possibly in the case of a proto-article
+         that has not yet been injected onto the network.
+
          NOTE: Observe that the syntax does not allow comments within
          the Path header; this is to simplify processing by relaying
***************
*** 345,353 ****
     A relaying agent SHOULD NOT pass an article to another relaying
     agent whose path-identity (or some known alias thereof) already
! appears in the Path-content. Observe that, for purposes of
! comparison, path-identities are case-sensitive. A relaying agent
! MAY decline to accept an article if its own path-identity (or some
! alias thereof) is already present in the Path-content.
  
          NOTE: This last facility is sometimes used to detect and
          decline control messages (notably cancel messages) which have
--- 351,364 ----
     A relaying agent SHOULD NOT pass an article to another relaying
     agent whose path-identity (or some known alias thereof) already
! appears in the Path-content. The comparison MAY be either case
! sensitive or case insensitive, and therefore relaying agents MUST
! NOT generate a name which differs from that of another site only in
! terms of case.
  
+ A relaying agent MAY decline to accept an article if its own path-
+ identity is already present in the Path-content or if the Path-
+ content contains some path-identity whose articles the relaying
+ agent does not want, as a matter of local policy.
+
          NOTE: This last facility is sometimes used to detect and
          decline control messages (notably cancel messages) which have
***************
*** 370,401 ****
     it SHOULD be one of:
  
! 1. A fully qualified domain name (FQDN) which MUST be associated
! with an A record or an MX record (or both), retrievable via the
! Internet DNS service [RFC 1034]. Any such A record SHOULD be
! that of the machine generating this path-identity, and any such
! MX record MUST enable the construction of a valid E-mail address
! of the form "usenet@<FQDN>" or "news@<FQDN>" [RFC 2142]. The
! FQDN SHOULD be in all-lowercase form so as to facilitate rapid
! (case senstitive) comparisons.
  
! 2. A name registered previously in the UUCP maps database (found in
! the newsgroup comp.mail.maps), containing no '.' character.
  
! 3. An encoding of an IP address - <dotted-quad> [RFC 820] or
! <ipv6-numeric> [RFC 1884]- preferably enclosed between '[' and '
! ]' (the requirement to be able to use an <ipv6-numeric> is the
! reason for including ':' as an allowed character within a path-
! identity).
! [Is there some reason why the [...] was obligatory around an <ipv6-
! numeric> in Brad's syntax, but not around a <dotted-quad>?]
  
! 4. A '.' followed by an arbitrary name not in the UUCP maps
        database, but believed to be unique and registered at least with
        all sites immediately downstream from the given site.
  
! Of the above options, nos. 1 and 2 are much to be preferred, unless
     there are strong technical reasons dictating otherwise. In
     particular, the injecting agent's path-identity MUST, as a special
! case, be an FQDN with an associated MX record and it MUST be
     followed by the special delimiter '%' which serves to separate the
     pre-injection and post-injection regions of the Path-content. See
--- 381,418 ----
     it SHOULD be one of:
  
! 1. A fully qualified domain name (FQDN) associated (by the Internet
! DNS service [RFC 1034]) with an A record, which SHOULD identify
! the actual machine prepending this path-identity. Ideally, this
! FQDN should also be "mailable" in the sense that it enables the
! construction of a valid E-mail address of the form
! "usenet@<FQDN>" or "news@<FQDN>" [RFC 2142] whereby the
! administrators of that agent may be reached.
  
! 2. A fully qualified domain name (FQDN) associated (by the Internet
! DNS service) with an MX record which MUST enable the
! construction of a valid E-mail address of the form
! "usenet@<FQDN>" or "news@<FQDN>" whereby the administrators of
! that agent may be reached.
  
! 3. A name registered previously in the UUCP maps database (found in
! the newsgroup comp.mail.maps), containing no '.' character.
  
! 4. An encoding of an IP address - <dotted-quad> [RFC 820] or
! <ipv6-numeric> [RFC 2373] (the requirement to be able to use an
! <ipv6-numeric> is the reason for including ':' as an allowed
! character within a path-identity).
! [Possibility of [...] around the IP address removed at Russ's request.
! Actually the syntax did not permit '[' and ']' in a path-identity, but
! I could easily make it do so, and I doubt any harm would ensue.]
!
! 5. A '.' followed by an arbitrary name not in the UUCP maps
        database, but believed to be unique and registered at least with
        all sites immediately downstream from the given site.
  
! Of the above options, nos. 1 to 3 are much to be preferred, unless
     there are strong technical reasons dictating otherwise. In
     particular, the injecting agent's path-identity MUST, as a special
! case, be an FQDN mailable in the sense defined under option 1, or
! with an associated MX record as in option 2, and it MUST be
     followed by the special delimiter '%' which serves to separate the
     pre-injection and post-injection regions of the Path-content. See
***************
*** 405,419 ****
     When an agent (other than an injecting agent) receives an article,
     it MUST establish the identity of the source and compare it with
! the leftmost path-identity of the Path-content. If it matches, a '
! /' should be used as the delimiter when prepending the agent's own
     path-identity. If it does not match then the agent should prepend
     two entries to the Path-content; firstly the true established
! path-identity of the source followed by an "@" delimiter, and then,
! to the left of that, the agent's own path-identity followed by a '
! /' delimiter as usual. This prepending of two entries MUST NOT be
! done if the provided and established identities match.
  [I have upgraded that "MUST" from the previous "SHOULD (and eventually
  MUST)". I can see no benefit in not being firm from the start. Of
! course, it will not be implemented on day 1.]
  
     Any method of establishing the identity of the source may be used
--- 422,439 ----
     When an agent (other than an injecting agent) receives an article,
     it MUST establish the identity of the source and compare it with
! the leftmost path-identity of the Path-content. If it matches, a
! '/' should be used as the delimiter when prepending the agent's own
     path-identity. If it does not match then the agent should prepend
     two entries to the Path-content; firstly the true established
! path-identity of the source followed by a "?" delimiter, and then,
! to the left of that, the agent's own path-identity followed by a
! '/' delimiter as usual. This prepending of two entries SHOULD NOT
! be done if the provided and established identities match.
  [I have upgraded that "MUST" from the previous "SHOULD (and eventually
  MUST)". I can see no benefit in not being firm from the start. Of
! course, it will not be implemented on day 1. Contrariwise, I have
! demoted that "SHOULD NOT" from a "MUST NOT" since nothing actually
! breaks if you do it, though it is clearly a lazy behaviour and
! potentially doubles the length of the Path line.]
  
     Any method of establishing the identity of the source may be used
***************
*** 422,431 ****
  
          NOTE: The use of the '%' delimiter marks the position of the
! injecting agent in the chain. In a well-ordered net, there
          should therefore be only one `%` delimiter present, and
! injecting agents MAY choose to reject proto-articles with a '
! %' already in them. If, for whatever reason, more than one '%'
! is found, then the path-identity in front of the leftmost '%'
! is to be regarded as the true injecting agent.
  
  5.6.3. The tail-entry
--- 442,451 ----
  
          NOTE: The use of the '%' delimiter marks the position of the
! injecting agent in the chain. In normal circumstances there
          should therefore be only one `%` delimiter present, and
! injecting agents MAY choose to reject proto-articles with a
! '%' already in them. If, for whatever reason, more than one
! '%' is found, then the path-identity in front of the leftmost
! '%' is to be regarded as the true injecting agent.
  
  5.6.3. The tail-entry
***************
*** 447,454 ****
     the source machine connecting to do the injection, and possibly
     other information enabling them to establish the circumstances of
! the injection, provided they do so in a manner that does not match
! any site identifier. The '!' delimiter may be used freely within
! the pre-injection region, although '/' and '@' are also appropriate
! if used correctly.
  [If/when we invent some form of Trace or NNTP-Posting-Host header, we
  may want to revisit that paragraph.]
--- 467,474 ----
     the source machine connecting to do the injection, and possibly
     other information enabling them to establish the circumstances of
! the injection (provided it does not conflict with any genuine site
! identifier). The '!' delimiter may be used freely within the pre-
! injection region, although '/' and '?' are also appropriate if used
! correctly.
  [If/when we invent some form of Trace or NNTP-Posting-Host header, we
  may want to revisit that paragraph.]
***************
*** 465,478 ****
         it, or because we have added it ourselves).
  
! '@' The name immediately to the right is the claimed identity of
         the machine from which the article was received, but we were
         unable to verify it (and have prepended our own view of where
         it came from, and then a '/').
- [Do we want to change '@' to '?'; is there any danger in doing so?]
  
     '%' Everything to the right is the pre-injection region followed by
         the tail-entry. The name on the left is the FQDN of the
! injecting agent. The presence of two '%'s in a path indicates
! a double-injection error.
  
     '!' The name immediately to the right is unverified. The presence
--- 485,497 ----
         it, or because we have added it ourselves).
  
! '?' The name immediately to the right is the claimed identity of
         the machine from which the article was received, but we were
         unable to verify it (and have prepended our own view of where
         it came from, and then a '/').
  
     '%' Everything to the right is the pre-injection region followed by
         the tail-entry. The name on the left is the FQDN of the
! injecting agent. The presence of two '%'s in a path indicates a
! double-injection (see ...).
  
     '!' The name immediately to the right is unverified. The presence
***************
*** 485,491 ****
     Other
         Old software may possibly use other delimiters, which should be
! treated as '!'. But note in particular that ':', '-', '_', '['
! and ']' are components of names, not delimiters, and FWS on its
! own MUST NOT be used as the sole delimiter.
  
          NOTE: Old Netnews relaying and injecting programs almost all
--- 504,512 ----
     Other
         Old software may possibly use other delimiters, which should be
! treated as '!'. But note in particular that ':', '-' and '_'
! are components of names, not delimiters, and FWS on its own
! MUST NOT be used as the sole delimiter.
! [I just removed '[' and ']' from that list, but could be persuaded to
! put them back so long as the syntax gets fixed at the same time.]
  
          NOTE: Old Netnews relaying and injecting programs almost all
***************
*** 499,502 ****
--- 520,525 ----
          old style once this new standard has been widely adopted.
  
+
+
  5.6.5. Suggested Verification Methods
  
***************
*** 530,534 ****
     3. If the path-identity fails to match any known alias for the
        source (requiring the insertion of an extra path-identity for
! the true source followed by an '@'), simply doing a reverse DNS
        (PTR) lookup on the source IP address is not sufficient to
        generate the true FQDN. The returned name must be mapped back to
--- 553,557 ----
     3. If the path-identity fails to match any known alias for the
        source (requiring the insertion of an extra path-identity for
! the true source followed by a '?'), simply doing a reverse DNS
        (PTR) lookup on the source IP address is not sufficient to
        generate the true FQDN. The returned name must be mapped back to
***************
*** 549,553 ****
  
        Path: foo.isp.example/
!          .foo-server/bar.isp.example@[10.123.12.2]/old.site.example!
           barbaz/baz.isp.example%dialup123.baz.isp.example!x
  
--- 572,576 ----
  
        Path: foo.isp.example/
!          .foo-server/bar.isp.example?10.123.12.2/old.site.example!
           barbaz/baz.isp.example%dialup123.baz.isp.example!x
  
***************
*** 601,607 ****
          Resource Locators (URL)", RFC 1738, December 1994.
  
- [RFC 1884] Robert M. Hinden and Stephen E. Deering, "IP version 6
- addressing architecture", RFC 1884, December 1995.
-
     [RFC 2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions)
          Part Three: Message Header Extensions for Non-ASCII Text", RFC
--- 624,627 ----
***************
*** 611,614 ****
--- 631,637 ----
          and Functions", RFC 2142, May 1997.
  
+ [RFC 2373] R. Hinden and S. Deering, "IP Version 6 Addressing
+ Architecture", RFC 2373, July 1998.
+
     [RFC 2606] D. Eastlake and A. Panitz, "Reserved Top Level DNS
          Names", RFC 2606, June 1999.

Charles H. Lindsey ---------At Home, doing my own thing------------------------
Email: chl@clw.cs.man.ac.uk Web: http://www.cs.man.ac.uk/~chl
Voice/Fax: +44 161 437 4506 Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

