
#1416 Injection-Date: proposed diff



It's been a while since we discussed this, so here again is my current
proposed diff.  If there are parts of this we can agree on independently,
I can start merging it into the main draft.

--- usepro.xml	2007-07-01 19:38:06.000000000 -0700
+++ usepro-1416.xml	2007-07-16 13:43:23.000000000 -0700
@@ -18,6 +18,8 @@
     'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2822.xml'>
   <!ENTITY rfc3629 PUBLIC '' 
     'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml'>
+  <!ENTITY rfc3798 PUBLIC '' 
+    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3798.xml'>
   <!ENTITY rfc3977 PUBLIC '' 
     'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3977.xml'>
   <!ENTITY rfc4234 PUBLIC '' 
@@ -165,8 +167,9 @@
 
         <t>"Injecting" an article is the processing of a proto-article by
         an injecting agent.  Normally this action is done once and only
-        once for a given article.  "Reinjecting" an article is passing an
-        already-injected article to an injection agent.</t>
+        once for a given article.  "Multiple injection" is passing the
+        same article to multiple injecting agents, either serially or in
+        parallel.</t>
 
         <t>A "gateway" is software which receives news articles and
         converts them to messages of some other kind (such as <xref
@@ -452,6 +455,66 @@
         </section>
       </section>
 
+      <section anchor="history"
+               title="Article History and Duplicate Suppression">
+        <t>Netnews normally uses a flood-fill algorithm for propagation of
+        articles in which each news server offers articles it accepts to
+        multiple peers and each news server may be offered the same
+        article from multiple other news servers.  Accordingly, duplicate
+        suppression is key; if a news server accepted every article it was
+        offered, it would needlessly accept (and then potentially
+        retransmit) dozens of copies of every article.</t>
+
+        <t>Relaying and serving agents therefore MUST keep a record of
+        articles they have already seen and use that record to reject
+        additional offers of the same article.  This record is called the
+        "history" file or database.</t>
+
+        <t>Each article is uniquely identified by its message identifier,
+        so a relaying or serving agent could satisfy this requirement by
+        storing a record of every message identifier that agent has ever
+        seen.  Such a history database would grow without bound, however,
+        so it is common and permitted to optimize based on the
+        Injection-Date or Date header field of an article as follows.  (In
+        the following discussion, the "date" of an article is defined to
+        be the date represented by its Injection-Date header field if
+        present, otherwise its Date header field.)
+          <list style="symbols">
+            <t>Agents MAY select a cutoff interval and reject any article
+            with a date farther in the past than that cutoff interval.  If
+            this interval is shorter than the time it takes for an article
+            to propagate through the network, the agent may reject an
+            article it has not yet seen, so the interval ought not to be
+            aggressively short.  For Usenet, for example, a cutoff interval
+            of no less than seven days is conventional.</t>
+
+            <t>Agents that enforce such a cutoff MAY then drop records of
+            articles that had dates older than the cutoff from their
+            history databases.  If such an article were offered to the
+            agent again, it would be rejected due to the cutoff date, so
+            the history record is no longer required to suppress the
+            duplicate.</t>
+
+            <t>Alternatively, agents MAY drop history records according to
+            the date when the article was first seen by that agent rather
+            than the date of the article.  In this case, the history
+            retention interval MUST be at least 24 hours longer than the
+            cutoff interval to allow for articles dated in the future.
+            This interval matches the allowable error in the date of the
+            article (see <xref target="injecting" />).</t>
+          </list>
+        </t>
+
+        <t>These are just two implementation strategies for article
+        history, albeit the most common ones.  Relaying and serving agents
+        are not required to use these strategies, only to meet the
+        requirement of not accepting an article more than once.  However,
+        these strategies are safe and widely deployed and implementors are
+        encouraged to use one of them, especially if they do not have
+        extensive experience with Netnews and the subtle effects of its
+        flood-fill algorithm.</t>
+      </section>
+
       <section anchor="posting" title="Duties of a Posting Agent">
         <t>A posting agent is the component of a user agent that assists a
         poster in creating a valid proto-article and forwarding it to an
@@ -459,9 +522,33 @@
 
         <t>Posting agents SHOULD ensure that proto-articles they create
         are valid according to <xref target="USEFOR" /> and any other
-        applicable policies.  They MUST NOT create any Injection-Date or
-        Injection-Info header fields; these headers will be added by the
-        injecting agent.</t>
+        applicable policies.  They MUST NOT create any Injection-Info
+        header field; this header field will be added by the injecting
+        agent.</t>
+
+        <t>If the proto-article already contains both Message-ID and Date
+        header fields, posting agents MAY add an Injection-Date header
+        field to that proto-article immediately before passing that
+        proto-article to an injecting agent.  They SHOULD do so if the
+        Date header field (representing the composition time of the
+        proto-article) is more than a day in the past at the time of
+        injection.  If the proto-article is being submitted to more than
+        one injecting agent, see <xref target="multi-injection" />.</t>
+
+        <t>The Injection-Date header field is new in this revision of the
+        Netnews protocol and is designed to allow the Date header field to
+        hold the composition date (as recommended in section 3.6.1 of
+        <xref target="RFC2822" />), even if the proto-article is not
+        injected for some time after its composition.  However, note that
+        all implementations predating this specification ignore the
+        Injection-Date header field and use the Date header field in its
+        stead for rejecting articles older than their cutoff (see <xref
+        target="history" />), and injecting agents predating this
+        specification do not add an Injection-Date header field.  Articles with
+        a Date header field substantially in the past will still be
+        rejected by implementations predating this specification,
+        regardless of the Injection-Date header field, and hence may
+        suffer poorer propagation.</t>
 
         <t>Contrary to <xref target="RFC2822" />, which implies that the
         mailbox or mailboxes in the From header field should be that of
@@ -484,48 +571,73 @@
           agent.</t>
 
           <t>A proto-article has the same format as a normal article
-          except that the Injection-Date, Injection-Info, and Xref header
-          fields MUST NOT be present; the Path header field MUST NOT
-          contain a "POSTED" &lt;diag-keyword>; and any of the following
-          mandatory header fields MAY be omitted: Message-ID, Date, and
-          Path.  In all other respects, a proto-article MUST be a valid
-          Netnews article.  In particular, the header fields which may be
-          omitted MUST NOT be present with invalid content.</t>
+          except that the Injection-Info and Xref header fields MUST NOT
+          be present; the Path header field MUST NOT contain a "POSTED"
+          &lt;diag-keyword>; and any of the following mandatory header
+          fields MAY be omitted: Message-ID, Date, and Path.  In all other
+          respects, a proto-article MUST be a valid Netnews article.  In
+          particular, the header fields which may be omitted MUST NOT be
+          present with invalid content.</t>
 
           <t>If a posting agent intends to offer the same proto-article to
-          multiple injecting agents, the header fields Message-ID and Date
-          MUST be present and identical in all copies of the
-          proto-article.</t>
+          multiple injecting agents, the header fields Message-ID, Date,
+          and Injection-Date MUST be present and identical in all copies
+          of the proto-article.  See <xref target="multi-injection" />.</t>
         </section>
 
-        <section anchor="reinjection" title="Reinjection of Articles">
-          <t>A given article SHOULD be processed by an injecting agent
-          once and only once.  The Injection-Date or Injection-Info
-          header fields are added by an injecting agent and are not
-          permitted in a proto-article.  Their presence (or the presence
-          of other unstandardized or obsolete trace headers such as
-          NNTP-Posting-Host, NNTP-Posting-Date, or X-Trace) indicates
-          that the proto-article is instead an article and has already
-          been processed by an injecting agent.  A posting agent SHOULD
-          normally reject such articles.</t>
-
-          <t>In the exceptional case that an article needs to be
-          reinjected for some reason (such as transferring an article from
-          one Netnews to another where those networks have no relaying
-          agreement), the posting agent doing the reinjection MUST convert
-          the article back into a proto-article before passing it to an
-          injecting agent (such as by renaming the Injection-Info and
-          Injection-Date header fields and removing any Xref header field)
-          and MUST perform the date checks on the existing Injection-Date
-          or Date header fields that would otherwise be done by the
-          injecting agent.</t>
-
-          <t>Reinjecting articles may cause loops, loss of trace
-          information, and other problems and should only be done with
-          care and when there is no available alternative.  A posting
-          agent that does reinjection is a limited type of gateway and as
-          such is subject to all of the requirements of an incoming
-          gateway in addition to the requirements of a posting agent.</t>
+        <section anchor="multi-injection"
+                 title="Multiple Injection of Articles">
+          <t>Under some circumstances (posting to multiple disjoint
+          networks, injecting agents with spotty connectivity, or for
+          redundancy, for example), a posting agent may wish to offer the
+          same article to multiple injecting agents.  In this unusual
+          case, the goal is to not create multiple independent articles
+          but rather to inject the same article at multiple points and let
+          the normal duplicate suppression facility of Netnews (see <xref
+          target="history" />) ensure that any given agent accepts the
+          article only once.</t>
+
+          <t>Whenever possible, multiple injection SHOULD be done by
+          offering the same proto-article to multiple injecting agents.
+          The posting agent MUST supply the Message-ID, Date, and
+          Injection-Date header fields, and the proto-article as offered
+          to each injecting agent MUST be identical.</t>
+
+          <t>In some cases, offering the same proto-article to all
+          injecting agents may not be possible (such as when gatewaying,
+          after the fact, articles found on one Netnews network to
+          another, supposedly unconnected one).  In this case, the posting
+          agent MUST convert the article back into a proto-article before
+          passing it to another injecting agent, but it MUST retain
+          unmodified the Message-ID, Date, and Injection-Date header
+          fields.  It MUST NOT add an Injection-Date header field if it is
+          missing from the existing article.  It MUST remove any Xref
+          header field and either rename or remove any Injection-Info
+          header field and other trace fields.
+            <list style="empty">
+              <t>NOTE: Multiple injection inherently risks duplicating
+              articles.  Multiple injection after the fact, by converting
+              an article back to a proto-article and injecting it again,
+              additionally risks loops, loss of trace information,
+              unintended repeat injection into the same network, and other
+              problems.  It should be done with care and only when there
+              is no alternative.  The requirement to retain Message-ID,
+              Date, and Injection-Date header fields minimizes the
+              possibility of a loop and ensures that the newly injected
+              article is not treated as a new, separate article.</t>
+            </list>
+          </t>
+
+          <t>Multiple injection of an article listing one or more
+          moderated newsgroups in its Newsgroups header field SHOULD only
+          be done by a moderator and MUST only be done after the
+          proto-article is approved for all moderated groups to which it
+          is to be posted and has an Approved header field (see <xref
+          target="moderator" />).  Multiple injection of an unapproved
+          article intended for moderated newsgroups will normally only
+          result in the moderator receiving multiple copies, and if the
+          newsgroup status is not consistent across all injecting agents,
+          may result in duplication of the article or other problems.</t>
         </section>
 
         <section anchor="followups" title="Followups">
@@ -650,23 +762,27 @@
 
             <t>It MUST reject any proto-article that does not have the
             proper mandatory header fields for a proto-article; that has
-            Injection-Date, Injection-Info, or Xref header fields; that
-            has a Path header field containing the "POSTED"
-            &lt;diag-keyword>; or that is not syntactically valid as
-            defined by <xref target="USEFOR" />.  It SHOULD reject any
-            proto-article which contains a header field deprecated for
-            Netnews.  It MAY reject any proto-article that contains trace
-            header fields indicating that it was already injected by an
-            injecting agent that did not add Injection-Info or
-            Injection-Date.</t>
-
-            <t>It SHOULD reject any article whose Date header field is
-            more than 24 hours into the future (and MAY use a margin less
-            than 24 hours).  It SHOULD reject any article whose Date
-            header appears to be stale (more than 72 hours into the past,
-            for example, or too old to still be recorded in the database
-            of a relaying agent the injecting agent will be using) since
-            not all news servers support Injection-Date.</t>
+            Injection-Info or Xref header fields; that has a Path header
+            field containing the "POSTED" &lt;diag-keyword>; or that is
+            not syntactically valid as defined by <xref target="USEFOR"
+            />.  It SHOULD reject any proto-article which contains a
+            header field deprecated for Netnews (see, for example, <xref
+            target="RFC3798" />).  It MAY reject any proto-article that
+            contains trace header fields (e.g., NNTP-Posting-Host)
+            indicating that it was already injected by an injecting agent
+            that did not add Injection-Info or Injection-Date.</t>
+
+            <t>It SHOULD reject any article whose Injection-Date or Date
+            header field is more than 24 hours into the future (and MAY
+            use a margin less than 24 hours).  It SHOULD reject any
+            article whose Injection-Date header field is too far in the
+            past (older than the cutoff interval of a relaying agent the
+            injecting agent is using, for example).  It SHOULD similarly
+            reject any article whose Date header field is too far in the
+            past, since not all news servers support Injection-Date and
+            only the injecting agent can provide a useful error message to
+            the posting agent.  In either case, this interval SHOULD NOT
+            be any shorter than 72 hours into the past.</t>
 
             <t>It SHOULD reject any proto-article whose Newsgroups header
             field does not contain at least one &lt;newsgroup-name> for a
@@ -710,8 +826,14 @@
             the source of the article and possibly other trace information
             as described in Section 3.2.8 of <xref target="USEFOR" />.</t>
 
-            <t>The injecting agent MUST then add an Injection-Date header
-            field containing the current date and time.</t>
+            <t>If the proto-article already had an Injection-Date header
+            field, it MUST NOT be modified or replaced.  If the
+            proto-article had both a Message-ID header field and a Date
+            header field, an Injection-Date header field MUST NOT be
+            added, since the proto-article may have been multiply injected
+            by a posting agent that predates this specification.  Otherwise,
+            the injecting agent MUST add an Injection-Date header field
+            containing the current date and time.</t>
 
             <t>Finally, the injecting agent forwards the article to one or
             more relaying agents, and the injection process is
@@ -806,18 +928,18 @@
             field or Message-ID header field, or without either an
             Injection-Date or Date header field.</t>
 
-            <t>It MUST reject any article that has already been
-            successfully sent to it, based on the Message-ID header field
-            of the article.  To satisfy this requirement, a relaying agent
-            normally keeps a database of message identifiers it has
-            already accepted.</t>
-
             <t>It MUST examine the Injection-Date header field or, if
             absent, the Date header field, and reject the article if that
-            date predates the earliest articles of which it keeps record
-            or if that date is more than 24 hours into the future.  It MAY
-            reject articles with dates in the future with a smaller margin
-            than 24 hours.</t>
+            date is more than 24 hours into the future.  It MAY reject
+            articles with dates in the future with a smaller margin than
+            24 hours.</t>
+
+            <t>It MUST reject any article that has already been accepted.
+            If it implements the mechanism described in <xref
+            target="history" />, this means that it MUST reject any
+            article whose date falls outside the cutoff interval, since it
+            cannot know whether such an article was accepted previously
+            or not.</t>
 
             <t>It SHOULD reject any article that does not include all the
             mandatory header fields.  It MAY reject any article that
@@ -891,16 +1013,16 @@
 
             <t>It MUST examine the Injection-Date header field or, if
             absent, the Date header field, and reject the article if that
-            date predates the earliest articles of which it keeps record
-            or if that date is more than 24 hours into the future.  It MAY
-            reject articles with dates in the future with a smaller margin
-            than 24 hours.</t>
-
-            <t>It MUST reject any article that has already been
-            successfully sent to it, based on the Message-ID header field
-            of the article.  To satisfy this requirement, a relaying agent
-            normally keeps a database of message identifiers it has
-            already accepted.</t>
+            date is more than 24 hours into the future.  It MAY reject
+            articles with dates in the future with a smaller margin than
+            24 hours.</t>
+
+            <t>It MUST reject any article that has already been accepted.
+            If it implements the mechanism described in <xref
+            target="history" />, this means that it MUST reject any
+            article whose date falls outside the cutoff interval, since it
+            cannot know whether such an article was accepted previously
+            or not.</t>
 
             <t>It SHOULD reject any article that matches an
             already-received and honored cancel message or Supersedes
@@ -1008,8 +1130,7 @@
             for reasons understood by the moderator (such as delays in the
             moderation process) in which case they MAY substitute the
             current date.  Any Injection-Date, Injection-Info, or Xref
-            header fields already present (though there should be none)
-            MUST be removed.</t>
+            header fields already present MUST be removed.</t>
 
             <t>Any Path header field MUST either be removed or truncated
             to only those entries following its "POSTED"
@@ -2042,6 +2163,7 @@
       &rfc1036;
       &rfc2045;
       &rfc2606;
+      &rfc3798;
       &rfc3977;
       <reference anchor="USEAGE">
         <front>

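The following rough Python sketch may make the interaction easier to review. It covers three things from the new "Article History and Duplicate Suppression" section: the cutoff interval, the 24-hour future margin, and history expiry. It is not proposed text, and it is not taken from any existing news server.

It makes several assumptions. Header fields are already parsed into a dict, with dates as timezone-aware datetime objects. The names History, offer, expire, and injection_date_to_add are hypothetical, as are the seven-day cutoff and the in-memory storage. It expires records by first-seen time (the third bullet of that section). It also encodes the proposed rule for when an injecting agent adds Injection-Date.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Hypothetical values: the draft calls seven days "conventional" for Usenet
# and allows a future margin of at most 24 hours.
CUTOFF = timedelta(days=7)
FUTURE_MARGIN = timedelta(hours=24)


def article_date(headers: Dict[str, Any]) -> Optional[datetime]:
    """The article's "date": Injection-Date if present, else Date."""
    return headers.get("Injection-Date", headers.get("Date"))


class History:
    """In-memory history that expires records by first-seen time."""

    def __init__(self, cutoff: timedelta = CUTOFF,
                 margin: timedelta = FUTURE_MARGIN) -> None:
        self.cutoff = cutoff
        self.margin = margin
        # Retention must exceed the cutoff by at least the future margin,
        # or an article dated in the future could be accepted twice.
        self.retention = cutoff + margin
        self.seen: Dict[str, datetime] = {}  # Message-ID -> first-seen time

    def offer(self, headers: Dict[str, Any], now: datetime) -> bool:
        """Return True to accept the article, False to reject it."""
        message_id = headers.get("Message-ID")
        date = article_date(headers)
        if message_id is None or date is None:
            return False  # missing mandatory header fields
        if date > now + self.margin:
            return False  # dated too far in the future
        if date < now - self.cutoff:
            return False  # older than the cutoff: its record may be gone
        if message_id in self.seen:
            return False  # duplicate offer
        self.seen[message_id] = now
        return True

    def expire(self, now: datetime) -> None:
        """Drop records first seen more than `retention` ago."""
        self.seen = {mid: first for mid, first in self.seen.items()
                     if now - first <= self.retention}


def injection_date_to_add(headers: Dict[str, Any],
                          now: datetime) -> Optional[datetime]:
    """Injecting-agent rule: the Injection-Date to add, or None."""
    if "Injection-Date" in headers:
        return None  # an existing Injection-Date is never modified
    if "Message-ID" in headers and "Date" in headers:
        return None  # may be a multiple injection by an older posting agent
    return now
```

With these parameters, consider an article first seen at time T and dated T + 24 hours. Its record stays in history until T + 8 days, which is exactly when its date falls outside the seven-day cutoff. Expiring the record any earlier would let a re-offered copy through.
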
-- 
Russ Allbery (rra@xxxxxxxxxxxx)             <http://www.eyrie.org/~eagle/>