
History section (Re: #1416 Injection-Date: proposed diff)




I'll take this by section, since I'm not in a position to comment on the whole block in one go...

Speaking as a contributor:
I like the text of the history section, and think that we should just declare it "text accepted".

Russ Allbery wrote:
It's been a while since we discussed this, so here again is my current
proposed diff.  If there are parts of this we can agree on independently,
I can start merging it into the main draft.

--- usepro.xml	2007-07-01 19:38:06.000000000 -0700
+++ usepro-1416.xml	2007-07-16 13:43:23.000000000 -0700
@@ -18,6 +18,8 @@
     'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2822.xml'>
   <!ENTITY rfc3629 PUBLIC ''
     'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml'>
+  <!ENTITY rfc3798 PUBLIC ''
+    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3798.xml'>
   <!ENTITY rfc3977 PUBLIC ''
     'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3977.xml'>
   <!ENTITY rfc4234 PUBLIC ''
@@ -165,8 +167,9 @@
         <t>"Injecting" an article is the processing of a proto-article by
         an injecting agent.  Normally this action is done once and only
-        once for a given article.  "Reinjecting" an article is passing an
-        already-injected article to an injection agent.</t>
+        once for a given article.  "Multiple injection" is passing the
+        same article to multiple injecting agents, either serially or in
+        parallel.</t>
Nit: this doesn't seem to take a position on whether it's the same posting agent doing the multiple injection or different posting agents; I suggest adding ", by one or several posting agents" if this is what we think is intended.
         <t>A "gateway" is software which receives news articles and
         converts them to messages of some other kind (such as <xref
@@ -452,6 +455,66 @@
         </section>
       </section>
+      <section anchor="history"
+               title="Article History and Duplicate Suppression">
+        <t>Netnews normally uses a flood-fill algorithm for propagation of
+        articles in which each news server offers articles it accepts to
+        multiple peers and each news server may be offered the same
+        article from multiple other news servers.  Accordingly, duplicate
+        suppression is key; if a news server accepted every article it was
+        offered, it may needlessly accept (and then potentially
+        retransmit) dozens of copies of every article.</t>
+
+        <t>Relaying and serving agents therefore MUST keep a record of
+        articles they have already seen and use that record to reject
+        additional offers of the same article.  This record is called the
+        "history" file or database.</t>
+
+        <t>Each article is uniquely identified by its message identifier,
+        so a relaying or serving agent could satisfy this requirement by
+        storing a record of every message identifier that agent has ever
+        seen.  Such a history database would grow without bound, however,
+        so it is common and permitted to optimize based on the
+        Injection-Date or Date header field of an article as follows.  (In
+        the following discussion, the "date" of an article is defined to
+        be the date represented by its Injection-Date header field if
+        present, otherwise its Date header field.)
+          <list style="symbols">
+            <t>Agents MAY select a cutoff interval and reject any article
+            with a date farther in the past than that cutoff interval.  If
+            this interval is shorter than the time it takes for an article
+            to propagate through the network, the agent may reject an
+            article it had not yet seen, so it ought not be aggressively
+            short.  For Usenet, for example, a cutoff interval of no less
+            than seven days is conventional.</t>
+
+            <t>Agents that enforce such a cutoff MAY then drop records of
+            articles that had dates older than the cutoff from their
+            history databases.  If such an article were offered to the
+            agent again, it would be rejected due to the cutoff date, so
+            the history record is no longer required to suppress the
+            duplicate.</t>
+
+            <t>Alternatively, agents MAY drop history records according to
+            the date when the article was first seen by that agent rather
+            than the date of the article.  In this case, the history
+            retention interval MUST be at least 24 hours longer than the
+            cutoff interval to allow for articles dated in the future.
+            This interval matches the allowable error in the date of the
+            article (see <xref target="injecting" />).</t>
+          </list>
+        </t>
+
+        <t>These are just two implementation strategies for article
+        history, albeit the most common ones.  Relaying and serving agents
+        are not required to use these strategies, only to meet the
+        requirement of not accepting an article more than once.  However,
+        these strategies are safe and widely deployed and implementors are
+        encouraged to use one of them, especially if they not have
Nit: "not have" -> "do not have".
+        extensive experience with Netnews and the subtle effects of its
+        flood-fill algorithm.</t>
+      </section>
+
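
As an illustration only (this is not part of the proposed diff): a minimal Python sketch of the cutoff rule combined with the "first seen" expiry strategy described in the history section above. The class name, method names, and the in-memory dict are hypothetical; only the seven-day cutoff and the 24-hour margin come from the proposed text.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the proposed text: a cutoff of no less than seven days,
# and history retention at least 24 hours longer than the cutoff.
CUTOFF = timedelta(days=7)
RETENTION = CUTOFF + timedelta(hours=24)


class History:
    """Toy in-memory history database (hypothetical, for illustration)."""

    def __init__(self):
        # message identifier -> time this agent first saw the article
        self._first_seen = {}

    def offer(self, message_id, article_date, now):
        """Return True to accept the offered article, False to reject it.

        article_date is the Injection-Date if present, otherwise the Date.
        """
        if article_date < now - CUTOFF:
            return False  # older than the cutoff: rejected without a lookup
        if message_id in self._first_seen:
            return False  # duplicate: this agent has already seen it
        self._first_seen[message_id] = now
        return True

    def expire(self, now):
        """Drop records whose first-seen time is older than RETENTION."""
        limit = now - RETENTION
        self._first_seen = {
            mid: seen for mid, seen in self._first_seen.items() if seen >= limit
        }
```

The extra 24 hours matters for an article dated up to a day in the future: seven and a half days after it was first seen it still passes the cutoff check, so its history record has to survive until then for the duplicate to be rejected.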
       <section anchor="posting" title="Duties of a Posting Agent">
         <t>A posting agent is the component of a user agent that assists a
         poster in creating a valid proto-article and forwarding it to an