It's been a while since we discussed this, so here again is my current
proposed diff. If there are parts of this we can agree on independently,
I can start merging it into the main draft.
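To make the new "Article History and Duplicate Suppression" section in the diff below more concrete, here is a minimal Python sketch of the two history-expiry strategies it describes. It is illustrative only and not proposed draft text. The `History` class, the `offer`/`expire` method names, the plain-dict header representation, and the seven-day default cutoff are my own assumptions.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical sketch; names and structure are illustrative, not from the draft.
FUTURE_SLOP = timedelta(hours=24)  # allowable error in an article's date


def article_date(headers):
    """The article's "date": Injection-Date if present, otherwise Date."""
    value = headers.get("Injection-Date") or headers["Date"]
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:  # a "-0000" zone yields a naive datetime; treat as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


class History:
    """Duplicate suppression keyed on message identifier (assumed design)."""

    def __init__(self, cutoff=timedelta(days=7), expire_by_first_seen=False):
        self.cutoff = cutoff
        self.expire_by_first_seen = expire_by_first_seen
        # When expiring by first-seen time, retention must exceed the cutoff
        # by at least 24 hours to cover articles dated in the future.
        self.retention = cutoff + FUTURE_SLOP
        self.records = {}  # message-ID -> (article date, first-seen time)

    def offer(self, message_id, headers, now):
        """Return True to accept an offered article, False to reject it."""
        if message_id in self.records:
            return False  # already seen: suppress the duplicate
        date = article_date(headers)
        if now - date > self.cutoff:
            return False  # dated farther in the past than the cutoff
        self.records[message_id] = (date, now)
        return True

    def expire(self, now):
        """Drop history records that are no longer needed."""
        for mid, (date, seen) in list(self.records.items()):
            if self.expire_by_first_seen:
                stale = now - seen > self.retention
            else:
                # Any re-offer of this article would fail the cutoff check.
                stale = now - date > self.cutoff
            if stale:
                del self.records[mid]
```

With expiry by first-seen time, the extra 24 hours means a record for an article dated up to a day in the future is kept until the cutoff check alone would reject that article. This leaves no window in which a duplicate could slip through.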
--- usepro.xml 2007-07-01 19:38:06.000000000 -0700
+++ usepro-1416.xml 2007-07-16 13:43:23.000000000 -0700
@@ -18,6 +18,8 @@
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2822.xml'>
<!ENTITY rfc3629 PUBLIC ''
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml'>
+ <!ENTITY rfc3798 PUBLIC ''
+ 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3798.xml'>
<!ENTITY rfc3977 PUBLIC ''
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3977.xml'>
<!ENTITY rfc4234 PUBLIC ''
@@ -165,8 +167,9 @@
<t>"Injecting" an article is the processing of a proto-article by
an injecting agent. Normally this action is done once and only
- once for a given article. "Reinjecting" an article is passing an
- already-injected article to an injection agent.</t>
+ once for a given article. "Multiple injection" is passing the
+ same article to multiple injecting agents, either serially or in
+ parallel.</t>
<t>A "gateway" is software which receives news articles and
converts them to messages of some other kind (such as <xref
@@ -452,6 +455,66 @@
</section>
</section>
+ <section anchor="history"
+ title="Article History and Duplicate Suppression">
+ <t>Netnews normally uses a flood-fill algorithm for propagation of
+ articles in which each news server offers articles it accepts to
+ multiple peers and each news server may be offered the same
+ article from multiple other news servers. Accordingly, duplicate
+ suppression is key; if a news server accepted every article it was
+ offered, it would needlessly accept (and then potentially
+ retransmit) dozens of copies of every article.</t>
+
+ <t>Relaying and serving agents therefore MUST keep a record of
+ articles they have already seen and use that record to reject
+ additional offers of the same article. This record is called the
+ "history" file or database.</t>
+
+ <t>Each article is uniquely identified by its message identifier,
+ so a relaying or serving agent could satisfy this requirement by
+ storing a record of every message identifier that agent has ever
+ seen. Such a history database would grow without bound, however,
+ so it is common and permitted to optimize based on the
+ Injection-Date or Date header field of an article as follows. (In
+ the following discussion, the "date" of an article is defined to
+ be the date represented by its Injection-Date header field if
+ present, otherwise its Date header field.)
+ <list style="symbols">
+ <t>Agents MAY select a cutoff interval and reject any article
+ with a date farther in the past than that cutoff interval. If
+ this interval is shorter than the time it takes for an article
+ to propagate through the network, the agent may reject an
+ article it had not yet seen, so the interval ought not to be too
+ short. On Usenet, for example, a cutoff interval of at least
+ seven days is conventional.</t>
+
+ <t>Agents that enforce such a cutoff MAY then drop records of
+ articles whose dates are older than the cutoff from their
+ history databases. If such an article were offered to the
+ agent again, it would be rejected due to the cutoff date, so
+ the history record is no longer required to suppress the
+ duplicate.</t>
+
+ <t>Alternatively, agents MAY drop history records according to
+ the date when the article was first seen by that agent rather
+ than the date of the article. In this case, the history
+ retention interval MUST be at least 24 hours longer than the
+ cutoff interval to allow for articles dated in the future.
+ This 24-hour margin matches the allowable error in the date of the
+ article (see <xref target="injecting" />).</t>
+ </list>
+ </t>
+
+ <t>These are only two possible implementation strategies for article
+ history, albeit the most common ones. Relaying and serving agents
+ are not required to use these strategies, only to meet the
+ requirement of not accepting an article more than once. However,
+ these strategies are safe and widely deployed, and implementors are
+ encouraged to use one of them, especially if they do not have