
Re: #1416 Injection-Date: proposed diff



In <873azoqdqo.fsf@xxxxxxxxxxxxxxxxxxxxx> Russ Allbery <rra@xxxxxxxxxxxx> writes:

>It's been a while since we discussed this, so here again is my current
>proposed diff.  If there are parts of this we can agree on independently,
>I can start merging it into the main draft.

Yes, large chunks of this are fine, though I have a few nits (see later).

The main outstanding matter is Issue #1416, which I shall now try to
summarize as I see it.

Essentially, we have to choose between two options, which we have in the
past named "IR" and "IC". The current text reflects IR. The differences
relate to when an Injection-Date header MUST/MUST NOT/Whatever be added by
Posting and Injecting agents.

Option IR
---------

Adding Injection-Date by Posting agent, according to headers present in
proto-article and intention wrt multiple injection:

                                                MUST    MUSTNOT SHOULD  MAY
Multiple injection (also requires Msgid & Date) YES
"Reinjection" (aka "gatewaying after the fact")         YES
Message-ID and stale Date present                               YES
Message-ID _and_ Date present                                           YES
Message-ID or Date or both absent                                       ???

[Where I believe "???" should be "YES", but text doesn't actually say so]

Adding Injection-Date by Injecting agent, according to headers present in
incoming proto-article:

                                                MUST    MUSTNOT SHOULD  MAY
Injection-Date already present                          YES
Message-ID _and_ Date present                           YES
Message-ID or Date or both absent               YES

Option IC
---------

Adding Injection-Date by Posting agent, according to headers present in
proto-article and intention wrt multiple injection:

                                                MUST    MUSTNOT SHOULD  MAY
Multiple injection (also requires Msgid & Date) YES
"Reinjection" (aka "gatewaying after the fact")         YES
Message-ID and stale Date present                               YES
All other cases (Message-ID or Date or neither)                         YES

Adding Injection-Date by Injecting agent, according to headers present in
incoming proto-article:

                                                MUST    MUSTNOT SHOULD  MAY
Injection-Date already present                          YES
All other cases (Message-ID or Date or neither) YES
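
To make the difference between the two Injecting agent tables concrete,
here is a minimal sketch in Python. The function name and its boolean
parameters are my own illustration, not anything taken from the draft; it
only encodes the MUST / MUST NOT rows shown above.

```python
def injecting_agent_action(option, has_injection_date, has_message_id, has_date):
    """Return "MUST" or "MUST NOT": may the injecting agent add Injection-Date?

    `option` is "IR" or "IC"; the three flags describe the incoming
    proto-article. All names here are hypothetical.
    """
    if option not in ("IR", "IC"):
        raise ValueError("option must be 'IR' or 'IC'")
    # Both options: an existing Injection-Date is never replaced or added to.
    if has_injection_date:
        return "MUST NOT"
    # Only under IR: Message-ID and Date both present blocks the addition.
    if option == "IR" and has_message_id and has_date:
        return "MUST NOT"
    # Everything else: the injecting agent adds Injection-Date.
    return "MUST"


# The one case where the options disagree:
print(injecting_agent_action("IR", False, True, True))  # MUST NOT
print(injecting_agent_action("IC", False, True, True))  # MUST
```

The two options differ only in the branch guarded by `option == "IR"`.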


Discussion of IR
----------------

The essential difference is the rule that Injecting agents MUST NOT add
Injection-Date if _both_ Message-ID and Date are already present, and MUST
add it otherwise. This rule is counter-intuitive and confusing.

A consequence of the rule is that some articles will never acquire an
Injection-Date (those where the Posting agent provides both Message-ID
and Date), and that will remain the case indefinitely.

Russ counters this by claiming that, in time, Posting agents will learn to
add the Injection-Date themselves, but this is only a "MAY" as specified,
and I remain unconvinced that implementors will take the hint.

So if we decide to remain with Option IR, I would want to see a much
stronger wording in place of that "MAY" (not necessarily a "SHOULD", but
some indication that it was intended to become standard practice).

Moreover, there is a danger that Posting agent implementors will do it
wrongly, adding the Injection-Date _before_ they are sure they have a
working connection to an Injecting agent. Yes, multiple injectors have to
get this right, but people doing multiple injection can be expected to
have a higher level of "clue".

Discussion of IC
----------------

We distinguish between "OLD" agents which are unaware of our new standard,
and "NEW" agents which implement it as written.

The following scenario, which I understand to be the case Russ is worried
about, illustrates the problem with IC. If anyone knows of a different, or
a simpler, scenario that exhibits the same problem, then please speak up.

Day 0	OLD Posting agent P composes an article (with Date: Day0) and some
	Message-ID. P is in the habit of multi-injecting.
Day 1	P injects the message to (OLD or NEW) Injecting agent A (the ACopy).
Day 2	P injects the message to NEW Injecting agent B (the BCopy); B adds
	Injection-Date: Day2.
meanwhile:
Day 1	The ACopy propagates rapidly and arrives at NEW Serving agent C,
	which stores it and puts it in its history file (which it
	normally retains for 7 days).
Day 2	The BCopy arrives at Relaying agent D, which promptly breaks and goes
	offline for 7 days (or otherwise causes that propagation delay).
Day 8	C removes the ACopy from its history file.
Day 9	D wakes up, embarks on a massive catchup, and releases the BCopy.
Day 9	The BCopy arrives at C, which observes that it is (just) within 7
	days of its Injection-Date, and so accepts and stores it again.

	Which is, of course, a Bad Thing. C's users will say "Didn't I
	already see that article last week?" But it is no worse than that;
	in particular, I cannot see any way that looping could arise.

Observe that this Bad Thing would not have happened if:
	. B had been an OLD agent
	. C had been an OLD agent
	. P had been a NEW agent
	. The delay at D had been 1 day less
	. The delay at D had been 1 day more
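
The scenario, and four of the variations just listed, can be checked with
a small day-granularity model. This is only a sketch under assumptions of
my own: days are integers, C keeps a history entry while it is less than
7 days old, C accepts a date up to and including 7 days old, a NEW P
stamps Injection-Date: Day1 on both copies, and A is modelled as OLD. All
class and function names are hypothetical. The "1 day less" variation
depends on the order of events within Day 8, which this model does not
capture, so it is not exercised.

```python
from dataclasses import dataclass
from typing import Dict, Optional

RETENTION_DAYS = 7  # assumed lifetime of a history entry at C
CUTOFF_DAYS = 7     # assumed maximum age of an acceptable date at C


@dataclass
class Article:
    message_id: str
    date: int                             # Date header, as a day number
    injection_date: Optional[int] = None  # Injection-Date header, if any


def inject_ic(article, today, agent_is_new):
    """Option IC: a NEW injecting agent adds Injection-Date unless present."""
    if agent_is_new and article.injection_date is None:
        return Article(article.message_id, article.date, today)
    return article


class ServingAgent:
    def __init__(self, is_new):
        self.is_new = is_new
        self.history: Dict[str, int] = {}  # Message-ID -> day stored

    def offer(self, article, today):
        """Return True if the article is accepted and stored."""
        # Expire history entries that have reached the retention limit.
        self.history = {m: d for m, d in self.history.items()
                        if today - d < RETENTION_DAYS}
        if article.message_id in self.history:
            return False  # duplicate, already accepted
        # A NEW agent prefers Injection-Date; an OLD one only knows Date.
        effective = article.date
        if self.is_new and article.injection_date is not None:
            effective = article.injection_date
        if today - effective > CUTOFF_DAYS:
            return False  # too old to tell whether it was seen before
        self.history[article.message_id] = today
        return True


def times_accepted(p_is_new=False, b_is_new=True, c_is_new=True,
                   b_arrival_day=9):
    """Run the Day 0..9 scenario; return how often C accepts the article."""
    proto = Article("<example@p.invalid>", date=0,
                    injection_date=1 if p_is_new else None)
    a_copy = inject_ic(proto, today=1, agent_is_new=False)
    b_copy = inject_ic(proto, today=2, agent_is_new=b_is_new)
    c = ServingAgent(c_is_new)
    return int(c.offer(a_copy, today=1)) + int(c.offer(b_copy, b_arrival_day))


print(times_accepted())                  # scenario as described
print(times_accepted(b_is_new=False))    # B had been an OLD agent
print(times_accepted(c_is_new=False))    # C had been an OLD agent
print(times_accepted(p_is_new=True))     # P had been a NEW agent
print(times_accepted(b_arrival_day=10))  # delay at D one day more
```

Under these assumptions only the first call reports two acceptances; each
of the variations leaves C with a single copy.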

So yes, the Bad Thing would not have happened on the current network, and
it will not happen when the whole network is NEW - so it is a transitional
problem whilst OLD and NEW agents are still around. Moreover, you might
think the whole scenario is somewhat artificial (hence the invitation to
propose more realistic scenarios).

So the choice you people have to make is between this possible, but rare,
Bad Thing, and the counter-intuitive properties of option IR.


The following are the paragraphs relating to this issue, which would need
changing if we move to Option IC. There are also a few nit-fixes in some of
them.

In Duties of a Posting Agent:

>+        <t>If the proto-article already contains both Message-ID and Date
>+        header fields, posting agents MAY add an Injection-Date header
                                           ^
                       , as part of the injection process,
>+        field to that proto-article immediately before passing that
>+        proto-article to an injection agent.

That is where I don't see why they MAY add it only if both Message-ID and
Date are present (see "???" above). What harm arises, even in IR, if they
are allowed to add it anyway?

>+        ...... They SHOULD do so if the
>+        Date header field (representing the composition time of the
>+        proto-article) is more than a day in the past at the time of
>+        injection.  If the proto-article is being submitted to more than
                      ^^
              They MUST do so if
>+        one injecting agent, see <xref target="multi-injection" />.</t>

Yes, that last MUST is redundant, but the point is important and
you have a similar redundancy when discussing proto-articles further on.
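
Read together with the suggested "They MUST do so if" wording, the
Posting agent paragraph quoted above amounts to roughly the following. It
is a sketch only: the function name, its parameters and the "unspecified"
return value (for the "???" case, on which the text is silent) are my own
invention.

```python
def posting_agent_level(has_message_id, has_date, date_age_hours=0.0,
                        multi_injection=False):
    """Requirement level for a posting agent adding Injection-Date.

    Reflects the quoted paragraph plus the proposed MUST for multiple
    injection. Names and the "unspecified" marker are hypothetical.
    """
    if multi_injection:
        # Multiple injection also requires Message-ID and Date to be supplied.
        return "MUST"
    if not (has_message_id and has_date):
        # The quoted text only speaks of proto-articles that have both.
        return "unspecified"
    if date_age_hours > 24:
        # Date (composition time) is more than a day in the past.
        return "SHOULD"
    return "MAY"


print(posting_agent_level(True, True))                     # MAY
print(posting_agent_level(True, True, date_age_hours=30))  # SHOULD
print(posting_agent_level(True, False))                    # unspecified
```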

In Multiple Injection of Articles:

>+          <t>Whenever possible, multiple injection SHOULD be done by
>+          offering the same proto-article to multiple injecting agents.
>+          The posting agent MUST supply the Message-ID, Date, and
>+          Injection-Date header fields, and the proto-article as offered
                                                        ^^^^^^^
                                                        articles
>+          to each injecting agent MUST be identical.</t>

>+          <t>In some cases, offering the same proto-article to all
>+          injecting agents may not be possible (such as when gatewaying,
>+          after the fact, articles found on one Netnews network to
>+          another, supposedly unconnected one).  In this case, the posting
>+          agent MUST convert the article back into a proto-article before
>+          passing it to another injecting agent, but it MUST retain
>+          unmodified the Message-ID, Date, and Injection-Date header
>+          fields.  It MUST NOT add an Injection-Date header field if it is
>+          missing from the existing article.  It MUST remove any Xref
>+          header field and either rename or remove any Injection-Info
>+          header field and other trace fields.

Technically, that is what we used to call "reinjection". Now it is called
"Gatewaying after the fact", which is a bit of a mouthful. Neither term
really indicates that the person/entity doing it may be completely unaware
that either multiple- or re-injection is taking place. Also, I am not sure
that "converting it back into a proto-article" is quite what is happening.
What is important is that any Message-ID, Date and Injection-Date MUST
stay, and any Xref and Injection-Info MUST go.

Also, it is not clear what is to happen to any Path. My preference would
be to leave it (see discussion of this further down), or otherwise to
rename it.
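
For illustration, the header handling I have in mind looks roughly like
this. It is a sketch, not proposed text: the function name and the
"X-Old-" rename prefix are hypothetical, only Injection-Info is treated
as a trace field, and Path is deliberately left alone as per my
preference above.

```python
def prepare_for_reinjection(headers, rename_prefix="X-Old-"):
    """Return the header list of an article readied for another injection.

    `headers` is a list of (name, value) pairs in their original order.
    Message-ID, Date and Injection-Date pass through unmodified, and no
    Injection-Date is added if it is missing. The prefix is hypothetical.
    """
    result = []
    for name, value in headers:
        lower = name.lower()
        if lower == "xref":
            continue  # Xref MUST go
        if lower == "injection-info":
            # Rename rather than remove, so the evidence is preserved.
            result.append((rename_prefix + name, value))
            continue
        result.append((name, value))  # everything else, including Path
    return result


article = [
    ("Path", "relay.example!injector.example!not-for-mail"),
    ("Message-ID", "<abc@poster.example>"),
    ("Date", "Mon, 01 Jan 2007 12:00:00 +0000"),
    ("Xref", "server.example misc.test:42"),
    ("Injection-Info", "injector.example"),
]
for name, value in prepare_for_reinjection(article):
    print(f"{name}: {value}")
```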

In Duties of an Injecting Agent:

>+            <t>If the proto-article already had an Injection-Date header
>+            field, it MUST NOT be modified or replaced.  If the
>+            proto-article had both a Message-ID header field and a Date
>+            header field, an Injection-Date header field MUST NOT be
>+            added, since the proto-article may have been multiply injected
>+            by a posting agent that predates this standard.  Otherwise,
>+            the injecting agent MUST add an Injection-Date header field
>+            containing the current date and time.</t>

In Duties of a Relaying Agent:

>             <t>It MUST examine the Injection-Date header field or, if
>             absent, the Date header field, and reject the article if that
>+            date is more than 24 hours into the future.  It MAY reject
>+            articles with dates in the future with a smaller margin than
>+            24 hours.</t>
>+
>+            <t>It MUST reject any article that has already been accepted.
>+            If it implements the mechanism described in <xref
>+            target="history" />, this means that it MUST reject any
>+            article whose date falls outside the cutoff interval since it
>+            won't know whether such articles had been accepted previously
>+            or not.</t>
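
As a sanity check on the two Relaying agent paragraphs, here is how I
would expect an implementation using a cutoff interval to behave. This is
a sketch under my own assumptions: the history is a simple set of
Message-IDs, the cutoff is passed in as a parameter, and the function and
constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FUTURE_MARGIN = timedelta(hours=24)  # the 24-hour limit from the quoted text


def relay_rejection_reason(message_id, date, injection_date, now,
                           history, cutoff):
    """Return a reason string if the article must be rejected, else None.

    `injection_date` may be None, in which case `date` is examined instead.
    """
    effective = injection_date if injection_date is not None else date
    if effective - now > FUTURE_MARGIN:
        return "date too far in the future"
    if now - effective > cutoff:
        # Outside the cutoff interval: cannot know if it was accepted before.
        return "outside cutoff interval"
    if message_id in history:
        return "already accepted"
    return None


now = datetime(2007, 1, 10, tzinfo=timezone.utc)  # arbitrary example time
week = timedelta(days=7)
seen = {"<old@poster.example>"}

print(relay_rejection_reason("<new@poster.example>",
                             now - timedelta(days=9),  # stale Date
                             now - timedelta(days=1),  # recent Injection-Date
                             now, seen, week))         # None (accepted)
print(relay_rejection_reason("<old@poster.example>",
                             now - timedelta(days=1), None,
                             now, seen, week))         # already accepted
```

A relaying agent MAY use a smaller future margin than 24 hours; the
constant above is simply the upper limit given in the quoted text.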


Those are the texts relevant to the IR/IC issue. Here now are my remaining
nits - mostly typos or minor wording suggestions, although there is a
smallish issue related to POSTED in the Path header.



>@@ -452,6 +455,66 @@
>         </section>
>       </section>
> 
>+      <section anchor="history"
>+               title="Article History and Duplicate Suppression">
>+        <t>Netnews normally uses a flood-fill algorithm for propagation of
>+        articles in which each news server offers articles it accepts to
                                                   ^
                                                  the
>+        multiple peers and each news server may be offered the same
>+        article from multiple other news servers....
>+
>+          <list style="symbols">
>+            <t>Agents MAY select a cutoff interval and reject any article
>+            with a date farther in the past than that cutoff interval.  If
>+            this interval is shorter than the time it takes for an article
>+            to propagate through the network, the agent may reject an
                                                          ^^^
                                                         might
>+            article it had not yet seen, so it ought not be aggressively
                                                          ^
                                                          to
>+            short....

>+        <t>These are just two implementation strategies for article
>+        history, albeit the most common ones.  Relaying and serving agents
>+        are not required to use these strategies, only to meet the
>+        requirement of not accepting an article more than once.  However,
>+        these strategies are safe and widely deployed and implementors are
>+        encouraged to use one of them, especially if they not have
                                                           ^
                                                           do
>+        extensive experience with Netnews and the subtle effects of its
                                                ^^^
                                            with the more
>+        flood-fill algorithm.</t>

>@@ -459,9 +522,33 @@
> 
>+        applicable policies.  They MUST NOT create any Injection-Info
>+        header field; this header field will be added by the injecting
                                          ^^^^^^^
                                       is only to be
>+        agent.</t>
>+
>+        <t>The Injection-Date header field is new in this revision of the
>+        Netnews protocol and is designed to allow the Date header field to
>+        hold the composition date (as recommended in section 3.6.1 of
>+        <xref target="RFC2822" />), even if the proto-article is not
                                                                      ^
                                                                    to be
>+        injected for some time after its composition....

>@@ -484,48 +571,73 @@
>           agent.</t>
> 
>           <t>A proto-article has the same format as a normal article
>-          except that the Injection-Date, Injection-Info, and Xref header
>-          fields MUST NOT be present; the Path header field MUST NOT
>-          contain a "POSTED" &lt;diag-keyword>; and any of the following
>-          mandatory header fields MAY be omitted: Message-ID, Date, and
>-          Path.  In all other respects, a proto-article MUST be a valid
>-          Netnews article.  In particular, the header fields which may be
>-          omitted MUST NOT be present with invalid content.</t>
>+          except that the Injection-Info and Xref header fields MUST NOT
>+          be present; the Path header field MUST NOT contain a "POSTED"
>+          &lt;diag-keyword>; and any of the following mandatory header
>+          fields MAY be omitted: Message-ID, Date, and Path.  In all other
>+          respects, a proto-article MUST be a valid Netnews article.  In
>+          particular, the header fields which may be omitted MUST NOT be
>+          present with invalid content.</t>

I would prefer a "SHOULD NOT contain a "POSTED" &lt;diag-keyword>". No
interoperability problem arises if a "POSTED" appears twice in a Path,
though it may well be a cause for suspicion, and may even lead over-zealous
agents to reject the article.

It can arise in the following circumstances:

1. As a result of "reinjecting" (aka "gatewaying after the fact").
2. In some forms of multiple injection, notably when one of the
"injections" is technically a relaying, using IHAVE. Such multiple
injection would indeed violate one of your SHOULDs, but SHOULD violations
are going to happen from time to time (and sometimes for good reason).
3. When some s[pc]ammer has preloaded the Path in order to disguise the
true origin of the article (indeed, this was one of the original purposes
for inventing POSTED). A pretty stupid s[pc]ammer, of course (competent
ones will just post from alt.net :-) ).

Generally, I prefer to preserve all evidence from earlier Paths, for the
netkops to interpret as they will (whether or not this results in multiple
POSTEDs). Alternatively, I might be persuaded that we should recommend
renaming earlier Paths, as we do for pre-existing Injection-Infos.

>           <t>If a posting agent intends to offer the same proto-article to
>+          multiple injecting agents, the header fields Message-ID, Date,
>+          and Injection-Date MUST be present and identical in all copies
>+          of the proto-article.  See <xref target="multi-injection" />.</t>


That is the place I mentioned earlier where you already have a redundant
MUST for Injection-Date. Fine, it helps make things clearer.

>+        <section anchor="multi-injection"
>+                 title="Multiple Injection of Articles">
>+          <t>Under some circumstances (posting to multiple disjoint
                                                            ^
                                                        supposedly
>+          networks, injecting agents with spotty connectivity, or for
>+          redundancy, for example), a posting agent may wish to offer the
>+          same article to multiple injecting agents. ....

>+          <t>Multiple injection of an article listing one or more
>+          moderated newsgroups in its Newsgroups header field SHOULD only
                                                                ^^^^^^^^^^^
                                                                SHOULD ONLY
>+          be done by a moderator and MUST only be done after the
                                       ^^^^^^^^^
                                       MUST ONLY
>+          proto-article is approved for all moderated groups to which it
                          ^^
                      has been
>+          is to be posted and has an Approved header field (see <xref
>+          target="moderator" />)....

>@@ -650,23 +762,27 @@
> 
>             <t>It MUST reject any proto-article that does not have the
>+            Injection-Info or Xref header fields; that has a Path header
>+            field containing the "POSTED" &lt;diag-keyword>; or that is
>+            not syntactically valid as defined by <xref target="USEFOR"
>+            />.  It SHOULD reject any proto-article which contains a
>+            header field deprecated for Netnews (see, for example, <xref
>+            target="RFC3798" />).  It MAY reject any proto-article that
>+            contains trace header fields (e.g., NNTP-Posting-Host)
>+            indicating that it was already injected by an injecting agent
>+            that did not add Injection-Info or Injection-Date.</t>

I would prefer POSTED to be in the "SHOULD reject"s, for reasons already
stated above.
>+
>+            ...  It SHOULD similarly
>+            reject any article whose Date header field is too far in the
>+            past, since not all news servers support Injection-Date and
>+            only the injecting agent can provide a useful error message to
>+            the posting agent.  In either case, this interval SHOULD NOT
>+            be any shorter than 72 hours into the past.</t>

That SHOULD is better than the MUST that was present in earlier drafts. I
would still be happier with a MAY.
 
>@@ -806,18 +928,18 @@

>+            <t>It MUST reject any article that has already been accepted.
>+            If it implements the mechanism described in <xref
                               ^^^^^^^^^^^^^
                          one of the mechanisms
>+            target="history" />, this means that it MUST reject any
>+            article whose date falls outside the cutoff interval since it
>+            won't know whether such articles had been accepted previously
>+            or not.</t>

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@xxxxxxxxxxxxxxxx      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5