#1416: USEPRO 3.9: Reinjection and Injection-Date
Okay, I took a long walk to think about this, which I should have done
before sending the last message. I'm sorry, Charles: I was confused by all
the specific examples, but after thinking about this on the walk, I can
see the general drift of what you're getting at. Let's take a step back
and look at the theory. I think it's the specific use cases that are
mixing things up.
First, let's go back to the Netnews protocol as it exists today and look
at how injection and reinjection work without Injection-Date.
Every article has a unique identity in its message identifier, but we
don't expect agents to retain a database of every message identifier.
Therefore, in practice, articles are uniquely identified by the
combination of message identifier and Date header field. If the Date
header field is recent, we can rely on a database lookup for the message
identifier to know if we've seen this article before. If the Date header
field is old, we may not know whether we've seen this article before or
not, so we assume we have. Therefore, an article's functional identity is
the Date and message identifier pair. Uniquely identifying an article is
how we prevent duplicating that article on the network.
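The duplicate check described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the History class,
its method names, and the ten-day HISTORY_WINDOW are hypothetical and are
not taken from any draft or from any particular server.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention period; real sites choose their own value.
HISTORY_WINDOW = timedelta(days=10)


class History:
    """Remembers message identifiers only for recently dated articles."""

    def __init__(self):
        self.seen = {}  # message identifier -> Date of the article

    def offer(self, message_id, date, now):
        """Return True if the article should be accepted."""
        if now - date > HISTORY_WINDOW:
            # Too old to trust the database: assume we have seen it.
            return False
        if message_id in self.seen:
            # Recent and already recorded: a known duplicate.
            return False
        self.seen[message_id] = date
        return True

    def expire(self, now):
        """Forget identifiers whose Date has aged out of the window."""
        self.seen = {
            mid: date
            for mid, date in self.seen.items()
            if now - date <= HISTORY_WINDOW
        }
```

Expiring an identifier is only safe because any later copy carrying the
same Date is rejected as stale, which is why the pair of message
identifier and Date acts as the functional identity.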
Now, in the current Netnews protocol, an injecting agent is responsible
for ensuring that every article has a unique identity by adding
Message-ID and Date header fields if they're not present in a
proto-article. However, the posting agent MAY assert the identity of the
article by filling out the Message-ID and Date header fields in advance,
in which case the injecting agent MUST use the existing identity of the
article.
In any case in which multiple injection happens, either in parallel
(submitting the same proto-article to multiple injecting agents) or in
serial (reinjection), the identity of the article must be preserved to
avoid the possibility of duplicating it on the network. So long as the
identity of the article is preserved, every agent on the network can
independently be assured that it won't accept a duplicate of an article.
This independent verification is important because it allows for multiple
independent paths of propagation.
Note that for proto-articles, this guarantee only holds if the posting
agent provides the identity of the article. Therefore, in the existing
Netnews protocol, it's best practice for a posting agent injecting an
article at multiple injection agents to provide the complete identity of
the article (both Message-ID and Date header fields) so that every copy of
the article has the same identity and no agent will accept multiple copies
of it. Reinjection similarly can be assured of not creating loops simply
by preserving the identity of the article (both Message-ID and Date header
fields) when reinjecting.
Note, however, that this means that the full propagation of a message is
constrained to only those sites that it can reach prior to the expiration
of its Date header field. So currently, reinjection breaks down if the
reinjection cannot happen within the staleness period of the Date header
(and it breaks down differently at different sites, so some sites in a
network may accept a delayed reinjected article and others may reject it).
This is the same as for relaying. The only difference is that reinjecting
agents are generally the sole path to reach a particular site, so if the
reinjection is delayed, articles go missing. Because of this, some
reinjecting agents change the identity of the message by regenerating the
Date header. More on the risks of that in a moment.
Now, the problem with this is that the Date header field carries two
meanings. It's part of the identity of the article, and it's also
human-readable information about when the message was composed. Some
posting agents always treat it as the former and either let the injecting
agent generate it or only generate it at injection time. Some posting
agents treat it as the latter and generate it at message composition time.
This means that proto-articles that are held for a time before injection
will either lose information about when the message was composed or will
run the risk of being rejected by servers because their Date header field
is stale.
The working group therefore introduced a new header, Injection-Date, which
serves *only* the protocol function and separates the protocol function
from the user presentation function. Date is now of interest only to
humans and contains the composition time, which serves no protocol
function. Injection-Date is part of the identity of the message.
However, as part of this change, we also changed who was responsible for
establishing the identity of a message. Now, the injecting agent *always*
establishes the identity of a proto-article, and the posting agent isn't
permitted to do so. Only when a proto-article is injected does it acquire
a unique identity in practice; prior to that, it will only have a message
identifier, which is not sufficient alone to prevent duplication of the
message.
This isn't a problem only for reinjection. It's a problem for every case
where an article is touched by multiple injecting agents. In particular,
since the introduction of Injection-Date, a proto-article that's injected
at multiple injecting agents will be assigned a slightly different
identity by each one. In the normal case, these identities will only vary
by seconds or at most minutes, which in practice is highly unlikely to
cause problems, but theoretically we still broke the identity model.
Since the introduction of Injection-Date, time-delayed multiple injection
is no longer safe against duplication. If a posting agent injects a
proto-article at one injecting agent immediately and then at another
injecting agent some days later, it may introduce a duplicate article into
the network, and there's nothing the posting agent can do except not do
time-delayed multiple injection to prevent this. Prior to Injection-Date,
it could make sure that all copies had the same identity and the later
injection would never duplicate, just possibly suffer from poor
propagation because it's stale.
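A small simulation may make the difference concrete. It is a sketch under
assumptions, not a description of any real server: the accepts function,
the ten-day WINDOW, the dates, and the message identifier are all invented
for illustration. It models one site that receives the same article at
its first injection and again after a second injection two weeks later.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=10)  # hypothetical history retention period


def accepts(history, message_id, identity_date, now):
    """Accept only if the date is fresh and the identifier is unknown."""
    # Drop identifiers that have aged out of the history database.
    for mid in [m for m, d in history.items() if now - d > WINDOW]:
        del history[mid]
    if now - identity_date > WINDOW or message_id in history:
        return False
    history[message_id] = identity_date
    return True


t0 = datetime(2009, 1, 1, tzinfo=timezone.utc)  # first injection (arbitrary)
t1 = t0 + timedelta(days=14)                    # second injection, later
mid = "<example@posting.invalid>"

# Old model: the posting agent fixes Date, so both copies carry t0.
old_history = {}
old_first = accepts(old_history, mid, t0, t0)
old_second = accepts(old_history, mid, t0, t1)

# Injection-Date model: each injecting agent stamps its own time.
new_history = {}
new_first = accepts(new_history, mid, t0, t0)
new_second = accepts(new_history, mid, t1, t1)
```

Under these assumptions old_second is False, because the late copy is
rejected as stale, while new_second is True: the site has already expired
the identifier and the fresh Injection-Date passes the staleness check, so
the duplicate is accepted.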
Of course, serial multiple injection (reinjection) suffers the most, since
it is the most likely to happen some time later. So in usepro-06,
reinjection was distinguished from injection and only in the reinjection
case (as determined by the injecting agent) was the article permitted to
retain its original Injection-Date. In other words, normal posting agents
were still not permitted to assert the complete article identity, but a
special type of posting agent was if an injecting agent chose to let it.
So, we have some competing goals:
* We want every copy of an article to have the same identity so that
duplicates can be effectively suppressed at any node of the network
without requiring reliance on prior nodes to do verification in any
particular way. This is robust against multiple propagation paths.
* We can't rely on the Date provided by a posting agent to be part of
the article's identity because then we'll give some articles too old
of a date and fail to propagate them. In other words, we have to
treat a Date header field provided by a posting agent as a comment
rather than a protocol element.
* We want to support passing the same proto-article to multiple injecting
agents and reprocessing an article through an injecting agent because
both are useful in practice in some specific situations and both have
historically been supported by Netnews.
So far, we have a couple of different solutions.
* The current draft, my approach, basically takes the stance of "well, we
asked for it, so we take the consequences." Multiply injected articles
simply have different identities -- that's the consequence of how we
defined Injection-Date. Each time an article passes through an
injecting agent, its identity changes, and in the case where it's most
likely that identity change will be serious (reinjection), the agent
doing the reinjection is required to verify as well as possible that it
won't create the chance of duplicates. In practice, the drawbacks are
hopefully limited. There's no way for that check to be perfect, but
hopefully it's no worse than the other case of multiple injection that
we're already dealing with. As a nice side effect, this also means
that delays in reinjection don't cause articles to be lost.
* usepro-06 makes reinjection a special case with a different set of
rules that require that the identity of the article be preserved. In
order to do this, negotiation between the posting agent and the
injecting agent is required and software has to know whether it's doing
injection or reinjection. There are some follow-on effects for trace
headers, but this could be separated out and handled differently. The
core point is that reinjection preserves article identity, but in the
process maintains the current drawback that delays in reinjection may
cause articles to be lost. There may be a bit of an attractive
nuisance here, the same one that's present in the existing protocol,
for reinjecting agents to drop the Injection-Date header anyway so as
to not lose articles, thereby reintroducing the problems the current
draft causes.
Forrest has proposed only allowing reinjection between disjoint networks.
If the reinjecting agent is the only possible path between two otherwise
disjoint networks, then the reinjecting agent can do article identity
verification for the second network and it doesn't matter if the article
acquires a new identity when crossing that network boundary. The
difficulty here, as Charles correctly points out, is that there's no
general way of establishing that two networks are disjoint. One can
sometimes determine for individual articles that they passed through a
host on network A and on network B and therefore the networks are not
disjoint, but to be sure would require full knowledge of at least one of
the networks. There are some common cases where this is possible, but
it's not possible in general, and it's possible to think that you know the
networks are disjoint and be wrong.
There is another solution that requires modifying USEFOR:
* Drop Injection-Date entirely and go back to the current protocol. This
solves this problem and reintroduces the problem that we were trying to
fix originally.
There is another solution that arguably requires modifying USEFOR:
* Change the definition of the Injection-Date header field so that the
posting agent can provide it, and indeed MUST provide it if they wish
Date to be treated as a comment rather than a protocol element.
Require posting agents doing multiple injection to include either a
Date or (preferably) a Posting-Date header. Don't allow injecting
agents to set Injection-Date at all if Date was provided. This has the
advantage of solving the multiple injection problem (present in both my
draft and in usepro-06), and allows reinjecting agents to assert the
same article identity in the same way as is possible with the current
protocol. Basically, this would make Injection-Date *entirely* the
same as the protocol purpose of the current Date header, rather than
mostly the same. It has the same difficulties with delayed
reinjection.
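One possible reading of this proposal, from the injecting agent's side,
is sketched below. The inject and identity functions are hypothetical, the
proto-article is assumed to arrive with a Message-ID already present, and
the behavior when neither date header is supplied is my assumption rather
than anything the proposal spells out.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def inject(headers, now):
    """Return a copy of the headers as this reading would leave them."""
    out = dict(headers)
    if "Injection-Date" in out:
        # Identity already asserted by the posting agent: never change it.
        return out
    if "Date" in out:
        # Date was provided, so it stays the protocol date and the
        # injecting agent adds no Injection-Date.
        return out
    # Assumption: with neither header present, the agent supplies both.
    stamp = format_datetime(now)
    out["Date"] = stamp
    out["Injection-Date"] = stamp
    return out


def identity(headers):
    """Functional identity: message identifier plus the protocol date."""
    date = headers.get("Injection-Date") or headers["Date"]
    return headers["Message-ID"], parsedate_to_datetime(date)


t0 = datetime(2009, 1, 1, tzinfo=timezone.utc)   # arbitrary example times
t1 = datetime(2009, 1, 15, tzinfo=timezone.utc)
proto = {
    "Message-ID": "<example@posting.invalid>",
    "Date": format_datetime(datetime(2008, 12, 25, tzinfo=timezone.utc)),
    "Injection-Date": format_datetime(t0),
}
first = inject(proto, t0)    # injected immediately
second = inject(proto, t1)   # injected elsewhere two weeks later
```

Here identity(first) and identity(second) are equal, so a later copy is
either suppressed as a duplicate or rejected as stale, as with the current
Date header.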
One could make the argument that we could do this now, that nothing in the
current USEFOR definition says that the posting agent can't provide this
header. The current text implies it, but doesn't say so outright. The
header name is confusing if the posting agent is providing it, but we
could live with that.
The drawback of this approach, of course, is that it still doesn't deal
with the issue of delayed reinjection. Some reinjecting agents are going
to want to change the identity of a message so that it will still be
accepted even though it's old. Doing so inherently runs the risk of
creating duplicates, but doing so is going to be a common desire --
gateways go down or off-line for a while, networks go out, people want to
pull down older news to bootstrap or seed a new Netnews network, etc. If
we just outlaw it, we have no control over how people do it if they choose
to break the protocol. If we can come up with something useful to say
about *how* to change the identity of the message, maybe we can reduce the
risk.
That's the summary of the current situation as I see it. I think it's the
most complex problem we have left to resolve before we have a publishable
draft. Every approach to this problem that I can see introduces risk
somewhere, including sticking with the current protocol and thereby
tacitly encouraging people to drop the Date header when they need to. My
approach in the current draft is a mess, but so is every other approach
that I can think of. The current draft also doesn't deal with multiple
injection, which it would be valuable to handle.
If we want to pursue the last option mentioned above, namely requiring the
posting agent to provide Injection-Date if it wants Date to be ignored
(ideally renaming and slightly redefining the header in the process), I
can propose some text, but we still need to decide what, if anything, we
want to say about delayed reinjection. I think this route is cleaner than
my current draft, and it takes an important step back towards Charles's
draft in that we would then require Injection-Date not be changed by the
injecting agent if already present (modulo whatever we wanted to say about
intentionally changing article identity).
--
Russ Allbery (rra@xxxxxxxxxxxx) <http://www.eyrie.org/~eagle/>