RE: 5. "Oscillation"
Suppose I have two replicas, R1 and R2, and an application that
modifies a particular entry, E, at intervals of one hour.
One instance of the application sends its modify requests to R1 every
15 minutes past the hour. A second instance of the application
sends its modify requests to R2 every 45 minutes past the hour.
The replication agreements are set up such that R1 initiates a replication
session with R2, every hour on the hour, and R2 initiates a replication
session with R1, every hour on the half hour.
... and so it goes on, ad infinitum, with the two replicas growing two
independent strands in the change tree for E. Neither strand ever becomes
the single durable strand.
If you set the replication schedules the right way (R3 <-> R1 at h:20,
R3 <-> R2 at h:50) a third replica will alternate between these two strands.
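The schedule above can be sketched as a toy simulation (my own model, not
the MDCR algorithm itself): each replica's state is just the set of change
identifiers it has received for E, and an initiated session is modelled as
a one-way push of the initiator's changes.

```python
# Toy model of the two-replica schedule described above (an illustrative
# sketch, not MDCR): r1 and r2 hold the change-IDs each replica has seen.

def simulate(hours=6):
    r1, r2 = set(), set()
    in_sync = []
    for h in range(hours):
        r2 |= r1                  # h:00  R1 initiates a session with R2
        in_sync.append(r1 == r2)
        r1.add(("R1", h))         # h:15  app instance 1 modifies E at R1
        r1 |= r2                  # h:30  R2 initiates a session with R1
        in_sync.append(r1 == r2)
        r2.add(("R2", h))         # h:45  app instance 2 modifies E at R2
    return in_sync
```

After the very first modification, every later sync point reports False:
each replica always holds its latest change before the other has seen it,
so the two never again agree while the writes continue.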
I have omitted the details and confirm that MDCR would behave exactly as
you describe. This might be a useful example to add, perhaps in a more
tabular form, to help explain the details of the weeding and summarizing
mechanism as Ryan requested.
It is fairly obvious for ANY replication protocol that if an entry is being
changed continually it cannot converge to a common state.
That is not non-convergence, let alone oscillation. Non-convergence means
that some finite sequence of changes does not eventually result in a common
state within some bounded limit (which should be specified as the
convergence delay), once no further changes are occurring.
Oscillation means that some finite sequence of changes can result in
continued indefinite alternation among two or more states. That is quite a
common problem in protocols, usually dealt with by hop counts or traces.
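The hop-count guard mentioned here can be illustrated generically (this is
my own sketch of the standard technique, not something specific to URP or
MDCR): a relayed report spends a fixed hop budget, so it cannot circulate
around a cycle of replicas forever.

```python
# Generic hop-count illustration: even in a topology with a cycle, a
# report's hop budget bounds how many times it can be relayed.

MAX_HOPS = 8
ring = {"R1": "R2", "R2": "R3", "R3": "R1"}   # a topology with a cycle

def deliveries(start="R1"):
    node, hops, count = start, MAX_HOPS, 0
    while hops > 0:
        node = ring[node]     # relay the report to the next replica
        hops -= 1             # spend one unit of the hop budget
        count += 1
    return count              # finite, despite the cycle
```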
I guess your real point is rather that URP does not need any mechanism to
deal with situations like you have described, whereas MDCR does.
I agree that you have shown the current MDCR weeding and summarizing
mechanism would not prevent a strand just continuing to grow forever,
supporting Ryan's concern that large trees could become a problem.
In general, the circumstances in which that would occur are where
concurrent changes to an entry are *continually* happening within an
interval no greater than the convergence delay limit.
That is clearly a circumstance in which multi-master replication is being
used inappropriately by an application, and sysadmin intervention would be
required to fix it anyway, if the users didn't.
MDCR is reasonably robust about that. For example, with 12 bytes of tree
overhead in RAM per change, a strand could reach a length of 80,000 before
occupying 1 MB. With replication 100 times per day (more than 4 times the
hourly rate given in the example), that would take more than 2 years.
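The arithmetic behind those figures can be checked directly (taking the
numbers from the text, and 1 MB as 10**6 bytes):

```python
# Sanity check of the tree-growth figures quoted above.
node_bytes = 12                            # tree overhead per change, in RAM
strand_len = 80_000                        # strand length before 1 MB is used
ram = node_bytes * strand_len              # 960,000 bytes, just under 1 MB
replications_per_day = 100
days = strand_len / replications_per_day   # 800 days to grow that strand
years = days / 365                         # roughly 2.2 years
```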
I still think it is likely that any applications behaving like this would
be fixed anyway, before tree length became an issue, since they could not
achieve anything useful and are therefore unlikely to keep doing it.
They only have to stop making pointless changes *occasionally*, for long
enough, and the strand would get summarized.
However, as Murphy's law is the first law of protocol design, I agree that
this issue should be dealt with, together with the full specification of
delays, if MDCR is added to the WG agenda. A specification of maximum tree
length to trigger sysadmin notification of a problem, or perhaps automated
action, could be added, or included in general replica management standards.
> Here's a semi-formal proof that neither I nor anybody else will be able
> to come up with an example showing that URP also oscillates.
> 3. That entry state will replace any other it encounters and will
> therefore propagate to all replicas. (Leaving aside the fixable problem
> with relying on time stamps etc).
Your proof neglects the role of the purging mechanism. You have to show
that all replicas will get all changes in spite of the cleanup going on
in parallel, and in spite of the effects of restoring a replica from backup.
Having shown that all replicas get all changes it is still necessary to
show that all possible processing orders of the change reports produce the
same outcome. The open-endedness of the list of changes can't be neglected.
I explicitly qualified the proof that URP converges with the words "Leaving
aside the fixable problem with relying on time stamps etc".
I had already demonstrated that URP does NOT converge in item 4, precisely
because of the issues concerning the purging mechanism and the problem of
DSAs crashing and being restored from backups explained there:
An important reason why priority for version numbers rather than timestamps
is essential is to meet requirements for eventual convergence. The
architecture draft explicitly states that there is no mandatory requirement
for clock synchronization, which is reasonable because experience shows
clocks do go out of sync regardless (4.5.3). It goes on to state "If
timestamps are not accurate, and a server consistently produces timestamps
which are significantly older than those of other servers, its updates will
not have any effect and the real world time ordering of updates will not be
maintained".
In 13. Time, the architecture draft states: "The server must reject update
operations, from any source, which would result in setting a CSN on an entry
or a value which is earlier than the one that is there".
The consequence of course is not just that "the real world time ordering of
updates will not be maintained". *That* is an inevitable consequence of the
fact that real world clocks are out of sync anyway. What will *ALSO* happen
is that changes made by DUAs will be accepted by some DSAs (eg those
synchronized to the same clocks), and rejected by others, as explicitly
required by the architecture draft.
DSAs *do* crash, and *do* get restored from backups, but URP simply ignores
this.
Consequently there will be no eventual convergence, but increasing
divergence, which will require manual sysadmin repairs to discover and fix
entries that were only partially propagated.
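The effect of a consistently slow clock under timestamp-based CSNs can be
shown with a toy last-writer-wins model (my own sketch, not the URP wire
protocol): CSNs are taken from each server's local clock, and an update
whose CSN is not later than the entry's current CSN is rejected, as the
architecture draft requires.

```python
# Toy timestamp-CSN model: DSA B's clock runs 60 units behind, so every
# write made via B carries an older CSN and is rejected, even though in
# real time B's writes happen *after* A's.

def resolve(entry, csn, value):
    """Keep the update only if its CSN is later than the entry's."""
    return (csn, value) if csn > entry[0] else entry

def run(skew_b=-60):
    entry = (0, None)                 # fully replicated entry: (csn, value)
    history = []
    for t in range(100, 400, 100):    # true time t of each round of writes
        entry = resolve(entry, t, f"A@{t}")                   # write via A
        entry = resolve(entry, t + 50 + skew_b, f"B@{t+50}")  # later write via B
        history.append(entry[1])
    return history
```

Every write made via B is silently discarded, matching the quoted warning
that a server producing consistently older timestamps has updates with no
effect.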
I was not attempting to refute myself.
It is YOU who would have to show various things in order to complete a proof
that URP converges with the present mechanism.
As I understand it, you have not yet responded directly to that, but have
tacitly admitted that you cannot do so by stating:
"I have a procedure for solving the replica consistency problems of
restored replicas and rejoined partitions but it is written in terms
of a log-based implementation using a different purging mechanism.
I'm still in the process of recasting it in state-based terms with
an update vector."
I have attempted a semi-formal proof that MDCR does converge, with "weeding
and summarizing" going on in parallel, and despite network partitioning, DSA
crashes and restorations etc, in the full MDCR draft.
If you (or anyone else) can see a problem in that, please draw it to my
attention.