
4. "Eventual Convergence - Version numbers or timestamps".



Steve,

Here's the response to item 4, which I summarized as:

4. "Eventual Convergence - Version numbers or timestamps". Agreement that
URP could operate with priority for either version numbers or timestamps. I
maintain that version numbers should have priority, for reasons fully
explained in the Coda file system research. If URP remains the only protocol
under consideration by the WG it can and should be changed to do so. Design
issue, not requirements (though relevant to any requirement for eventual
convergence - see below).

Implicit agreement that in the absence of further changes, all replicas
should converge to the same state and any protocol that does not guarantee
this is unacceptable. Disagreement as to whether each of MDCR and URP meets
that requirement. Steve maintains that MDCR cannot meet that requirement. I
maintain that MDCR can and that URP as currently drafted does not, but that
URP is capable of doing so, provided that version numbers have priority over
timestamps. Although there seems to be no disagreement about requirements on
this between Steve and myself, I cannot resist pointing out that the
current requirements draft actually "requires" a "flexible" ability to
accommodate both models that guarantee convergence and models that will
diverge.

http://www.imc.org/ietf-ldup/mail-archive/msg00614.html

***
Comments are interspersed with original below.
***
[snip]
[Albert]
> Incidentally the MDCR draft also proposes adoption of the Coda report
> propagation protocols adopted by AD. I believe that would be a
> considerable improvement without requiring any substantial change to
> either the requirements or architecture and especially beneficial to
> the existing URP design as there is a lot of avoidable complication
> there simply because reports are propagated out of sequence by each
> replica.

[Steve]
Propagating the changes strictly in CSN order wouldn't make much difference.
The "complexity" arises from the requirement to process local changes out
of order with updates reported by other replicas. Whether those remote
changes are reported in order becomes irrelevant.

>
> That is completely independent of the issue of atomicity.
>

[Albert]
Well, if you believe that all the complex "Extraordinary States" and
"Transient Extraordinary States" are inherent in URP and cannot be removed,
I'm not going to argue. I certainly agree that the central problem for any
replication protocol arises from changes arriving out of order from
different replicas rather than from reports being re-ordered when
propagated.

But that is simply because the former is unavoidable, while the latter can
be easily avoided by simply forwarding them in the same order that they are
received - as specified in MDCR, based on the work done for Coda and
implemented in Active Directory.

Propagation in order of receipt ensures that the MDCR "seenMarks"
corresponding to the URP "purge vectors" can accurately reflect the fact
that *all* prior change reports to those listed have been propagated.
Purging cannot occur until *after* that has occurred. This guarantees that
when DSAs crash and are restored from backups, only the delay before changes
become "durable" at other replicas is affected, and not the integrity of the
replication itself.
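
To illustrate why forwarding in order of receipt matters for such marks, here
is a minimal sketch. It is not taken from either draft: it assumes each
originating replica numbers its reports 1, 2, 3, ... and the function names
are hypothetical. With in-order forwarding the highest number seen already
means "everything up to here has been seen"; with re-ordering it does not.

```python
def highest_seen(received):
    """Largest report number received so far from one origin (0 if none)."""
    return max(received, default=0)


def contiguous_prefix(received):
    """Largest n such that every report 1..n from that origin has been received."""
    seen = set(received)
    n = 0
    while n + 1 in seen:
        n += 1
    return n


# Reports from one origin, as they arrive at a downstream replica.
in_order = [1, 2, 3, 4]      # forwarded in order of receipt
out_of_order = [1, 4, 2]     # report 3 is still in flight

# In order: a single number per origin is a safe mark.
mark_in_order = (highest_seen(in_order), contiguous_prefix(in_order))

# Out of order: the highest number seen overstates what is safe to purge.
mark_out_of_order = (highest_seen(out_of_order), contiguous_prefix(out_of_order))
```

In the second case a replica that purged up to report 4 would discard state
that report 3 still needs.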

An important reason why priority for version numbers rather than timestamps
is essential is to meet requirements for eventual convergence. The
architecture draft explicitly states that there is no mandatory requirement
for clock synchronization, which is reasonable because experience shows
clocks do go out of sync regardless (4.5.3). It goes on to state "If
timestamps are not accurate, and a server consistently produces timestamps
which are significantly older than those of other servers, its updates will
not have any effect and the real world time ordering of updates will not be
maintained."

In 13. Time, the architecture draft states: "The server must reject update
operations, from any source, which would result in setting a CSN on an entry
or a value which is earlier than the one that is there".

The consequence of course is not just that "the real world time ordering of
updates will not be maintained". *That* is an inevitable consequence of the
fact that real world clocks are out of sync anyway. What will *ALSO* happen,
is that changes made by DUAs will be accepted by some DSAs (e.g. those
synchronized to the same clocks), and rejected by others, as explicitly
required.

DSAs *do* crash, and *do* get restored from backups, but URP simply ignores
the consequences.

Consequently there will be no eventual convergence, but increasing
divergence, which will require manual sysadmin repairs to discover and fix
entries that were only partially propagated.

What 4.5.3 of the architecture draft *should* have added is "Consequently we
are not relying on timestamps but giving priority to version numbers".
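
The difference between the two priorities can be shown with a small sketch.
This is an illustration only, not the LDUP CSN format: the Stamp fields, the
function names and the clock values are assumptions made for the example. It
applies the rule quoted above (reject an update whose stamp is earlier than
the one already there) under each ordering, for a server whose clock runs an
hour slow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamp:
    version: int     # per-entry version number (hypothetical field)
    timestamp: int   # originating server's local clock, in seconds
    replica: str     # tie-breaker so the ordering is total


def timestamp_first(s):
    return (s.timestamp, s.version, s.replica)


def version_first(s):
    return (s.version, s.timestamp, s.replica)


def apply_update(current, incoming, key):
    """Accept the incoming stamp only if it sorts after the current one."""
    return incoming if key(incoming) > key(current) else current


# Server A has a correct clock. Server B's clock is 3600 seconds slow.
v1 = Stamp(version=1, timestamp=10_000, replica="A")
# B modifies the entry 50 seconds later in real time, on top of version 1.
v2 = Stamp(version=2, timestamp=10_050 - 3_600, replica="B")

by_time = apply_update(v1, v2, timestamp_first)    # update from B is rejected
by_version = apply_update(v1, v2, version_first)   # update from B is accepted
```

Under timestamp priority the later change from B never takes effect; under
version priority the state of B's clock does not matter.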

> [RM]
> 2. The whole "weeding and summarizing" discussion left me
> confused and therefore worried.  There seems to be an
> unstated assumption that all replicas are "well-connected".
> My concern is that if one or more replicas are
> "sporadically-connected" then the size of the trees could
> become an issue.  Again, I may be missing something in the
> draft, but if so, I'd claim it is buried.  Because the
> whole discussion left me confused, I think some clarification
> in the form of an example would help.
>
> Ryan
>
> [AL] If the WG accepts the draft as being on its agenda for discussion I
> think a lot more work will be needed on it, including detailed examples
> (and diagrams) for the weeding and summarizing process as you suggest.

[Steven]
I too have some concerns about the "weeding and summarizing" stuff.
In particular it is not possible to guarantee that all replicas will
converge toward the same state. A version of an entry cannot be made
"Durable" until after some unspecified delay to see if any higher version
shows up. For an entry version to become the definitive (durable) version
depends not only on earlier events but also on events that are yet to occur!
If the waiting time is chosen badly the replicas will quickly diverge.
No matter what delay is chosen there is always a chance that a higher
version will pop up immediately afterward.

[Albert]
Partly true, except that the replicas will not diverge because all reports
are still propagated to all replicas and will be treated identically at all
of them. MDCR makes use of the "seenMarks" to determine the propagation time
to all other replicas. As I mentioned in the draft, a formula does need to
be specified as to how to calculate what additional time should be allowed,
and that formula needs to leave a generous margin. But all that is necessary
is to specify, for all replicas in a replication area, a sufficiently
generous margin. Any stragglers that do arrive would still have higher
version numbers and could just be applied to ensure eventual convergence
while not maintaining the guarantee of conflict detection and resolution, i.e.
it would behave as badly as URP but only for stragglers instead of always.
Alternatively, more complex provisions could be added. Details should be
left until MDCR is actually on the agenda, but I would prefer to simply mark
stragglers for manual fixup, like modifyDN, create and add conflicts, as the
numbers should be negligible. This is VERY different from requiring manual
fixups whenever clocks go out of sync, or DSAs crash and get restored from
backups, which they do all the time.
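
As a rough sketch of the straggler handling described above: the formula is
still to be specified, so the one used here (time of receipt, plus an
estimated propagation time, plus a margin) is only an assumption, as are the
function names and the numbers.

```python
def durability_deadline(received_at, propagation_estimate, margin):
    """Time after which a version is treated as durable (assumed formula)."""
    return received_at + propagation_estimate + margin


def handle_higher_version(arrived_at, deadline):
    """Decide what to do with a report carrying a higher version number."""
    if arrived_at <= deadline:
        # Arrived inside the waiting period: normal conflict detection
        # and resolution still applies.
        return "resolve"
    # Straggler: apply it anyway so that replicas still converge, and
    # mark the entry for manual fixup.
    return "apply-and-flag"


deadline = durability_deadline(received_at=100, propagation_estimate=30, margin=60)

on_time = handle_higher_version(arrived_at=150, deadline=deadline)
straggler = handle_higher_version(arrived_at=200, deadline=deadline)
```

The more generous the margin, the fewer reports fall into the second branch,
at the cost of a longer delay before a version is treated as durable.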

[...]
> URP makes the fatal mistake of splitting the operations into
> primitives so as to effectively merge conflicts instead of resolve
> them, and adds the serious mistake of giving higher priority to
> timestamps than to version numbers.

[Steven]
The LDUP CSNs can be used either way. URP doesn't care, it just sees
a monotonically increasing value.

[Albert]
Agreed. That is why I distinguished it from what I regard as the fatal
mistake re atomicity. It is not in any way essential to the design of URP
that it rely on timestamps instead of giving priority to version numbers, or
that replicas be allowed to propagate reports out of order. This defect that
would otherwise result in eventual divergence can easily be fixed by simply
adopting the Coda based report propagation protocols suggested in MDCR while
retaining the essence of the URP design.

So why not just fix it?