[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: 5. "Oscillation"



[Ron]

I don't accept your proof. My eyes are URP-glazed but I think you proof
fails from common sense alone.

Your proof depends on ordering of CSNs, as if they can be ordered. Yet,
presumably, the CSNs are originated by the source. They will, therefore, be
capable of ordering in general, but clashes may occur. To construct a
counter-example, you will probably set the CSNs to all the changes to be the
same, say 7.

Of course, mostly the CSNs will be different and this will produce a
deterministic result. Semantically, it feels odd as the server with the
highest CSNs will, in effect, arbitrate the result.

[Albert]
The URP CSN is defined with four components as specified in the architecture
draft, 4.5.1:

"In order of significance they are; the time, a change count, a Replica
Identifier, and a modification number. The CSN is composed thus to ensure
the uniqueness of every generated CSN."

I believe that clearly does ensure the uniqueness of every generated CSN as
intended. The three components time, change count and modification number
ensure uniqueness within a replica, and the remaining component, replica
identifuer, ensures global uniqueness within the replication set.

Therefore no counter-example could be based on CSNs for different changes
having the same value.

4.5.1 continues: "When CSNs are compared to determine their ordering, they
are compared component by component. First the time, then the change count,
then the replica identifier, and finally the modification number."

This does establish a strict ordering of CSNs. Whenever the other 3
components of two CSNs generated at different replicas happen to have the
same time, change count and modification number, the result is arbitrarily
determined by the replica with the highest replica identifier. I agree that
"feels odd", but it does guarantee convergence.

In my view it is a clumsy and confusing way to do it, but that isn't worth
arguing about at this late stage, as long as it does do it. The "change
count" is really an internal implementation detail for coordinating multiple
threads within a DSA, and need not be carried by protocol at all. MDCR just
uses the timestamp plus version number to guarantee uniqueness of USNs
generated by a DSA (for a single entryID).

The difference is simply that URP gives priority in the lexical ordering to
the combined timestamp represented by the time and change count, while MDCR
gives priority to the version number. Both apply the last component, replica
identifier in the same way to guarantee uniqueness among all DSAs and
arbitrarily impose a strict ordering. However MDCR applies this only as a
last resort and treats the version number as the principal determinant of
priority among concurrent changes. In practice the fact that URP applies
that arbitrary last resort slightly earlier does not really matter as the
whole ordering is arbitrary anyway and the remaining three components will
almost always be unique.

The precise ordering of timestamps generated for concurrent changes at
different replicas is inherently arbitrary as clocks are not synchronized.
All it creates is the *illusion* of time ordering. The reality is that
changes are concurrent, not serialized, when made at different replicas to
the same entry state. It doesn't really matter which change succeeds as long
as no more than one of them does.

This seems a good point to mention a significant advantage of the
Coda/Active Directory method of using version numbers as the primary
determinant of priority for resolving conflicts among concurrent changes.

Typically, concurrent changes will have different timestamps. The fact that
occasionally they will have identical timestamps is focussed on by URP, thus
distracting attention from the fact that they are still concurrent, not
serialized, whether they have the same timestamp or not. The simple fact is
that each concurrent change was made without knowledge of the state
resulting from the other(s).

MDCR makes this explicit by recognizing concurrency whenever the version
numbers are different (or are the same but generated at different replicas,
hence the need for a tree).

A consequence is that regardless of timestamps, the final state of an entry
that is changed more frequently at a particular replica during a conflict
will ultimately prevail over conflicting changes made at other replicas.

Typically a conflict will simply occur between just two changes made at two
different replicas and MDCR will arbitrarily converge to the change made by
the replica with the highest replica number, regardless of timestamps.

But on the rare occasions when there is a non-trivial tree of conflicts,
because additional changes have been applied to the visible changed entries
at a replica before they become "durable", MDCR will recognize this and give
priority to the longest possible sequence of changes that could have been
serialized.

For example if two successive changes are made to a single replica at A,
while a concurrent single change is made at B, the result will converge to
the final state represented by the second change at A.

This is more significant than it seems. The entries that are being changed
most often are those being actively worked on. If an application is trying
to coordinate a transaction by first setting some sort of transaction
identifier on all the related entries, then changing them and all and coming
back to finalize the transaction, it is more likely to ultimately succeed
and become durable, than a conflicting single change made by another DUA
that is not part of a transaction.

Although the main point is to ensure that conflicts are actually resolved,
not merged. This is a significant optimization of the resolution. The Coda
research has full details.