
RE: various comments (was Re: What's going on? - Status of Requirements...)



Albert,

> -----Original Message-----
> From: owner-ietf-ldup@xxxxxxxxxxxx
> [mailto:owner-ietf-ldup@xxxxxxxxxxxx]On Behalf Of Albert Langer
> Sent: Tuesday, 20 June 2000 8:25
> To: 'Ryan Moats'
> Cc: ietf-ldup@xxxxxxx
> Subject: RE: various comments (was Re: What's going on? - Status of
> Requirements...)

[snip]

> [RM]
> On the subject of the requirements / URP draft:
>
> 1. I still believe the requirements draft should be more
> explicit about atomicity of operations being maintained
> across replication.  In my various re-readings of this
> draft, I have at times found justification for both
> sides (maintain and do not maintain) and I still think
> that maintaining atomicity is necessary.
>
> 2. The URP draft should be explicit in how it maintains
> atomicity of operations across replication.  I'm pretty
> sure from my last perusal it doesn't now.
>
> [AL]
> Understood.
> URP certainly does not maintain atomicity and is explicit
> about that. This
> problem is not in the URP draft, but in the requirements
> draft. If there was
> a requirement to maintain atomicity there would be a
> completely different
> URP (not necessarily MDCR).

Atomicity is not the right concept to be arguing about in a multimaster
replication environment. Consider this example.

At server S1 at time t1, a user modify request on an entry adds attribute
A1 and replaces the existing values of attribute A2. At server S2 at
time t2 (t2 > t1), a user modify request replaces attribute A2. Some time
later the modify from server S1 arrives at S2. If S2 performs this
operation atomically, it will replace the newer (time t2) values of
attribute A2 with the older (time t1) values. This is clearly the wrong
thing to do. If S2 adds the new attribute A1 but ignores the replacement
of A2 (as URP would do), then it produces the same outcome as the
serial execution of the two modify requests, though the action doesn't
fit the usual definition of "atomic".

If we are going to discuss atomicity in replication then we need a more
meaningful definition. URP makes all the replicas produce an outcome
that is equivalent to the serial atomic execution of all the updates.
That is as atomic as it needs to be.
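
To make the S1/S2 example concrete, here is a rough Python sketch. It is
not the URP algorithm from the draft: it assumes each modify is broken
into one "replace attribute" primitive per attribute, that every
primitive carries the CSN of its originating request, and that integer
CSNs 1 and 2 stand in for t1 and t2. The names Entry, primitives and
replay are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # attribute name -> (csn, values) of the last accepted change
    attrs: dict = field(default_factory=dict)

    def apply(self, csn, attr, values):
        """Apply one primitive; keep it only if its CSN is newer."""
        current = self.attrs.get(attr)
        if current is None or csn > current[0]:
            self.attrs[attr] = (csn, values)
            return True
        return False  # an older primitive is ignored, not treated as an error


def primitives(csn, changes):
    """Break one modify request into per-attribute primitives sharing its CSN."""
    return [(csn, attr, values) for attr, values in changes.items()]


def replay(batches):
    """Apply batches of primitives in arrival order and return the visible values."""
    entry = Entry()
    for batch in batches:
        for primitive in batch:
            entry.apply(*primitive)
    return {attr: values for attr, (_, values) in entry.attrs.items()}


T1, T2 = 1, 2  # assumed integer CSNs standing in for times t1 < t2
modify_s1 = primitives(T1, {"A1": ["new"], "A2": ["from-S1"]})
modify_s2 = primitives(T2, {"A2": ["from-S2"]})

s1_view = replay([modify_s1, modify_s2])  # S1 sees its own change first
s2_view = replay([modify_s2, modify_s1])  # S2 receives S1's change late
```

Both views end up with A1 added and A2 holding the t2 value, which is
the result of executing the two modifies serially in CSN order, even
though S2 applied only part of S1's request.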

The real question revolves around the preconditions of a user update
request. If we look at the equivalent serial processing of a collection
of updates performed at two or more multimaster replicas then some of those
updates will be "executed" against a different database state than actually
existed at the time they were really performed by one of the replicas.
The URP philosophy is that most of the time for most applications it doesn't
really matter. The MDCR philosophy is that it is better to completely
disregard the user's update, after the fact, just in case the state of the
entry did matter.

[snip]

> [RM]
> 3. Once URP maintains change atomicity, the "modifiersName"
> issue in my mind goes away.  There may be others that
> still remain...
>
> [AL]
> Agreed. "modifiersName" is impossible without atomicity, and
> easily achieved
> with it (as are many other things).

You seem to be assuming that the DSA maintained operational attributes
are being independently maintained by each replica. The intent in URP is
that the replica executing the user update request will modify the
operational attributes, like modifiersName and modifyTimestamp, as
required, and that these changes will also be reflected in the primitives
sent to the other replicas. Those other replicas will keep the values
of these operational attributes with the highest CSNs, which will
correspond to the latest change to the user attributes. This is exactly
the same outcome as a serial execution of the updates in CSN order.
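
A small Python sketch of that point, under the assumption that the
operational attributes travel as ordinary primitives stamped with the
CSN of the user update that caused them. The CSNs, attribute values and
the apply helper are invented for the illustration; the check is that
every arrival order leaves the same modifiersName.

```python
from itertools import permutations


def apply(entry, primitive):
    """Keep the value with the highest CSN for each attribute."""
    csn, attr, value = primitive
    if attr not in entry or csn > entry[attr][0]:
        entry[attr] = (csn, value)


# Each user update yields its user-attribute primitive plus the
# operational-attribute primitives, all with the same (hypothetical) CSN.
update_1 = [
    (10, "mail", "a@example.com"),
    (10, "modifiersName", "cn=alice"),
    (10, "modifyTimestamp", "t10"),
]
update_2 = [
    (20, "telephoneNumber", "555-0100"),
    (20, "modifiersName", "cn=bob"),
    (20, "modifyTimestamp", "t20"),
]

# Try every possible arrival order of the six primitives.
outcomes = set()
for order in permutations(update_1 + update_2):
    entry = {}
    for primitive in order:
        apply(entry, primitive)
    outcomes.add(tuple(sorted((attr, value) for attr, (_, value) in entry.items())))

final_state = dict(next(iter(outcomes)))
```

Only one outcome is possible, and in it modifiersName and
modifyTimestamp belong to the update with the highest CSN.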

[snip]

> As regards efficient propagation of the tree via the report
> propagation
> protocol I believe the encoding on the wire is as efficient
> and as simple as
> possible by just conveying the actual LDAP protocolOps for
> the changes. The
> only redundancy is for the relatively small proportion of changes that
> affect the Directory Information Tree (DIT), ie add, del and modifyDN,
> rather than just the Directory Information Base (DIB), ie modify (for
> changes to non-distinguished attributes and values).

The replicating servers aren't necessarily LDAP or X.500 DSAs, so the update
protocol operations aren't necessarily just LDAP, DAP or DSP operations.
Also, these protocols are still subject to change so additional operations
may be defined in the future, or existing ones extended. LDAP also has
a means for vendors to define proprietary operations. We can't expect LDUP
implementors to cover all the possibilities.

The original request also doesn't carry the changes to operational
attributes or other vendor specific DSA invoked changes such as might
occur to maintain referential integrity of DNs.

Breaking all update requests into replication primitives gets around these
problems.

[snip]

> Incidentally the MDCR draft also proposes adoption of the Coda report
> propagation protocols  adopted by AD. I believe that would be
> a considerable
> improvement without requiring any substantial change to either the
> requirements or architecture and especially beneficial to the
> existing URP
> design as there is a lot of avoidable complication there
> simply because
> reports are propagated out of sequence by each replica.

Propagating the changes strictly in CSN order wouldn't make much difference.
The "complexity" arises from the requirement to process local changes out
of order with updates reported by other replicas. Whether those remote
changes are reported in order becomes irrelevant.

>
> That is completely independent of the issue of atomicity.
>
> [RM]
> 2. The whole "weeding and summarizing" discussion left me
> confused and therefore worried.  There seems to be an
> unstated assumption that all replicas are "well-connected".
> My concern is that if one or more replicas are
> "sporadically-connected" then the size of the trees could
> become an issue.  Again, I may be missing something in the
> draft, but if so, I'd claim it is buried.  Because the
> whole discussion left me confused, I think some clarification
> in the form of an example would help.
>
> Ryan
>
> [AL] If the WG accepts the draft as being on its agenda for
> discussion I
> think a lot more work will be needed on it, including
> detailed examples (and
> diagrams) for the weeding and summarizing process as you suggest.

I too have some concerns about the "weeding and summarizing" stuff.
In particular it is not possible to guarantee that all replicas will
converge toward the same state. A version of an entry cannot be made
"Durable" until after some unspecified delay to see if any higher version
shows up. For an entry version to become the definitive (durable) version
depends not only on earlier events but also on events that are yet to occur!
If the waiting time is chosen badly the replicas will quickly diverge.
No matter what delay is chosen there is always a chance that a higher
version will pop up immediately afterward.

I can also construct examples where two replicas endlessly flip
back and forth between two competing strands with no entry versions ever
becoming durable.

> A simpler approach to conflict resolution would just resolve
> each conflict
> immediately, at each DUA that encounters a conflicting change, by
> suppressing the change with the lower version number etc and
> propagating
> only 1 survivor (which may itself be suppressed at another
> DUA with the
> survivor propagated from that conflict in turn reaching the
> previous DUA and
> suppressing the previous survivor there).

I don't think this is enough to solve the problems mentioned above.

> This is more or
> less what AD does
> (for attributes).
> URP makes the fatal mistake of splitting
> the operations
> into primitives so as to effectively merge conflicts instead
> of resolve
> them, and adds the serious mistake of giving higher priority
> to timestamps
> than to version numbers.

The LDUP CSNs can be used either way. URP doesn't care; it just sees
a monotonically increasing value.

> This means the "survivor" at each
> DUA separately is
> just the latest, with no actual conflict resolution at all.
>
> The tree is especially important for "sporadically-connected"
> replicas since
> they are the most likely to generate conflicts. In most cases
> the tree would
> be trivial or very small, but if it is not recognized as a tree, that
> "simplification" makes everything else become incredibly complicated.
>
> I don't see a large tree as an implementation problem in itself - the
> overhead is only about 12 bytes of RAM per conflicting change.

You would also need to store enough information to reconstruct the
previous versions of an entry on each strand. To change the current
entry version from one strand to another requires unwinding index changes,
etc., back to the common branching point and then applying the saved
change requests on the new strand. URP never has to go backwards.

[snip]

> The conflicts form a tree in reality, whether it is
> recognized or not. If
> that tree becomes large then URP would produce a high rate of
> "Extraordinary
> States" and "Transient Extraordinary States" while MDCR would
> produce a high
> rate of revocation notices because it has at least recognized
> the reality.
> Neither is appropriate for directory applications. (In my view even an
> extremely low rate of "Extraordinary States" without any
> means except manual
> administration to recover from them is completely unacceptable).

I don't expect many LDAP client application developers will want to
deal with the complexities of out-of-band revocation notices, or deal
with restoring the application context of the original change so it
can be repeated. The enterprising ones will probably just send a series
of spurious changes to an entry after each real change just to "up" the
version number and so increase the chance of the real change sticking.

In effect, what we have done with URP is hardwire a reasonable conflict
resolution mechanism (best effort merge) that is (we think) good enough
for most uses of the directory. But that doesn't leave the remaining uses
out in the cold. Alison and I have previously mentioned in passing that
we have an idea for providing strong consistency and transactions with URP.
This is a good time to sketch out what we mean.

This is the sequence of events for a user update request requiring or
requesting strong consistency:

The receiving replica (let's call it the primary) opens a session with
each of the other replicas and sends a request for all update primitives
up to and including the latest change to the target entry. This is just a
variation on a regular LDUP session. However, the other replicas will also
lock the target entry to prevent any local changes to it. Some sort of
deadlock detection will be necessary.

The primary replica applies all the primitives it has received and then
attempts the user request. If it fails with an error, the sessions with
the other replicas are closed and they drop the locks on the target entry.
If the request succeeds, the primary server sends the primitives up to
and including the ones generated from the current user request, then
closes the sessions, causing the locks to be released. If the primary
doesn't send the latest primitives, the other replicas will just import
them the next time they action a strong consistency update on the same
entry. Two-phase commit isn't required.
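
The sequence above might look roughly like this in Python. It is a
single-threaded sketch under several assumptions, not a protocol
definition: Replica, open_session, close_session and strong_update are
hypothetical names, the caller supplies the CSN, the "user request" is
reduced to a precondition plus a set of attribute replacements, and a
RuntimeError stands in for real deadlock detection.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.entry = {}      # attr -> (csn, value) for the one target entry
        self.log = []        # every primitive this replica has seen
        self.locked = False

    def apply(self, primitive):
        csn, attr, value = primitive
        if primitive not in self.log:
            self.log.append(primitive)
        if attr not in self.entry or csn > self.entry[attr][0]:
            self.entry[attr] = (csn, value)

    def open_session(self):
        """Lock the target entry and return the primitives known for it."""
        if self.locked:
            raise RuntimeError(f"{self.name}: entry already locked")
        self.locked = True
        return list(self.log)

    def close_session(self, primitives=()):
        """Import whatever the primary sent (possibly nothing), then unlock."""
        for primitive in primitives:
            self.apply(primitive)
        self.locked = False


def strong_update(primary, others, csn, precondition, changes):
    """Attempt one strong-consistency modify; return True if it was applied."""
    opened, to_send = [], []
    try:
        for replica in others:
            incoming = replica.open_session()  # remote entry is now locked
            opened.append(replica)
            for primitive in incoming:
                primary.apply(primitive)       # catch up before deciding
        current = {attr: value for attr, (_, value) in primary.entry.items()}
        ok = precondition(current)
        if ok:
            for attr, value in changes.items():
                primary.apply((csn, attr, value))
            to_send = list(primary.log)        # includes the new primitives
        return ok
    finally:
        for replica in opened:                 # on failure nothing is sent
            replica.close_session(to_send)


s1, s2, s3 = Replica("S1"), Replica("S2"), Replica("S3")
s2.apply((5, "A2", "old"))  # a change S1 has not yet heard about

accepted = strong_update(s1, [s2, s3], 6,
                         lambda e: e.get("A2") == "old", {"A2": "new"})
rejected = strong_update(s3, [s1, s2], 7,
                         lambda e: e.get("A2") == "old", {"A2": "newer"})
```

The first request succeeds because S1 imports S2's change before
evaluating the precondition. The second fails against the now-current
state, and all it does is release the locks.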

Handling transactions is straightforward. The initial request from the
primary replica specifies a range of affected entries (maybe with
something like a search filter) instead of just the one target entry.
The primary is also allowed to send multiple requests within the same
session since it won't generally know all the entries affected by
a transaction at the beginning.

The above scheme requires all updatable replicas to be available to perform
a strong consistency update, but there is a more general scheme that
allows some of the replicas to be unavailable. If there are N replicas,
then it is only necessary to contact M (M > N/2) of them to make an update,
provided N-M+1 of them are contacted before evaluating any strong
consistency query. The original description was the special case of M = N.
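
The arithmetic behind those numbers can be checked by brute force. The
sketch below is only an illustration; "read quorum" and "write quorum"
are my labels for the N-M+1 and M replica sets, and the function names
are made up. It confirms that any two update sets share a replica
(because 2M > N) and that any query set shares a replica with any
update set (because M + (N-M+1) > N).

```python
from itertools import combinations


def read_quorum(n, m):
    """Replicas to contact before a strong-consistency query: N - M + 1."""
    if not (n / 2 < m <= n):
        raise ValueError("need N/2 < M <= N")
    return n - m + 1


def quorums_always_overlap(n, m):
    """Exhaustively check that update/update and update/query sets intersect."""
    replicas = range(n)
    r = read_quorum(n, m)
    writes = [set(c) for c in combinations(replicas, m)]
    reads = [set(c) for c in combinations(replicas, r)]
    write_write = all(a & b for a in writes for b in writes)
    write_read = all(w & q for w in writes for q in reads)
    return write_write and write_read


# With N = 5 and M = 3, a query must contact 5 - 3 + 1 = 3 replicas.
# With M = N = 5 (the original description), a query needs only 1.
example = [(m, read_quorum(5, m), quorums_always_overlap(5, m)) for m in (3, 4, 5)]
```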

Regards,
Steven