
RE: various comments (was Re: What's going on? - Status of Requirements...)



Steve,

Thanks for the detailed and helpful response a while back.

http://www.imc.org/ietf-ldup/mail-archive/msg00576.html

Looks to me like we've moved into concrete discussion of various issues
between URP and MDCR and their relation to requirements (thanks also to Ryan
for getting this going). Sorry about the long delay responding.

I propose we should discuss each of those issues under separate headings
rather than continuing within a thread that started from a request for
status reports.

I'm planning to chop up your message, re-arranging it within the categories
I perceive and want to respond to, each with a link here from which people
can get the original context. I'm starting now to ensure anyone interested
has time to think about the issues before the Pittsburgh meeting, which I
will be unable to attend. Finishing may still be delayed for a week or so
due to other commitments.

Meanwhile, here's a summary of the issues as I see them, and their relation
to the original request for status reports.

1. "Atomicity and related concepts". Disagreement about definition,
importance and relation to other concepts and requirements. Should be
clarified and resolved by definitions, explanations and requirements in
requirements document before final call.

2. "ModifiersName and other Operational Attributes". Disagreement about
definition, semantics, and implications of URP and MDCR proposals for
handling them. Definition and semantics should be clarified in requirements
document before final call. URP and MDCR handling should be clarified in
further discussion of URP and MDCR. Unclear whether there is a disagreement
about a requirement to handle ModifiersName "correctly", because of lack of
agreed definition of what would be "correct".

3. "Change Reports - ProtocolOps or Primitives". Reference by Steve to
non-LDAP and non-X500 replicating servers may involve clarification of
requirements. Other issues can be clarified in discussion of URP and MDCR
approaches.

4. "Eventual Convergence - Version numbers or timestamps". Agreement that
URP could operate with priority for either version numbers or timestamps. I
maintain that version numbers should have priority, for reasons fully
explained in the Coda file system research. If URP remains the only protocol
under consideration by the WG it can and should be changed to do so. Design
issue, not requirements (though relevant to any requirement for eventual
convergence - see below).

Implicit agreement that in the absence of further changes, all replicas
should converge to the same state and any protocol that does not guarantee
this is unacceptable. Disagreement as to whether each of MDCR and URP meet
that requirement. Steve maintains that MDCR cannot meet that requirement. I
maintain that MDCR can and that URP as currently drafted does not, but that
URP is capable of doing so, provided that version numbers have priority over
timestamps. Although there seems to be no disagreement about requirements on
this between Steve and myself, I cannot resist pointing out that the
current requirements draft actually "requires" a "flexible" ability to
accommodate both models that guarantee convergence and models that will
diverge.

5. "Oscillation". Implicit agreement that no protocol should oscillate. No
need to state in requirements as both obvious and covered by 4 above. Steve
says he has an example showing that MDCR oscillates. I say that neither MDCR
nor URP oscillates. Unfortunately the margin of this email is too small for
the proof ;-) Fortunately it is short enough to present in another email
rather than leaving the world waiting breathlessly for hundreds of years...
or subjecting it to eyeglazing discussion of examples.

6. "Implementation and Performance Issues". Disagreement as to whether there
is any significant difference in implementation difficulty or performance.

7. "Revocation notices". Implicit agreement that MDCR could add them and
that URP makes no provision for them because it does not revoke anything.
Disagreement as to whether they are useful.

8. "Strong consistency and transactions". Steve has sketched an idea that
Alison and Steve have for adding these to URP. While relevant to the WG, I
don't think they are intended to meet any objections to adoption of URP for
multi-master replication as they do not satisfy the definition of, nor the
requirements for, multi-master replication. I believe this does shed further
light on the requirements draft problem, as by failing to give any
explanation whatever of user needs for multi-master replication, that draft
invites confusion.

Original message below. I am adding comments re item 8 at the end as further
discussion of it has already occurred in this thread. Comments on the other
7 will follow under the 7 headings listed.

*******************************
[snip]

> [RM]
> On the subject of the requirements / URP draft:
>
> 1. I still believe the requirements draft should be more
> explicit about atomicity of operations being maintained
> across replication.  In my various re-readings of this
> draft, I have at times found justification for both
> sides (maintain and do not maintain) and I still think
> that maintaining atomicity is necessary.
>
> 2. The URP draft should be explicit in how it maintains
> atomicity of operations across replication.  I'm pretty
> sure from my last perusal it doesn't now.
>
> [AL]
> Understood.
> URP certainly does not maintain atomicity and is explicit
> about that. This
> problem is not in the URP draft, but in the requirements
> draft. If there was
> a requirement to maintain atomicity there would be a
> completely different
> URP (not necessarily MDCR).

Atomicity is not the right concept to be arguing about in a multimaster
replication environment. Consider this example.

At server S1 at time t1 a user modify request on an entry adds attribute
A1 and replaces the existing values of attribute A2. At server S2 at
time t2 (t2 > t1) a user modify request replaces attribute A2. Some time
later the modify from server S1 arrives at S2. If S2 performs this
operation atomically it will replace the newer (time t2) value(s) of
attribute A2 with the older values (time t1). This is clearly the wrong
thing to do. If S2 adds the new attribute A1 but ignores the replacement
of A2 (as URP would do) then it produces the same outcome as the
serial execution of the two modify requests, though the action doesn't
fit the usual definition of "atomic".

If we are going to discuss atomicity in replication then we need a more
meaningful definition. URP makes all the replicas produce an outcome
that is equivalent to the serial atomic execution of all the updates.
That is as atomic as it needs to be.
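A minimal sketch of the example above, assuming a simplified model in
which each attribute carries the CSN of its last change. The function
names and the dictionary layout are hypothetical and are not taken from
the URP draft; the point is only to compare whole-request application
with per-attribute application against the serial outcome.

```python
def apply_atomic(entry, change):
    """Apply every attribute in the change, whatever is already stored."""
    result = dict(entry)
    for attr, (csn, values) in change.items():
        result[attr] = (csn, values)
    return result


def apply_per_attribute(entry, change):
    """Apply only the attribute changes that are newer than the stored ones."""
    result = dict(entry)
    for attr, (csn, values) in change.items():
        if attr not in result or csn > result[attr][0]:
            result[attr] = (csn, values)
    return result


T1, T2 = 1, 2  # t2 > t1, as in the example

# Modify at S1 (time t1): add A1, replace A2.
s1_modify = {"A1": (T1, ["new"]), "A2": (T1, ["from-S1"])}
# Modify at S2 (time t2): replace A2.
s2_modify = {"A2": (T2, ["from-S2"])}

original = {"A2": (0, ["old"])}

# Reference outcome: both requests executed serially in time order.
serial = apply_atomic(apply_atomic(original, s1_modify), s2_modify)

# S2 has already applied its own modify when the one from S1 arrives.
s2_state = apply_per_attribute(original, s2_modify)
atomic_at_s2 = apply_atomic(s2_state, s1_modify)
merged_at_s2 = apply_per_attribute(s2_state, s1_modify)

print(merged_at_s2 == serial)   # per-attribute merge matches serial order
print(atomic_at_s2 == serial)   # whole-request application does not
```

In this model the late "atomic" application leaves A2 holding the older
values from S1, while the per-attribute merge ends in the same state as
the serial execution.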

The real question revolves around the preconditions of a user update
request. If we look at the equivalent serial processing of a collection
of updates performed at two or more multimaster replicas then some of those
updates will be "executed" against a different database state than actually
existed at the time they were really performed by one of the replicas.
The URP philosophy is that most of the time for most applications it doesn't
really matter. The MDCR philosophy is that it is better to completely
disregard the user's update, after the fact, just in case the state of the
entry did matter.

[snip]

> [RM]
> 3. Once URP maintains change atomicity, the "modifiersName"
> issue in my mind goes away.  There may be others that
> still remain...
>
> [AL]
> Agreed. "modifiersName" is impossible without atomicity, and
> easily achieved
> with it (as are many other things).

You seem to be assuming that the DSA maintained operational attributes
are being independently maintained by each replica. The intent in URP is
that the replica executing the user update request will modify the
operational attributes, like modifiersName and modifyTimestamp, as
required, and that these changes will also be reflected in the primitives
sent to the other replicas. Those other replicas will keep the values
of these operational attributes with the highest CSNs, which will
correspond to the latest change to the user attributes. Exactly the
same outcome as a serial execution of the updates in CSN order.
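As a rough illustration of the paragraph above, assume each primitive is
a (CSN, attribute, value) triple and that the originating replica emits
the modifiersName change with the same CSN as the user change. The triple
layout, the sample values and the helper names are assumptions made for
this sketch, not the primitive format of the URP draft.

```python
from itertools import permutations


def apply_primitive(entry, primitive):
    """Keep the value with the highest CSN for each attribute."""
    csn, attr, value = primitive
    stored = entry.get(attr)
    if stored is None or csn > stored[0]:
        entry[attr] = (csn, value)


# Two user updates made at different replicas, each carrying its own
# modifiersName change generated by the replica that executed it.
primitives = [
    (10, "mail", "a@example.com"),
    (10, "modifiersName", "cn=alice"),
    (20, "telephoneNumber", "555-0100"),
    (20, "modifiersName", "cn=bob"),
]


def replay(order):
    """Apply the primitives in the given arrival order to an empty entry."""
    entry = {}
    for primitive in order:
        apply_primitive(entry, primitive)
    return entry


# Every possible arrival order at a receiving replica.
outcomes = [replay(order) for order in permutations(primitives)]

print(all(outcome == outcomes[0] for outcome in outcomes))
print(outcomes[0]["modifiersName"])
```

Under these assumptions every arrival order ends with the modifiersName
of the update that has the highest CSN, which is what executing the
updates serially in CSN order would leave behind.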

[snip]

> As regards efficient propagation of the tree via the report
> propagation
> protocol I believe the encoding on the wire is as efficient
> and as simple as
> possible by just conveying the actual LDAP protocolOps for
> the changes. The
> only redundancy is for the relatively small proportion of changes that
> affect the Directory Information Tree (DIT), ie add, del and modifyDN,
> rather than just the Directory Information Base (DIB), ie modify (for
> changes to non-distinguished attributes and values).

The replicating servers aren't necessarily LDAP or X.500 DSAs, so the update
protocol operations aren't necessarily just LDAP, DAP or DSP operations.
Also, these protocols are still subject to change so additional operations
may be defined in the future, or existing ones extended. LDAP also has
a means for vendors to define proprietary operations. We can't expect LDUP
implementors to cover all the possibilities.

The original request also doesn't carry the changes to operational
attributes or other vendor specific DSA invoked changes such as might
occur to maintain referential integrity of DNs.

Breaking all update requests into replication primitives gets around these
problems.

[snip]

> Incidentally the MDCR draft also proposes adoption of the Coda report
> propagation protocols  adopted by AD. I believe that would be
> a considerable
> improvement without requiring any substantial change to either the
> requirements or architecture and especially beneficial to the
> existing URP
> design as there is a lot of avoidable complication there
> simply because
> reports are propagated out of sequence by each replica.

Propagating the changes strictly in CSN order wouldn't make much difference.
The "complexity" arises from the requirement to process local changes out
of order with updates reported by other replicas. Whether those remote
changes are reported in order becomes irrelevant.

>
> That is completely independent of the issue of atomicity.
>
> [RM]
> 2. The whole "weeding and summarizing" discussion left me
> confused and therefore worried.  There seems to be an
> unstated assumption that all replicas are "well-connected".
> My concern is that if one or more replicas are
> "sporadically-connected" then the size of the trees could
> become an issue.  Again, I may be missing something in the
> draft, but if so, I'd claim it is buried.  Because the
> whole discussion left me confused, I think some clarification
> in the form of an example would help.
>
> Ryan
>
> [AL] If the WG accepts the draft as being on its agenda for
> discussion I
> think a lot more work will be needed on it, including
> detailed examples (and
> diagrams) for the weeding and summarizing process as you suggest.

I too have some concerns about the "weeding and summarizing" stuff.
In particular it is not possible to guarantee that all replicas will
converge toward the same state. A version of an entry cannot be made
"Durable" until after some unspecified delay to see if any higher version
shows up. For an entry version to become the definitive (durable) version
depends not only on earlier events but also on events that are yet to occur!
If the waiting time is chosen badly the replicas will quickly diverge.
No matter what delay is chosen there is always a chance that a higher
version will pop up immediately afterward.

Also, I can construct examples where two replicas endlessly flip
back and forth between two competing strands with no entry versions ever
becoming durable.

> A simpler approach to conflict resolution would just resolve
> each conflict
> immediately, at each DUA that encounters a conflicting change, by
> suppressing the change with the lower version number etc and
> propagating
> only 1 survivor (which may itself be suppressed at another
> DUA with the
> survivor propagated from that conflict in turn reaching the
> previous DUA and
> suppressing the previous survivor there).

I don't think this is enough to solve the problems mentioned above.

> This is more or
> less what AD does
> (for attributes).
> URP makes the fatal mistake of splitting
> the operations
> into primitives so as to effectively merge conflicts instead
> of resolve
> them, and adds the serious mistake of giving higher priority
> to timestamps
> than to version numbers.

The LDUP CSNs can be used either way. URP doesn't care, it just sees
a monotonically increasing value.
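To make the point concrete, here is a small sketch in which the same two
changes are ordered under both priorities. The field names (timestamp,
version, replica_id) and the sample values are assumptions for
illustration only and do not reproduce the CSN layout in the LDUP drafts.

```python
from typing import NamedTuple


class Change(NamedTuple):
    timestamp: int    # time at the originating replica
    version: int      # per-entry version number
    replica_id: str   # tie-breaker so the order is total


def timestamp_first(change):
    """Sort key giving timestamps priority over version numbers."""
    return (change.timestamp, change.version, change.replica_id)


def version_first(change):
    """Sort key giving version numbers priority over timestamps."""
    return (change.version, change.timestamp, change.replica_id)


a = Change(timestamp=100, version=3, replica_id="S1")
b = Change(timestamp=105, version=2, replica_id="S2")

# The conflict resolution logic only ever asks "which key is greater?".
winner_by_timestamp = max([a, b], key=timestamp_first)
winner_by_version = max([a, b], key=version_first)

print(winner_by_timestamp.replica_id)
print(winner_by_version.replica_id)
```

The comparison code is identical in both cases; only the composition of
the key decides which change survives.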

> This means the "survivor" at each
> DUA separately is
> just the latest, with no actual conflict resolution at all.
>
> The tree is especially important for "sporadically-connected"
> replicas since
> they are the most likely to generate conflicts. In most cases
> the tree would
> be trivial or very small, but if it is not recognized as a tree, that
> "simplification" makes everything else become incredibly complicated.
>
> I don't see a large tree as an implementation problem in itself - the
> overhead is only about 12 bytes of RAM per conflicting change.

You would also need to store enough information to reconstruct the
previous versions of an entry on each strand. To change the current
entry version from one strand to another requires unwinding index changes,
etc, back to the common branching point and then applying the saved
change requests on the new strand. URP never has to go backwards.

[snip]

> The conflicts form a tree in reality, whether it is
> recognized or not. If
> that tree becomes large then URP would produce a high rate of
> "Extraordinary
> States" and "Transient Extraordinary States" while MDCR would
> produce a high
> rate of revocation notices because it has at least recognized
> the reality.
> Neither is appropriate for directory applications. (In my view even an
> extremely low rate of "Extraordinary States" without any
> means except manual
> administration to recover from them is completely unacceptable).

I don't expect many LDAP client application developers will want to
deal with the complexities of out-of-band revocation notices, or deal
with restoring the application context of the original change so it
can be repeated. The enterprising ones will probably just send a series
of spurious changes to an entry after each real change just to "up" the
version number and so increase the chance of the real change sticking.

In effect, what we have done with URP is hardwire a reasonable conflict
resolution mechanism (best effort merge) that is (we think) good enough
for most uses of the directory. But that doesn't leave the remaining uses
out in the cold. Alison and I have previously mentioned in passing that
we have an idea for providing strong consistency and transactions with URP.
This is a good time to sketch out what we mean.

This is the sequence of events for a user update request requiring or
requesting strong consistency:

The receiving replica (let's call it the primary) opens a session with
each of the other replicas and sends a request for all update primitives
up to and including the latest change to the target entry. This is just a
variation on a regular LDUP session. However the other replicas will also
lock the target entry to prevent any local changes to it. Some sort of
deadlock detection will be necessary.

The primary replica applies all the primitives it has received and then
attempts the user request. If it fails with an error the sessions with
the other replicas are closed and they drop the locks on the target entry.
If the request succeeds the primary server sends
the primitives up to and including the ones generated from the current
user request, then closes the sessions causing the locks to be
released. If the primary doesn't send the latest primitives the other
replicas will just import them the next time they action a strong
consistency update on the same entry. Two-phase commit isn't required.
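The sequence above might be sketched as follows. This is a single-process
toy model under stated assumptions: the Replica class, its method names,
the (CSN, attribute, value) primitives and the way a new CSN is chosen
are all hypothetical, and an exception on an already-locked entry stands
in for real deadlock detection.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.primitives = {}  # entry DN -> list of (csn, attr, value)
        self.locked = set()   # DNs locked on behalf of a primary

    def lock_and_report(self, dn):
        """Lock the entry against local changes; return known primitives."""
        if dn in self.locked:
            raise RuntimeError(f"{self.name}: {dn} is already locked")
        self.locked.add(dn)
        return list(self.primitives.get(dn, []))

    def receive(self, dn, incoming):
        """Store any primitives not already held for this entry."""
        known = self.primitives.setdefault(dn, [])
        for primitive in incoming:
            if primitive not in known:
                known.append(primitive)

    def unlock(self, dn):
        self.locked.discard(dn)

    def value(self, dn, attr):
        """Current value of an attribute: the one with the highest CSN."""
        found = [(c, v) for c, a, v in self.primitives.get(dn, []) if a == attr]
        return max(found)[1] if found else None


def strong_update(primary, others, dn, attr, new_value, precondition):
    """Pull and lock, attempt the update, push the result, then unlock."""
    locked = []
    try:
        for replica in others:
            primary.receive(dn, replica.lock_and_report(dn))
            locked.append(replica)
        if not precondition(primary.value(dn, attr)):
            return False  # request fails; locks are dropped below
        held = primary.primitives.get(dn, [])
        next_csn = 1 + max((csn for csn, _, _ in held), default=0)
        primary.receive(dn, [(next_csn, attr, new_value)])
        for replica in others:
            replica.receive(dn, primary.primitives[dn])
        return True
    finally:
        for replica in locked:
            replica.unlock(dn)


s1, s2, s3 = Replica("S1"), Replica("S2"), Replica("S3")
dn = "cn=counter"
s2.receive(dn, [(1, "seq", "41")])  # a change S1 has not yet seen

first = strong_update(s1, [s2, s3], dn, "seq", "42", lambda cur: cur == "41")
second = strong_update(s3, [s1, s2], dn, "seq", "99", lambda cur: cur == "41")
print(first, second, s3.value(dn, "seq"))
```

The first request succeeds because S1 imports the change held by S2
before checking its precondition; the second fails because S3 then sees
the value written by the first.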

Handling transactions is straightforward. The initial request from the
primary replica specifies a range of affected entries (maybe with
something like a search filter) instead of just the one target entry.
The primary is also allowed to send multiple requests within the same
session since it won't generally know all the entries affected by
a transaction at the beginning.

The above scheme requires all updatable replicas to be available to perform
a strong consistency update but there is a more general scheme that
allows some of the replicas to be unavailable. If there are N replicas
then it is only necessary to contact M (M > N/2) of them to make an update
provided N-M+1 of them are contacted before evaluating any strong
consistency query. The original description was the special case of M = N.
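The arithmetic behind this can be checked with a short sketch. The
function names are hypothetical; the brute-force check simply enumerates
every pair of replica subsets of the given sizes and confirms that they
share at least one replica.

```python
from itertools import combinations


def quorums(n, m):
    """Return (update quorum, query quorum) for N replicas and a chosen M."""
    if not (n / 2 < m <= n):
        raise ValueError("M must satisfy N/2 < M <= N")
    return m, n - m + 1


def always_overlap(n, size_a, size_b):
    """True if every size_a subset shares a replica with every size_b subset."""
    replicas = range(n)
    return all(
        set(a) & set(b)
        for a in combinations(replicas, size_a)
        for b in combinations(replicas, size_b)
    )


n = 5
for m in (3, 4, 5):
    update_q, query_q = quorums(n, m)
    print(
        m,
        query_q,
        always_overlap(n, update_q, update_q),  # two updates always meet
        always_overlap(n, update_q, query_q),   # a query always meets an update
    )
```

With M > N/2 any two updates contact at least one replica in common, and
with N-M+1 replicas contacted a query always reaches at least one replica
that took part in the latest update. M = N gives a query quorum of 1.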

Regards,
Steven

********************************
[AL] (Commenting only on the last item). If DSAs have to contact any other
replica, whether all, a majority or just a few, before applying a change,
they are not providing the benefits of multi-master replication. This is
clearly excluded by the accurate definition of multi-master replication in
the WG charter. The requirements draft should clearly explain that the
reason for needing multi-master replication standards is to meet user needs
for changes to be available locally to other users of a local DSA, whether
or not that DSA is able to contact other DSAs, and without suffering the
performance and scalability problems of attempting to contact any of them
before applying the change and making it locally available.

Instead, the only scenario in the requirements draft which does involve that
user need is not identified as being relevant to multi-master replication,
while every single scenario identified as explaining the need for
multi-master replication actually involves no need for multi-master
replication at all - either because it is to do with meta directories rather
than replication, or because it could be achieved just as well by single
master replication.

Once you give up the scalability and performance of local availability of
changes made possible by multi-master replication, I see no advantage in
this proposal over the use of failover single master replication as proposed
earlier by Alan Lloyd. The latter is already covered by standards for single
master replication and by the fact that a DSA need not hold any naming
context or shadow any naming context, but can just be equivalent to a DUA
that speaks DAP. Both ideas are simply irrelevant to the requirements we
ought to be trying to meet. This would be obvious if we had a
requirements draft that actually talked about those needs.

Incidentally Coda does provide very closely related ideas for caching file
server clients to contact a subset of multi-master replicating file servers
in an enhancement of the Andrew File System for "sometimes disconnected"
use. As mentioned in MDCR, it is worth studying thoroughly for ideas
relevant to LDAP replication and I have attempted to adapt some ideas from
it in the MDCR report propagation protocol, also used by Active Directory.
Needless to say, Coda does not attempt to merge changes made to different
aspects of a file by different clients eg by applying a name change made at
one client to a contents change made concurrently at another. Unfortunately
when the servers are partitioned, Coda has no alternative but to resort to
manual fixups for conflicts, in much the same way that any LDUP protocol
will have to for conflicting create, modifyDN and delete operations.
Nevertheless there is really extensive research there on precisely HOW to
deal with contacting fewer than all the servers at once. It does center
around revocation notices and priority for version numbers rather than
timestamps.