RE: 10. "Report propagation sequence"
---- Original message ----
>Date: Tue, 15 Aug 2000 15:30:16 +1000
>From: "Steven Legg" <steven.legg@xxxxxxxxxxxxx>
>Subject: RE: 10. "Report propagation sequence"
>To: "'Zachary Amsden'" <zach@xxxxxxxxxxxxx>,
>> What really worried me about the current phrasing of the draft
>> was the fact that no ordering requirements were made at all
>> within a transmission. This causes severe problems when a
>> replication update is interrupted.
>> I'm not so much worried about bandwidth, but consider the following
>> scenario:
>> Replica A sends a very long partial update to replica B.
>> Somewhere in the middle of the transfer, the connection is
>> terminated. Since we can get CSNs out of order within a
>> replica ID, we have three options:
>> 1) Save all changes until the transfer is complete, then
>> commit them and send an end replication response. This
>> still worries me because committing the changes may
>> take a very long time, during which our peer may decide
>> we are dead, and drop the connection.
>The ReplicationUpdate operations aren't individually acknowledged
>(as far as I can see) so the supplier isn't going to see much difference
>whether the consumer sucks in all changes and then applies them, or
>applies them one at a time. In fact, applying all the changes in
>one database transaction will probably be faster overall. Suppliers
>really have to assume that consumer processing will take significantly
>longer than supplier processing.
>Even if the connection is dropped the consumer can continue to process
>the changes and revise the Update Vector appropriately if the
>EndReplicationRequest has been received. The supplier will find out
>the new vector on the next replication session and will avoid resending
>any already applied changes.
Agree, but there are other problems with approach 1, which I discuss below.
>> 2) Commit changes as we get them. Very problematic, since
>> there is nothing prohibiting LDAP updates to our local
>> replica while we are in a replication session. So when
>> the connection dies, we have no idea what the current
>> update vector for our replica is, and we can't easily
>> back out changes because we may have received updates.
>If changes are applied in delivery order then the update vector must
>not be revised with respect to the received changes until the end of the
>replication session. However, the update vector would still be revised
>for each locally executed LDAP update. If the connection fails then
>the update vector is what it was at the start of the session except
>for the server's own record in the vector, which may have higher CSNs.
If changes are applied in delivery order, then there is no issue. The
update vector on the supplier is of no concern, since we send our vector at
the beginning of a replication session. If, however, changes have no
particular ordering, then option 2 is not viable.
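Why unordered delivery rules out commit-as-you-go can be shown with a toy calculation. Real CSNs are structured values, not small integers; this sketch (function name invented) just shows that after an interrupted transfer, the highest CSN committed is not a safe vector entry, because a lower CSN may never have arrived.

```python
# Illustrative only: with out-of-order delivery within a replica ID,
# the safe resumption point can be far below the highest CSN seen.

def safe_vector_entry(received_csns):
    """Highest CSN such that every lower CSN was also received."""
    highest = 0
    for csn in sorted(received_csns):
        if csn == highest + 1:
            highest = csn
        else:
            break  # a gap: anything beyond it may be missing
    return highest

# Supplier sends CSNs 1..5 out of order; the connection drops after
# three arrive, and CSN 2 was never delivered.
received = [1, 3, 4]
max(received)                # 4 -- unsafe to record: change 2 is missing
safe_vector_entry(received)  # 1 -- the only safe vector entry
```

With delivery order the gap problem vanishes, since everything below the last change received is known to have arrived.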
>> 3) Lock down our local replica from updates during the
>> replication process. For some LDAP applications, this
>> will not be an option. We may not know which replica
>> to send update referrals to, or our clients may not
>> be able to chase referrals for any number of reasons.
>> Not only this, but if the connection does die, we can't
>> allow updates until replication is re-established with
>> the same supplier, or we revert all the changes and
>> restore to a known state.
>> Since there isn't actually enough information in a change
>> record to undo the change, any server crash during this
>> process is rather disastrous. This means we need to store a
>> reverse operation for each change as well.
>Isn't this the same problem as a crash during an LDAP client update ?
>There is a set of changes to be applied atomically to the local
>database. Either all the changes occur or none do. Do you have
>a problem scaling this up to encompass a whole replication session ?
>Or is this option describing a solution that doesn't have
>atomic commits to the underlying database? That's scary to me even
Well yes, there is a problem scaling that up to encompass an entire (total)
replication session. That isn't the real issue, though, and it is fixable. What
isn't fixable is that during a total update, none of the data is available
for read during replication. We would like the database to be live (albeit
read-only) during a total update. Perhaps allowing read-write access
during a total update is possible, but that would be a future step.
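The reverse-operation idea raised earlier (storing an undo record for each change) can be sketched as below. This is a hypothetical illustration with an invented change format and a toy DN-to-attributes map; the point it demonstrates is that the undo record must capture the entry's prior state at apply time, because the change record alone cannot restore it.

```python
# Sketch: keep an undo log alongside applied changes so an interrupted
# session can be backed out to a known state.

def apply_raw(db, change):
    # Apply a change to a toy DN -> attributes map.
    if change["op"] == "delete":
        db.pop(change["dn"], None)
    else:  # "add" or "modify": install the new attributes
        db[change["dn"]] = change["attrs"]

def apply_with_undo(db, undo_log, change):
    before = db.get(change["dn"])
    if change["op"] == "add":
        undo_log.append({"op": "delete", "dn": change["dn"]})
    else:
        # The change record alone cannot restore the old values,
        # so the reverse operation must capture them now.
        undo_log.append({"op": "add", "dn": change["dn"],
                         "attrs": dict(before)})
    apply_raw(db, change)

def rollback(db, undo_log):
    # Undo in reverse order of application.
    while undo_log:
        apply_raw(db, undo_log.pop())
```

For crash recovery the undo log would of course have to be persisted before each change is committed, not held in memory as here.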
We've hashed over the ordering issue, and I don't think there are any
problems transmitting changes in delivery order for any server
implementation. Not only that, but using ordering gives you a number of
other properties (read access of changes during update, parents created
before children, easy recovery from aborted updates, better DSA crash
recovery), which make any implementation either easier or more robust.
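One of those properties (parents created before children) can be shown with a toy replay. The DNs and the set-based "tree" here are invented for the example; the point is that if the supplier transmits adds in the order it applied them, every parent entry precedes its children, so the consumer never sees a dangling child.

```python
# Illustrative only: replaying "add" changes against a toy DIT where
# each entry's parent must already exist.

def apply_adds(changes):
    tree = {""}  # the suffix (empty parent) is assumed present
    for dn in changes:
        parent = dn.partition(",")[2]  # everything after the first RDN
        if parent not in tree:
            raise RuntimeError(f"parent of {dn!r} missing")
        tree.add(dn)
    return tree

# Delivery order: the parent was added first, so replay succeeds.
apply_adds(["ou=people", "cn=zach,ou=people"])

# Arbitrary order can present the child first and fail.
```

Without delivery order the consumer would need to buffer or reorder such changes itself, which is exactly the extra machinery being argued against.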
I'd like to know if we have a consensus that changes, whatever format they
may be transmitted in, should be ordered by delivery order.