RE: 3. "Change Reports - ProtocolOps or Primitives"
>Replication out of sequence does not require a restart, as
>long as changes are presented in order per replica. I would
>highly recommend text stating that changes from each replica
>MUST be transmitted in strictly increasing CSN order. My
>current implementation stores and transmits changes from each
>replica separately, in CSN order. This makes implementation a
>lot easier for everyone. I don't see any problem with requiring this.
Simply requiring strict ordering of the sequence of changes
from each *originator* replica would, as you say, be sufficient to avoid
the wasted bandwidth from restarts problem.
I'm not sure whether or not it would also be sufficient to address
the problem of bandwidth spikes on full resynchronization.
Either way, there are some major performance benefits to ALSO
adding a local record/message sequence number (independent of originator
sequence number except when the originator is the local replica,
in which case it is identical), and maintaining *that* strict order.
This does maintain the order from each originator but ALSO allows the
exchange of a single "HighwaterMark" between replication neighbours, as
used in the Coda/Active Directory protocols and proposed for report
propagation in MDCR.
The HighwaterMark has no global significance and is not needed
either to avoid restarts or for the weeding/summarizing or purging
mechanism. However it is useful between neighbours to enhance
performance, and I suspect also to avoid a full resynchronization
after failures.
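To make the idea concrete, here is a minimal sketch of how a replica might stamp incoming change reports with such a local sequence number and keep one HighwaterMark per neighbour. All names (`ChangeReport`, `Replica`, `changes_for`) are mine for illustration, not from any draft:

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class ChangeReport:
    originator: str   # replica that originated the change
    csn: int          # originator's change sequence number
    local_seq: int = 0  # assigned on receipt by the local replica

class Replica:
    def __init__(self):
        self._next_local = count(1)
        self.log = []        # kept in strictly increasing local_seq order
        self.highwater = {}  # neighbour -> last local_seq sent to it

    def record(self, report: ChangeReport):
        # The local number is independent of the originator's CSN, but
        # per-originator CSN order is preserved because each originator's
        # changes are required to arrive in strictly increasing CSN order.
        report.local_seq = next(self._next_local)
        self.log.append(report)

    def changes_for(self, neighbour: str):
        # A single HighwaterMark per neighbour says where to resume; it
        # has no global significance, it is purely a local optimisation.
        mark = self.highwater.get(neighbour, 0)
        pending = [r for r in self.log if r.local_seq > mark]
        if pending:
            self.highwater[neighbour] = pending[-1].local_seq
        return pending
```

A second call to `changes_for` for the same neighbour returns only the reports recorded since the previous call, which is the performance point being made here.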
The specifics are implementation dependent. As I am not an
implementor and not familiar with the full range of implementations,
I am less clear on this than I am about standards and protocol
algorithms. However, here's an attempt to discuss some implementation
issues as I understand them.
There are currently two major types of implementations:
1) Log based, in which logs of local changes and updates from
neighbour supplier replicas are appended sequentially and used to
transmit them to neighbour consumers.
2) Entry based, in which local changes and updates from neighbour
suppliers are stored with the corresponding entries. A scan of the
entries is then used to select reports for replication to consumers
for each replication session.
I have the impression that there may be some sort of informal "design
constraint", not explicitly mentioned in the requirements draft,
that may have resulted in ordering being omitted from the specification
so that it would not be any simpler for log based implementations to
support than for entry based ones.
I am inclined to agree with you that there should be no problem for either
type of implementation in requiring ordering. It is natural for log based
implementations and just requires an index for entry based implementations.
Active Directory is an example of an entry based implementation with such
an index and seems to perform at least as well as Novell's. BTW Microsoft
has published sponsored test results showing that it works several times
faster. If Novell has published a reply or alternative test results, could
someone please post a link (or better still, non-sponsored benchmarks
independent of either)? Until that is done, I would take those results as
conclusive evidence that there is no performance advantage in not ordering.
There is also apparently some confusion between "entry based" and "state
based". A log based implementation would naturally store operations rather
than states for local changes, and existing "Changelog" implementations
derived from UMich LDAP servers do the same for replication.
An "entry based" implementation could also be "state based" rather than
operation based. But anything supporting signed operations, or maintaining
atomicity as proposed in MDCR, would have to store operations. I see no
reason why the primary key for that could not be entryID in an entry
based implementation.
For EITHER log based or entry based implementations, it seems to me natural
to use a single key for maintaining the sorted order for receiving changes
from suppliers and local DUAs and passing them on to consumers.
That is what Active Directory does, using a local record/message sequence
number as proposed in MDCR. It is state based (i.e. also entry based).
An alternative is to use a global timestamp and not have a secondary key
at all. As well as causing the problems we are agreed about, that strikes
me as substantially less efficient internally. Unless there is some sort
of index, you would have to scan EVERY entry to check whether it is above
the UpdatedMark of each consumer for each replication session, despite the
fact that replication sessions are frequent and most entries are unchanged.
By using a separate record/report number as either the log primary key for
a log based implementation, or a secondary key for an entry based
implementation, you can then use a HighwaterMark for each neighbour
consumer to drastically
limit either the entries scanned or the starting point of a log scan.
Then FROM that starting point, you transmit only the change reports that are
above the UpdatedMark or vector for the particular consumer. This has no
impact on bandwidth and restarts, since exactly the same reports, based on
the consumer's UpdatedMarks or vector would be transmitted as without the
HighwaterMark. But it is much less resource intensive for replication
sessions.
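The two-stage selection just described can be sketched as follows. The names (`reports_to_send`, `consumer_highwater`, `consumer_vector`) are illustrative, not from any draft; the point is only that stage 1 bounds the scan and stage 2 selects exactly the same reports that would be sent without the HighwaterMark:

```python
import bisect

def reports_to_send(log, consumer_highwater, consumer_vector):
    """log: list of (local_seq, originator, originator_csn) tuples sorted
    by local_seq.  consumer_vector: originator -> highest CSN the consumer
    already holds (its UpdatedMarks)."""
    # Stage 1: the HighwaterMark bounds the starting point of the log
    # scan, so unchanged history is never touched.
    start = bisect.bisect_right([r[0] for r in log], consumer_highwater)
    # Stage 2: within the scanned tail, send only reports above the
    # consumer's update vector -- exactly the set that would be sent
    # without the HighwaterMark, so bandwidth and restarts are unaffected.
    return [r for r in log[start:]
            if r[2] > consumer_vector.get(r[1], 0)]
```

For an entry based implementation the same logic applies with the index on the secondary key playing the role of the sorted log.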
Separating this local record/report number from the originator report
number is necessary for this sequencing. It also avoids the absurdity of
relying on timestamps for originator sequencing despite clocks not even
being monotonic. As mentioned in MDCR, it can also be used with a state
based (non-atomic) protocol design to set an entry record sequence number
as the maximum of the Attribute (Active Directory) and/or Attribute value
(URP) sequence numbers of the entry, to further speed up processing.
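As a sketch of that last point (again with illustrative names): if each attribute, or attribute value, carries the local sequence number of its last modification, the entry level number is just their maximum, which lets a consumer scan skip whole unchanged entries cheaply:

```python
def entry_seq(attr_seqs):
    """attr_seqs: attribute name -> local sequence number of the last
    change to that attribute (per-attribute as in Active Directory, or
    per-attribute-value as in URP).  The entry record sequence number is
    the maximum, so an entry is above a consumer's mark if and only if
    at least one of its attributes is."""
    return max(attr_seqs.values(), default=0)
```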
You *seem* to be indicating that you have a log based implementation, in
which change reports are stored in separate logs for each originating
replica,
and that it would therefore be convenient to transmit these logs in the
same sequence to consumers. I can understand why separate logs might be
used for the local changes and those from each *supplier*, but I do not
understand the point of separate logs for each *originator*, since each
replication session from a neighbour supplier would be a mixture of reports
from several originators, who are not necessarily neighbours.
With multiple logs, by simply adding a local record sequence number to each
log record (sequential across logs as well as within them), you
would be able to transmit the changes in the same sequence as received,
by simply merging the logs as you transmit, without an extra sort, just
as Active Directory is able to transmit them in that order despite being
entry based (and in fact state based).
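A minimal sketch of that merge, assuming (as above) one log per originator with a local record sequence number that is sequential across logs as well as within them. Because each individual log is already sorted by that number, a k-way heap merge reproduces the original arrival order with no extra sort:

```python
import heapq

def merged_stream(*logs):
    """Each log is a list of (local_seq, report) tuples already sorted by
    local_seq.  heapq.merge performs a lazy k-way merge, so the combined
    stream comes out in global local_seq order without sorting."""
    return list(heapq.merge(*logs, key=lambda rec: rec[0]))
```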
>I agree that the number of bytes is a non-issue. As a
>separate issue, allowing replication out of sequence requires a restart from
>scratch to ensure update vectors are synchronized whenever a
>session is interrupted. That is most likely to happen on connections where
>bandwidth is most costly.
>I mentioned in email to Ed and will now repeat here, that
>this is especially
>a problem for "sometimes connected" or dialup DSAs that may interrupt
>a "supplier push" session to a Hub from another dialup DSA in
>the middle of a session for the same replicated area, given that only one
>session can be handled at a time by the Hub.
>Likewise, use of timestamps requires a full reload rather
>than an incremental update to get back into sync after failures, which
>causes bandwidth spikes.
>Both of these "features" of the current architecture draft
>are not inherent
>in URP but easily fixed. So far nothing has been done to fix them.