10. "Report propagation sequence"
It is a log based implementation, where changes are stored in
separate logs for each *originator*.
*Originator* being an updatable replica with a known replicaID.
Thus one supplier may transmit data that gets recorded in
several logs, demultiplexed based on the replicaID. The logs
are naturally in strictly increasing CSN order, as changes are
appended in the order they are received.
This ordering is sufficient to establish a known update vector
in the case of an interrupted transmission.
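The per-originator log scheme described above can be sketched roughly as
follows. This is an illustrative assumption, not taken from any draft or
actual implementation; the names (replica_id, csn) are placeholders:

```python
from collections import defaultdict

class OriginatorLogs:
    """Hypothetical sketch: demultiplex incoming change reports into
    separate logs, one per originator, keyed by replicaID."""

    def __init__(self):
        self.logs = defaultdict(list)   # replica_id -> list of (csn, change)
        self.last_csn = {}              # replica_id -> highest CSN seen

    def record(self, replica_id, csn, change):
        # Appends keep each log in strictly increasing CSN order, which
        # is what lets an interrupted transfer resume from a known
        # update vector instead of restarting from the beginning.
        prev = self.last_csn.get(replica_id)
        if prev is not None and csn <= prev:
            raise ValueError(f"CSN {csn} not after {prev} for {replica_id}")
        self.logs[replica_id].append((csn, change))
        self.last_csn[replica_id] = csn

    def update_vector(self):
        # The highest CSN per originator is the known update vector.
        return dict(self.last_csn)
```

One supplier session may thus feed several logs, one per originator, with
the update vector recoverable from the log tails at any point.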
Sorry about the terminology confusion.
This discussion started in: 3. "Change Reports - ProtocolOps or Primitives".
It is really a separate item, so I am changing the thread to:
10. "Report propagation sequence"
and appending the whole of that section of the original to this, so that
when Steve, Ed or others respond, they can also do so here.
There was no terminology confusion as your explanation corresponds to
what I thought your implementation was doing. What I did not understand
in the paragraph you responded to was *why* a log based
implementation would choose to demultiplex the updates it
consumes into separate logs for each originator replicaId.
That is clearly not necessary in order to transmit updates in the
same sequence that they are received, for each originator replicaId.
You have still not explained what advantages it has.
The paragraph you were responding to was included in a longer
message repeated below.
The first sentence agreed with you that your proposal to maintain
the sequence only within each originator replicaID is sufficient to
avoid any need for restarts (ie establishes a known update
vector in the case of interrupted transmission).
That of course assumes that the protocol design is changed
to require *all* replicas to propagate changes from each
originator in the same sequence that they are received, as
you proposed. We are both agreed that such a change is necessary to
avoid restarts, although so far nobody else has commented. (Steve
has indicated that he is drafting some change to URP that may be relevant.)
However the remainder argued that as well as that design
change, it is desirable to further require all replicas to
propagate each update they receive (or local change) in the
same sequence as received, regardless of originator.
That necessarily maintains the sequencing within each originator
but adds a stronger requirement NOT to demultiplex the updates
consumed and then transmit the separate logs sequentially (or at
least to merge them on transmission as though the implementation
had never done so).
We seem to be agreed that the sequencing in the current architecture
should be changed, but you have proposed a different alternative
sequencing to MDCR, without responding to my explanation below,
of the advantages of the HighwaterMark enabled by the stronger
sequencing in MDCR:
At least one of us is confused now, possibly both, as you
have responded only to the one part of my response that we seemed
to be clear about and in agreement about.
BTW, the terminology I am using is defined in MDCR, which is an alternative
protocol design, not an implementation. I have now added terms to
distinguish between "entry based" and "state based" (ie non-atomic) implementations as
we are discussing implementation performance issues. My definitions have not
been adopted by the WG, but I use them to try to avoid the confusion between
originator changes and propagation updates inherent in the current
architecture draft.
Complete relevant sections of original messages follow:
I regard efficiency on the wire as largely a non-issue for replication.
My experience with X.525/DISP is that supplier DSAs can pump
changes through to consumers at least an order of magnitude faster
than the consumers can process them. A super efficient LDUP protocol
won't make much overall difference.
I agree that the number of bytes is a non-issue. As a separate matter,
allowing replication out of sequence requires a restart from the beginning
to ensure update vectors are synchronized whenever a replication session
is interrupted. That is most likely to happen on connection oriented links
where bandwidth is most costly.
I mentioned in email to Ed and will now repeat here, that this is especially
a problem for "sometimes connected" or dialup DSAs that may need to
interrupt a "supplier push" session to a Hub from another dialup DSA in
order to obtain a session for the same replicated area, given that only
one consumer session can be handled at a time by the Hub.
Likewise, use of timestamps requires a full reload rather than incremental
to get back into sync after failures, which causes bandwidth spikes.
Both of these "features" of the current architecture draft are not inherent
in URP but easily fixed. So far nothing has been done to fix them.
Unlike the number of bytes in the syntax, they would have a significant
impact on bandwidth requirements.
Replication out of sequence does not require a restart, as
long as changes are presented in order per replica. I would
highly recommend text stating that changes from each replica
MUST be transmitted in strictly increasing CSN order. My
current implementation stores and transmits changes from each
replica separately, in CSN order. This makes implementation a
lot easier for everyone. I don't see any problem with this.
Simply requiring strict ordering of the sequence of changes
from each *originator* replica would, as you say, be sufficient to avoid
the wasted bandwidth from restarts problem.
I'm not sure whether or not it would also be sufficient to reduce the
bandwidth spikes on full resynchronization.
Either way, there are some major performance benefits to ALSO
adding a local record/message sequence number (independent of originator
sequence number except when the originator is the local replica,
in which case it is identical), and maintaining *that* strict order.
This does maintain the order from each originator but ALSO allows the
exchange of a single "HighwaterMark" between replication neighbours, as
used in the Coda/Active Directory protocols and proposed for report
propagation in MDCR.
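A minimal sketch of the local record/message sequence number and the single
HighwaterMark it enables might look like this. All names here are
assumptions for illustration; a real store would persist the log:

```python
import itertools

class ReplicaStore:
    """Sketch (assumed names) of the local record sequence number idea:
    every stored change, local or received, gets the next local number,
    independent of the originator's own sequence number."""

    def __init__(self):
        self._next = itertools.count(1)
        self.records = []   # (local_seq, originator, orig_seq, change)

    def store(self, originator, orig_seq, change):
        self.records.append((next(self._next), originator, orig_seq, change))

    def changes_since(self, highwater):
        # A neighbour consumer presents a single HighwaterMark, the
        # last local_seq it received from us; we resume right after it,
        # with no per-originator bookkeeping needed to find the spot.
        return [r for r in self.records if r[0] > highwater]
```

For example, a consumer that last saw local_seq 1 gets records 2 and 3 back,
in exactly the order they were originally stored.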
The HighwaterMark has no global significance and is not needed
either to avoid restarts or for the weeding/summarizing or purging
mechanism. However it is useful between neighbours to enhance
performance, and I suspect also to avoid a full resynchronization
The specifics are implementation dependent. As I am not an
implementor and not familiar with the full range of implementations,
I am less clear on this than as regards standards and protocol
algorithms. However here's an attempt to discuss some implementation
issues as I understand them.
There are currently two major types of implementations:
1) Log based, in which logs of local changes and updates from
neighbour supplier replicas are appended sequentially and used to
transmit them to neighbour consumers.
2) Entry based, in which local changes and updates from neighbour
suppliers are stored with the corresponding entries. A scan of the
entries is then used to select reports for replication to consumers
for each replication session.
I have the impression that there may be some sort of informal "design
constraint", not explicitly mentioned in the requirements draft,
that may have resulted in ordering being omitted from the specification
so that it would be no simpler to implement for log based implementations
than for entry based ones.
I am inclined to agree with you that there should be no problem for either
type of implementation in requiring ordering. It is natural for log based
implementations and just requires an index for entry based implementations.
Active Directory is an example of an entry based implementation with such
an index and seems to perform at least as well as Novell's. BTW Microsoft
has published sponsored test results showing that it works several times
faster. If Novell has published a reply or alternative test results, could
someone please post a link (or better still, non-sponsored benchmarks
independent of either). Unless that is done, I would assume that is
conclusive evidence that there is no performance advantage in not ordering.
There is also apparently some confusion between "entry based" and "state
based". A log based implementation would naturally store operations rather
than states for local changes, and existing "Changelog" implementations
derived from UMich LDAP servers do the same for replication.
An "entry based" implementation could also be "state based" rather than
operation based. But anything supporting signed operations, or maintaining
atomicity as proposed in MDCR, would have to store operations. I see no
reason why the primary key for that could not be entryID in an entry
based implementation.
For EITHER log based or entry based implementations, it seems to me natural
to use a single key for maintaining the sorted order for receiving changes
from suppliers and local DUAs and passing them on to consumers.
That is what Active Directory does, using a local record/message sequence
number as proposed in MDCR. It is state based (ie also entry based).
An alternative is to use a global timestamp and not have a secondary key
at all. As well as causing the problems we are agreed about, that strikes
me as substantially less efficient internally. Unless there is some sort
of index you would have to scan EVERY entry to check whether it is above
the UpdatedMark of each consumer for each replication session, despite the
fact that replication sessions are frequent and most entries are unchanged.
By using a separate record/report number as either the log primary key for
a log based implementation, or a secondary key for an entry based
implementation, you can then use a HighWater mark for each neighbour
consumer to drastically limit either the entries scanned or the starting
point of a log scan.
Then FROM that starting point, you transmit only the change reports that are
above the UpdatedMark or vector for the particular consumer. This has no
impact on bandwidth and restarts, since exactly the same reports, based on
the consumer's UpdatedMarks or vector would be transmitted as without the
HighwaterMark. But it is much less resource intensive for replication
sessions.
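The two-stage selection described above, HighwaterMark to bound the scan and
UpdatedMark vector to filter, could be sketched as follows. The tuple layout
of the log records is an assumption for illustration:

```python
def reports_for_consumer(log, highwater, update_vector):
    """Sketch: 'log' is a list of (local_seq, originator, orig_seq,
    change) tuples in local_seq order.  The HighwaterMark bounds where
    the scan starts; the consumer's update vector then filters out
    reports it already has, so exactly the same reports go on the wire
    as would be sent without the mark."""
    out = []
    for local_seq, originator, orig_seq, change in log:
        if local_seq <= highwater:
            continue  # skipped cheaply thanks to the HighwaterMark
        if orig_seq <= update_vector.get(originator, 0):
            continue  # consumer already has this report
        out.append((local_seq, originator, orig_seq, change))
    return out
```

The mark only changes how much is scanned, never what is transmitted.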
Separating this local record/report number from the originator report
number is necessary for this sequencing. It also avoids the absurdity of
relying on timestamps for originator sequencing despite clocks not even
being monotonic. As mentioned in MDCR, it can also be used with a state
based (non-atomic) protocol design to set an entry record sequence number
as the maximum of the Attribute (Active Directory) and/or Attribute value
(URP) sequence numbers of the entry, to further speed up processing.
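The entry-level shortcut mentioned here, setting the entry record sequence
number to the maximum of its attribute (or attribute value) sequence
numbers, might be sketched as follows; the dict layout is an assumption:

```python
def entry_record_seq(attr_seqs):
    # attr_seqs: assumed mapping of attribute (or attribute value) to
    # the local sequence number at which it was last changed.
    return max(attr_seqs.values(), default=0)

def entries_to_scan(entries, highwater):
    # An entry whose maximum attribute sequence number is at or below
    # the consumer's HighwaterMark cannot hold anything new, so the
    # whole entry is skipped without examining individual attributes.
    return {dn: attrs for dn, attrs in entries.items()
            if entry_record_seq(attrs) > highwater}
```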
You *seem* to be indicating that you have a log based implementation, in
which change reports are stored in separate logs for each originating
replica,
and that it would therefore be convenient to transmit these logs in the
same sequence to consumers. I can understand why separate logs might be
used for the local changes and those from each *supplier*, but I do not
understand the point of separate logs for each *originator*, since each
replication session from a neighbour supplier would be a mixture of reports
from several originators, who are not necessarily neighbours.
With multiple logs, by simply adding a local record sequence number to each
log record (sequential across logs as well as within them), you
would be able to transmit the changes in the same sequence as received,
by merging the logs as you transmit, without an extra sort, just
as Active Directory is able to transmit them in that order despite being
entry based (and in fact state based).
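The merge-on-transmission idea can be sketched with a heap merge, assuming
each log is a list of (local_seq, change) tuples already in local record
sequence number order (sequential across logs as well as within them):

```python
import heapq

def merge_logs(logs):
    """Sketch: because each log is already sorted by the shared local
    record sequence number, a heap merge reproduces the original
    receive order across all logs without any extra sort step."""
    return list(heapq.merge(*logs))
```

For example, merging [(1, "x"), (4, "w")] with [(2, "y"), (3, "z")] yields
the changes back in the exact sequence they were originally received.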