
RE: Supporting Partial Replication



Updated to add some thoughts on overlapping replication contexts...

John McMeeking


Before I go on vacation too...

We seem to be operating on different understandings of LDUP with respect to areas of replication, replication contexts, and update vectors.

You mention a "fourth option", being to maintain "an update vector per replication area in a replication context and use the cascade rule to maintain the update vectors." As I understand LDUP: an area of replication and a replication context are the same thing, with replication context being the current LDUP terminology. They identify an area of the DIT that is replicated -- a replication context has a single root entry and is bounded by subordinate replication contexts (ldup-model 3.5 - Terms and Definitions). LDUP already defines a separate update vector per replication context (per replica). On the surface at least, your "fourth option" appears to be LDUP as currently defined.

For a server to participate in a cycle (which ldup-replica-req now refers to as a "replica group" precisely because of the misunderstanding I had about your use of cycle), the server must hold a copy of the replication context. And just to make sure my assumptions about what this means are clear:
- a replica group is defined in the context of a replication context. It is the servers that hold instances of a particular area of replication. A server may be part of several replica-groups.
- replication agreements are defined in the context of a replication context.

In your example, with three servers and two replication contexts, agreements between all the servers imply that there are two sets of agreements between each pair of servers -- a set for each replication context. The existence of a replication agreement in one replication context does not imply a corresponding agreement in other replication contexts.
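A minimal sketch of the point that agreements are scoped to a replication context. The holdings table is hypothetical (it is not taken from the original example); the only rule modelled is that an agreement for a context can exist only between two servers that both hold that context.

```python
from itertools import permutations

# Hypothetical holdings: which servers hold a copy of which replication context.
holdings = {
    "R1": {"S1", "S2"},
    "R2": {"S1", "S2", "S3"},
}

def possible_agreements(holdings):
    """Return (supplier, consumer, context) triples.

    An agreement is only possible when both servers hold the context,
    so each context gets its own, independent set of agreements.
    """
    return sorted(
        (supplier, consumer, context)
        for context, servers in holdings.items()
        for supplier, consumer in permutations(sorted(servers), 2)
    )

agreements = possible_agreements(holdings)
```

With these holdings, an agreement S1->S3 exists for R2 but has no counterpart for R1, because S3 does not hold R1.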

I'm going to go out on a limb, and guess that part of our misunderstanding has to do with "overlapping" replication contexts (you mentioned replicating ACL via sparse or fractional replication, while having a full replica of some subtree). This may be required by ldup-replica-req (mentioned in terminology, but not in specific requirements), but is currently listed as a non-objective of ldup-model (section 3.3e). Would it be fair to state that you think ldup-model (and friends) needs to address overlapping replication contexts? My responses have been in the context of what ldup-model claims to support, and in that context I seem to be having a problem understanding your concerns and properly communicating my understanding.

After thinking about overlapping replication contexts a bit more, IF we are going to tackle it, I think we need to address the following:
1. How do we define the bounds of a replication context that is not bounded by nested replication contexts? Kurt Zeilenga's ldap-subentry draft would be useful here (subtree specification).
2. If a client update falls within multiple replication contexts, how should LDUP behave? Let's start with replicating changes under all appropriate replication contexts, meaning that the same update will be sent multiple times under different replication sessions (they are idempotent, so this should be okay). This should keep update vectors in the correct state, as an update under one replication context may be replicated before earlier updates (by CSN) that fall within other overlapping replication contexts.
3. Do we allow multiple replication contexts with different bounds to have the same root? I'd like to withdraw the question, because I'm sure that once asked, the answer will be "YES!" This makes my head hurt more than I need just before Christmas, so I'll leave that for others to gnaw on.
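A rough sketch of the behaviour suggested in item 2, under loud assumptions: CSNs are plain integers, a replication context is simply "everything under its root" (no subtree specification), and the class and function names (`Update`, `Consumer`, `contexts_for`) are invented for illustration. It shows the same update being delivered once per overlapping context, applied idempotently, with a separate update vector kept per context.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Update:
    csn: int          # simplified change sequence number
    replica_id: str   # replica where the update originated
    dn: str           # entry the update targets
    value: str

@dataclass
class Consumer:
    entries: dict = field(default_factory=dict)         # dn -> (csn, value)
    update_vectors: dict = field(default_factory=dict)  # context -> {replica_id: csn}

    def receive(self, context, update):
        # Idempotent apply: a repeated or older update changes nothing.
        current = self.entries.get(update.dn)
        if current is None or update.csn > current[0]:
            self.entries[update.dn] = (update.csn, update.value)
        # One update vector per replication context.
        uv = self.update_vectors.setdefault(context, {})
        uv[update.replica_id] = max(uv.get(update.replica_id, 0), update.csn)

def contexts_for(dn, context_roots):
    """Names of all contexts whose root is the DN itself or an ancestor of it."""
    return sorted(
        name for name, root in context_roots.items()
        if dn == root or dn.endswith("," + root)
    )

# Two overlapping contexts (hypothetical roots) and one client update.
roots = {"R1": "o=example", "R2": "ou=people,o=example"}
update = Update(csn=5, replica_id="S1", dn="cn=a,ou=people,o=example", value="x")
consumer = Consumer()
for context in contexts_for(update.dn, roots):
    consumer.receive(context, update)  # sent once per replication session
```

The entry is written once even though the update arrives twice, and both per-context update vectors record CSN 5 for S1.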


John McMeeking

"Steven Legg" <steven.legg@xxxxxxxxxxxxx>

                  12/20/2001 11:51 PM
                  Please respond to steven.legg



To: John McMeeking/Rochester/IBM@IBMUS
cc: <ietf-ldup@xxxxxxx>
Subject: RE: Supporting Partial Replication



John,

John McMeeking wrote:
> See responses marked <JAM>

> John,
>
> John McMeeking wrote:
> > For either S1
> > or S2 to replicate U1 to S3, replication context R1 must be added to S3,
> > and the replication context properly initialized on S3 -- either via a full
> > update replication session, or via some other means (e.g. LDIF). At that
> > point, U1 (and the rest of the entries in R1) are present on S3 and
> > replication continues normally. Replication of R1 is independent of R2.
>
> You're alluding to the flip side of what I'm saying. If the current
> architecture can't support a replication topology where the servers
> in a cycle hold different replication areas in the same replication
> context then the choices are to not replicate, or to force all the
> servers in the cycle to have the same replication area(s). Too bad
> if I don't want S3 to see stuff in R1.
>
> <JAM>
> Are you talking about setting up something like this?

No. The choices are:

1) don't replicate, i.e. break the cycle by throwing out S3 (in my original
example),

2) force all servers in the cycle to have the same replication area,
i.e. S1 holds R1, S2 holds R1 and S3 holds R1, forget about R2,

3) change the LDUP architecture to support the original topology.


> Server S1 holds R1 and R2
> Server S2 holds R2
> Server S3 holds R1 and R2
> Set up replication agreements such that S1 supplies S2, S2 supplies S3
> and S3 supplies S1.
>
> As defined (and I think we agree this is the current behavior), LDUP
> allows this to be done only for R2. As S2 does not hold R1, you can
> not set up replication for R1 to/from S2. As I understand it, the
> agreements for R1 and R2 are completely independent. For example, if
> I add R2 to S2, and then set up the cycle described above, there would
> be at least 6 replication agreements S1->S2(R1), S1->S2(R2), S2->S3(R1),
> ... Going back to the scenario described above, under LDUP you would
> set up two independent cycles: S1->S2->S1 (for R1) and S1->S2->S3->S1
> (for R2).

They're not independent since S1 and S2 each have a single update vector
for both R1 and R2 in the current architecture. Events in one cycle
affect the other.

>
> I don't see a problem.
> </JAM>
>
> > If R2 is a sparse/fractional replica of R1, R2 would not be considered a
> > separate replication context. In this case, sparse/fractional replication
> > is an attribute of the replicaSubentry for S3. If U1 falls within the
> > attributes and/or entries specified for S3, it will be replicated under
> > the replication agreements targeting S3 under R1, and the UV for S3 updated
> > accordingly.
> >
> > What happens when S3 is a fractional replica, and U1 does not contain any
> > attributes replicated to S3? draft-ietf-ldup-model-06, section 8.2,
> > specifies "When fully populating or incrementally bringing up to date a
> > Fractional Replica each of the Replication Updates must only
> > contain updates to the attributes in the Fractional Entry Specification."
> > This implies that S3 will never see U1, and thus not fully update its
> > update vector until such time as it receives an update originating at the
> > same server.
>
> Do you agree that S1 will also never see U1?
> This breaks eventual convergence.
>
> <JAM>
>
> Okay, now I think I understand... Let me restate this scenario:
> S1 holds a full replica of R1
> S2 holds a full replica of R1
> S3 holds a fractional replica of R1
> Replication agreements are defined such that S1 supplies S3, S3 supplies S2,
> and S2 supplies S1.

I've assumed symmetry in the replication agreements for my original example,
so the topology is an undirected graph. S1 supplies S2, S1 supplies S3,
S2 supplies S1, S2 supplies S3, S3 supplies S1 and S3 supplies S2. The
subset of these agreements that are significant to the example are S2
supplies S1, S2 supplies S3 and S3 supplies S1. The other agreements are
invoked but end up sending nothing new.
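The convergence problem can be illustrated with a toy simulation. This is only a sketch of one reading of the scenario: U1 and U2 both originate at S2, U1 touches an attribute outside S3's fractional specification, CSNs are plain integers, and a consumer advances its update vector to the highest CSN it receives. The attribute names and the `Server`/`replicate` helpers are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Update:
    name: str
    origin: str      # replica where the update originated
    csn: int         # simplified change sequence number
    attribute: str

@dataclass
class Server:
    name: str
    fraction: Optional[frozenset] = None   # None means a full replica
    log: list = field(default_factory=list)
    update_vector: dict = field(default_factory=dict)

    def accepts(self, update):
        return self.fraction is None or update.attribute in self.fraction

def replicate(supplier, consumer):
    """One session: send what the consumer's update vector says it lacks."""
    for update in sorted(supplier.log, key=lambda u: u.csn):
        if update.csn <= consumer.update_vector.get(update.origin, 0):
            continue  # the update vector claims the consumer already has it
        if not consumer.accepts(update):
            continue  # filtered, in the spirit of ldup-model-06 section 8.2
        consumer.log.append(update)
        consumer.update_vector[update.origin] = update.csn

def run_scenario():
    s1, s2 = Server("S1"), Server("S2")
    s3 = Server("S3", fraction=frozenset({"mail"}))  # hypothetical fraction
    u1 = Update("U1", "S2", 1, "salary")  # outside S3's fraction
    u2 = Update("U2", "S2", 2, "mail")    # inside S3's fraction
    s2.log = [u1, u2]
    s2.update_vector = {"S2": 2}
    replicate(s2, s3)  # S3 receives only U2
    replicate(s3, s1)  # S1 receives U2 and now claims S2's changes up to CSN 2
    replicate(s2, s1)  # U1 has CSN 1, so it is skipped
    return [u.name for u in s1.log]

seen_by_s1 = run_scenario()
```

In this model S1 ends up holding only U2: once U2 arrives via S3, S1's update vector hides the fact that U1 was never delivered.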

>
> Under such a configuration, U1 is not seen by S3, as S1 doesn't replicate
> it to S2. Before proceeding, let me restate that there is a difference
> between holding a subtree of an area of replication and holding a
> fractional replica. As I understand it, holding a subtree implies the
> existence of another area of replication corresponding to that subtree
> -- as opposed to a sparse replica (not supported by the ldup model) which
> holds some entries in an area of replication.
>
> I see three solutions to the problem you describe:
>
> 1. Replace the restriction in ldup-model-06 8.2 such that all updates are
> sent to fractional replicas. When acting as a supplier, a fractional replica
> replicates all replication updates, even those that are not within the set
> of attributes held by the fractional replica. Also, the fractional replica
> is responsible for applying only those update primitives that are within
> the fractional replica specification.
>
> I think this would cause major problems for state-based implementations.

Agreed. The server has to store the updates "somewhere" so that they
can be forwarded to other servers.

> It seems reasonable for log-based implementations.

I would expect there to be administrator concerns regardless of the style
of implementation. One reason for setting up a fractional replica is to protect
certain information held by the supplier from being seen by the consumer.

>
> 2. Add a restriction to the model & info model to the effect that a
> fractional replica cannot act as a supplier in LDUP.
>
> In your scenario that implies S3 cannot be a supplier to S1. Thus S2 must
> be a supplier to S1 and U1 and U2 are both replicated from S2 to S1. I'm
> not sure how this would be done -- either the configuration is rejected
> (preferred), or a fractional replica simply ignores requests to act as a
> supplier. I prefer rejecting the configuration -- why let someone set up
> a replication path that will never be used?
>
> 3. Add a restriction that a fractional replica can act as a supplier only
> to another fractional replica, where the consumer's fractional specification
> is a subset of the supplier's fractional specification (i.e. the supplier
> replica holds all entries/attributes held by the consumer, and may hold more).
>
> For your scenario, this would preclude S3 acting as a supplier to S2
> (S2 - a full replica - does not hold a subset of the attributes held by S3).
> I'm not sure where/when this restriction would be enforced. It seems that
> either the configuration has to be rejected outright -- topic for management
> draft -- or that a supplier would have to evaluate the fractional
> specifications (if any) for itself and the consumer and determine whether
> it should, in fact, use the agreement at all.
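The check described in solution 3 could look roughly like this. It is a sketch only: a fractional specification is reduced to a set of attribute names (entries are ignored), `None` stands for a full replica, and `may_supply` is a hypothetical helper, not something defined by any LDUP draft.

```python
from typing import FrozenSet, Optional

def may_supply(supplier_attrs: Optional[FrozenSet[str]],
               consumer_attrs: Optional[FrozenSet[str]]) -> bool:
    """Decide whether a supplier may use an agreement under solution 3.

    None means a full replica; otherwise the set lists the attributes
    held by a fractional replica.
    """
    if supplier_attrs is None:
        return True    # a full replica holds everything any consumer needs
    if consumer_attrs is None:
        return False   # a fractional replica cannot supply a full replica
    # A fractional supplier must hold everything the consumer holds.
    return consumer_attrs <= supplier_attrs

# S3 (fractional) supplying S2 (full) is rejected; the reverse is allowed.
s3_to_s2 = may_supply(frozenset({"mail"}), None)
s2_to_s3 = may_supply(None, frozenset({"mail"}))
```

The same function could run either when the agreement is configured or each time a supplier is about to start a session, which are the two enforcement points mentioned above.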

Solutions 2 and 3 both kill any possibility of updateable sparse and/or
fractional replicas. This seriously limits LDUP's usefulness in database
synchronization since external sources of data with which a directory may
be required to synchronize are likely to be both updateable and
sparse/fractional.

They also outlaw secondary shadowing topologies allowed by X.500 replication,
and which I already support. For instance, it would not be possible for a
server to shadow portions from two different naming contexts. In X.500,
administrative areas, e.g. for access control or schema, can and do span
naming contexts (replication contexts in LDUP). The administrative policy
inherited from superior naming contexts is called prefix information and
is included in X.500 replication updates. Prefix information is effectively
a read-only, sparse and fractional copy of information from a superior
naming context. The prefix information for two different naming contexts
will overlap, but neither will be a subset of the other. Solutions 2 and 3
would prevent the prefix information from two such naming contexts from being
replicated to the same shadow DSA.

>
> Assuming state-based replication remains in the standards, I think (2)
> would be a much cleaner solution, and most easily implemented.

... and very limiting. I don't want to have to choose between flexible
replication topologies and multiple masters. I want both.

You didn't enumerate the fourth solution: have an update vector per
replication area in a replication context and use the cascade rule to
maintain the update vectors.

P.S. I'm about to go off for a short break. I'll respond to any
follow-ups after I get back in two weeks.

Regards,
Steven