
thoughts on replication requirements



Here are some thoughts on our replication requirements and overall approach, 
given the present version of the reqs I-D, traffic on the ietf-ldup & 
ldup-repl lists, and a review of some papers on this topic.

I feel Tim was correct in saying:

> The underlying problem here, and what's behind many of the issues John
> is raising, seems to be that it's not clear what we are proposing to
> standardize. 

I'll second that and also assert that there are key, fundamental, overall 
characteristics inherent in replicated data storage systems that we have not 
specifically enumerated and considered.

I think we need to explicitly do so in order to maximize our opportunities for 
designing a system (and underlying protocol) that meets our goals.

Below is a summary of these characteristics along with some discussion and 
references. It's apparent from the references that there are folks outside 
the "directory" community proper who've been considering and building systems 
quite similar to our envisioned one. It makes sense to look carefully at what 
they've done to see what we can leverage, both in terms of crystallizing our 
perspective of the overall system and in terms of utilizing specific 
approaches/techniques in implementing it (as Mark Wahl is proposing).

I do have specific comments on draft-weiser-repla-01.txt and will include them 
in a separate message a bit later on.

thanks,

Jeff
-----------------------------------------------------------------------------

Fundamental characteristics for replicated data stores...

1. Synchronicity

  Are our anticipated client applications synchronous or 
  asynchronous in nature?

2. Data consistency between replicas

  Do these client applications require weakly- or strongly-consistent 
  views of the data, or in-between, or all-of-the-above?

3. Operational semantics

  What are client applications' requirements for various aspects 
  of the data store's operational semantics?


"Synchronous" means systems where client entities share some "thing" at the 
same time, whereas "asynchronous" means systems where client entities share 
that thing at different times (or more succinctly, needn't necessarily share 
it at the same time).

Degrees of consistency relate to whether someone looking at a piece of data in 
the system sees it the same as someone else looking at it.

Synchronicity and consistency are often interdependent: the more synchronous 
an application, the more strongly-consistent its views of the "thing" 
typically need to be. Or, in reverse, the more strongly-consistent a data 
store can be, the more strongly-synchronous its client applications can 
(typically) be.

Operational semantics for replicated data stores cover the gamut of:

  Conflict detection - how do I know I have a data conflict?

  Conflict resolution - what can I do about a data conflict?

  Data propagation policies - when are updates propagated to other replicas?

  Consistency level - how consistent is ~my~ view of the data store across 
                      replicas, and across my own utilization of it? What 
                      guarantees do I have, if any?

  Data stability - what's the possibility that the data I'm looking at on a 
                   given replica might change?

  Replica selection - May I talk to whichever replica seems appropriate at 
                      a given time? May I switch replicas over time? 
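
To make the "conflict detection" question concrete, here's a minimal sketch 
of one common approach: comparing per-replica update counters ("version 
vectors"). This is my own illustration, not something taken from the reqs I-D 
or from the Bayou papers; the names VersionVector, dominates, and compare are 
hypothetical.

```python
from typing import Dict

# Assumption: each copy of an entry carries a map of
# replica id -> number of updates from that replica it reflects.
VersionVector = Dict[str, int]


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if copy `a` reflects every update that copy `b` reflects."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify two copies of the same entry held on different replicas."""
    a_covers_b = dominates(a, b)
    b_covers_a = dominates(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_newer"
    if b_covers_a:
        return "b_newer"
    # Each side has seen an update the other has not: a data conflict.
    return "conflict"
```

This only answers "how do I know I have a conflict?"; what to do about one 
(conflict resolution) is a separate policy question.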



Discussion...

Our current requirements doc does not explicitly address the above 
characteristics -- except for some aspects of operational semantics. Also, 
the term "consistency" isn't explicitly used, and requirements 5.2 and 5.7 
are in direct conflict as written.

For example, are we assuming that directory client applications (e.g. a simple 
user-driven browser, or some network management environment) are typically 
synchronous or asynchronous in nature? Do we even know? I'd say we don't know 
overall, and even if we did, it'd change tomorrow anyway. If we adopt that 
perspective (i.e. that we don't really know), then a ~requirement~ we have 
is that our replicated data store needs to support a range of client 
synchronicity needs. (Aside: I searched the X.500, X.501, X.518, and X.525 
docs for "synch" and got no hits except in X.518, in terms of "asynchronous 
events", which are defined to be things such as "admin limit exceeded", 
decidedly a different class of thing than the considerations here.) See [1] 
for more about how various applications with various requirements are built 
upon Bayou.

Similarly, are we expecting that LDAP-based directory deployments utilizing 
replication are always weakly-consistent or strongly-consistent? Or should the 
system provide for some range of consistency behavior?
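
As one illustration of an "in-between" consistency level, here's a rough 
sketch loosely in the spirit of the read-your-writes session guarantee 
discussed in [3]. It is not the paper's algorithm; Session, note_write, 
pick_replica, and the vector layout are all hypothetical names I've made up 
for the example. The idea is that a client remembers which writes it has 
issued and only reads from a replica that has already seen them.

```python
from typing import Dict, Optional

# Assumption: origin replica id -> highest update sequence number seen.
Vector = Dict[str, int]


def covers(replica_seen: Vector, required: Vector) -> bool:
    """True if the replica has seen every write listed in `required`."""
    return all(replica_seen.get(o, 0) >= n for o, n in required.items())


class Session:
    """Tracks the writes one client has issued during a session."""

    def __init__(self) -> None:
        self.writes: Vector = {}

    def note_write(self, origin: str, seq: int) -> None:
        # Record that this session's write was accepted at `origin` as `seq`.
        self.writes[origin] = max(self.writes.get(origin, 0), seq)

    def pick_replica(self, replicas: Dict[str, Vector]) -> Optional[str]:
        """Return the first replica that has seen all of this session's
        writes, or None if no such replica is currently available."""
        for name, seen in replicas.items():
            if covers(seen, self.writes):
                return name
        return None
```

Note that the replicas themselves stay weakly consistent here; the guarantee 
comes from constraining replica selection on a per-session basis.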

Replication agreements fall into the overall area of operational semantics, 
but our discussion of them has been only in terms of propagation scheduling 
and replica selection -- leaving at least four other operational semantics 
subareas to address.

I think that the pointer to the Bayou work that Mark Wahl provided in his 
proposal was quite relevant and that we should all take some time to read 
about it. The specific papers I've looked at are listed below.

Much of Bayou's potential contribution to LDAP replication appears to be in 
the areas of operational semantics and anti-entropy (i.e. update propagation) 
algorithm and data structure design, both of which are key enablers for 
realizing their (and our) primary goal of efficient "anywhere/anytime" 
access to data. Note that [2] provides experimental data on update propagation 
performance.
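
For those who haven't met the term, here's a toy sketch of what a pairwise 
anti-entropy exchange can look like. It is a deliberate simplification of my 
own and not the protocol described in [2]; the Replica class, its fields, and 
the anti_entropy function are hypothetical, and "conflict resolution" here is 
simply that the most recently applied write wins.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumption: a write is (origin replica, per-origin sequence no., key, value).
Write = Tuple[str, int, str, str]


@dataclass
class Replica:
    name: str
    log: List[Write] = field(default_factory=list)      # writes in apply order
    seen: Dict[str, int] = field(default_factory=dict)  # origin -> highest seq
    data: Dict[str, str] = field(default_factory=dict)  # current entry values

    def local_write(self, key: str, value: str) -> None:
        """Accept a client update at this replica."""
        seq = self.seen.get(self.name, 0) + 1
        self.apply((self.name, seq, key, value))

    def apply(self, write: Write) -> None:
        origin, seq, key, value = write
        self.log.append(write)
        self.seen[origin] = seq
        self.data[key] = value  # naive: the last applied write wins


def anti_entropy(sender: Replica, receiver: Replica) -> int:
    """One-way exchange: send the receiver every write it has not yet seen.
    Returns the number of writes transferred."""
    missing = [w for w in sender.log if w[1] > receiver.seen.get(w[0], 0)]
    for write in missing:
        receiver.apply(write)
    return len(missing)
```

Running the exchange in both directions between any pair of replicas leaves 
them having seen the same set of writes; the interesting design questions are 
when such exchanges happen and how conflicting writes get ordered.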

I've done some poking around in a couple of comp sci bibliographic resources 
on the web for more info on replication, and haven't come up with anything 
other than Bayou and the related work that is (notably carefully) cited in 
their papers.

Does anyone have any references to any papers roughly equivalent to any of 
those below ([2] especially) that deal with X.500 as the subject? I could only 
find some references to some work done at the University of Ottawa, but the 
one paper ostensibly dealing with replication isn't presently in their 
repository.



References...

[papers below available at: http://www.parc.xerox.com/csl/projects/bayou/ ]

[1] W K Edwards, E D Mynatt, Karin Petersen, Mike J. Spreitzer, Douglas B. 
Terry, Marvin M. Theimer. "Designing and Implementing Asynchronous 
Collaborative Applications with Bayou". Proceedings of the Tenth ACM Symposium 
on User Interface Software and Technology (UIST), Banff, Alberta, Canada, 
October, 1997.

[2] Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer, 
Alan J. Demers. "Flexible Update Propagation for Weakly Consistent 
Replication". Proceedings of the 16th ACM Symposium on Operating Systems 
Principles (SOSP-16), Saint Malo, France, October 5-8, 1997, pages 288-301.

[3] Douglas B. Terry, Alan J. Demers, Karin Petersen, Mike J. Spreitzer, 
Marvin M. Theimer, Brent B. Welch. "Session Guarantees for Weakly Consistent 
Replicated Data". Proceedings International Conference on Parallel and 
Distributed Information Systems (PDIS), Austin, Texas, September 1994, pages 
140-149 (IEEE Computer Society Press).