
Syncing and partial (truncated) results




Hmm. I'm curious about the use case here. Is this aiming for sync
behavior?

Let me start here, because I have a very specific protocol approach in mind, but I think it might cover 90% or more of the likely uses.


Yes, I'm assuming what clients want to do is to sync. Certainly a client could be devised which doesn't sync, but rather starts fresh every time it wants to display something to the user. My guess is that such an approach would end up being heavyweight (on the wire) and that most clients are eventually going to want to sync. Note that syncing doesn't necessarily imply that clients have a persistent database, but at the very least they have some kind of local data structure (perhaps a GUI widget) that needs to be updated, and not all in one request. If we posit 100,000 entries, I think this is going to be the right approach, though I could be wrong.

Assuming for now that "sync" is a common paradigm for an Atom client, a simple syncing algorithm (assuming the PaceServerCollectionSubsets proposal) is as follows. The current sync begins at ThisSyncStartDate.

1. Request range: LastSyncDate thru ThisSyncStartDate
2. Let "MarkDate" be the date of the oldest item in the result--is it older than LastSyncDate?
3. If so, we're done--we have a full list of all the items that were updated between LastSyncDate and ThisSyncStartDate.
4. If not, request range LastSyncDate thru MarkDate.
5. Repeat from 2.
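
As a rough illustration of the steps above, here is a minimal Python sketch. Everything named in it is an assumption made for the example, not part of any proposal: make_fetch_range stands in for a server that returns the MOST recent items in an inclusive range and truncates the rest, and entries are plain dicts with "id" and "updated" keys. The sketch also stops when a request returns nothing new, which covers a server that never returns items older than LastSyncDate.

```python
from datetime import datetime, timedelta


def make_fetch_range(entries, max_results):
    """Hypothetical stand-in for the server: returns the most recent
    entries in [start, end] (inclusive), newest first, truncated."""
    def fetch_range(start, end):
        in_range = [e for e in entries if start <= e["updated"] <= end]
        in_range.sort(key=lambda e: e["updated"], reverse=True)
        return in_range[:max_results]
    return fetch_range


def sync(fetch_range, last_sync_date, this_sync_start_date):
    """Walk backward in time from this_sync_start_date to last_sync_date."""
    collected = {}
    upper = this_sync_start_date
    while True:
        # Steps 1 and 4: request LastSyncDate thru the current upper bound.
        batch = fetch_range(last_sync_date, upper)
        fresh = [e for e in batch if e["id"] not in collected]
        if not fresh:
            break  # nothing new came back, so the range is exhausted
        for e in fresh:
            collected[e["id"]] = e
        # Step 2: MarkDate is the date of the oldest item in the result.
        mark_date = min(e["updated"] for e in batch)
        # Step 3: done once we have reached LastSyncDate.
        if mark_date <= last_sync_date:
            break
        # Steps 4 and 5: repeat with the narrower range.
        upper = mark_date
    return collected


base = datetime(2004, 9, 29, 0, 0)
entries = [{"id": i, "updated": base + timedelta(hours=i)} for i in range(10)]
fetch = make_fetch_range(entries, max_results=3)
synced = sync(fetch, base + timedelta(hours=2), base + timedelta(hours=9))
print(sorted(synced))  # [2, 3, 4, 5, 6, 7, 8, 9]
```

Because the range is inclusive, the item at MarkDate comes back again on the next request, so the sketch de-duplicates by id. It assumes no more than max_results items share a single atom:updated value; otherwise the loop would stop early.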


It's critical that the client knows that the returned result is a consecutive sequence of items according to atom:updated, and that they are the MOST recent items (in the requested range) rather than the LEAST recent items. A more thorough example is below.

First, to answer this question:

Existing APIs indicate that the most important thing to
provide is the last N things edited. Don't need a date for that. Where
does the rest come in?

The most important thing is the last N items, but the *next* most important thing is a few items before that. Typically (I'm assuming) the client doesn't need to specify *how many* because it wants to get as many as it can, up to its last sync operation. Using date ranges instead of positional indices also ensures that the meaning of the request can't change (see below for how that works out).


Yes, but if Bob edits an old item, what happens? Remember, the only
required field is atom:updated. I don't think there's an approach we can take that will ensure the client always has an accurate representation of the server's state.

If Bob edits an old item then it will jump to the head of the list, and thus it won't appear in the results. We can't ensure that the client has an exact up-to-date copy, but I think we can ensure that the client's copy is up to date *as of the time it begins syncing*.


To take an example, let's say Alice fires up her Atom client today and it begins syncing at 12:37 pm. It asks for the whole collection, but the server only returns the most-recently-modified 37 entries. The oldest amongst those is updated September 29, 2004 at 4:20 pm. Alice then asks for the date range 1900-01-01T00:00:00Z thru 2004-09-29T16:20:00Z. But before her request gets through, insidious Bob modifies one of the entries ("Elephants") that would have fallen in that range. Say it gets a timestamp of 12:38 pm. This entry will disappear from the set that Alice would otherwise be fetching, and it will jump to the head. But Alice will be none the wiser, and none the worse. When her client is done syncing, her local entries will match the server's state as of 12:37. The client might even report "Last refreshed at 12:37." Later on, she might come to know that Bob edited a post at 12:38. But Alice won't be upset that she didn't see that modification in her 12:37 sync (even if the sync didn't finish until 12:39, or later).

What does Alice see for the entry "Elephants"? She sees the version that she got on her last sync (that is, the version that was current before 12:37). Of course, if her last sync of "Elephants" was out of date, it will still be out of date.

The crux of this is that, since we're working backward in time, no surprises can come once we begin the sync. Also, since we're using date ranges, a series of ranges is guaranteed to pick up everything in that range; using ordinal indices, the meaning of a range request would change whenever anything changes in the list, leading to duplicated or missing items.
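
To make the difference concrete, here is a small Python sketch comparing the two kinds of request before and after an edit like Bob's. The helper names (by_position, by_date_range) and the six toy entries are assumptions for illustration only; they don't correspond to any real API.

```python
from datetime import datetime, timedelta

base = datetime(2004, 9, 29, 12, 0)


def newest_first(entries):
    return sorted(entries, key=lambda e: e["updated"], reverse=True)


def by_position(entries, offset, count):
    """Hypothetical positional request: items offset..offset+count-1."""
    return [e["id"] for e in newest_first(entries)[offset:offset + count]]


def by_date_range(entries, start, end):
    """Hypothetical date-range request: items updated in [start, end]."""
    return [e["id"] for e in newest_first(entries)
            if start <= e["updated"] <= end]


# Six entries one minute apart: "A" is newest, "F" is oldest.
entries = [{"id": name, "updated": base - timedelta(minutes=i)}
           for i, name in enumerate("ABCDEF")]

range_start = base - timedelta(minutes=10)
range_end = base - timedelta(minutes=2)  # "C"'s timestamp

first_page = by_position(entries, 0, 3)
dates_before = by_date_range(entries, range_start, range_end)

# Bob edits "F" mid-sync, so it jumps to the head of the list.
for e in entries:
    if e["id"] == "F":
        e["updated"] = base + timedelta(minutes=1)

second_page = by_position(entries, 3, 3)
dates_after = by_date_range(entries, range_start, range_end)

print(first_page, second_page)    # ['A', 'B', 'C'] ['C', 'D', 'E']
print(dates_before, dates_after)  # ['C', 'D', 'E', 'F'] ['C', 'D', 'E']
```

With positions, every item shifts down by one, so "C" is returned twice without the client having asked for any overlap. With dates, the only change is that "F" has left the range, which is exactly the "Elephants" case: the client keeps the version it already had. The sketch only shows the duplicate case; a deletion ahead of the offset would shift items the other way and cause one to be skipped.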

Does this sound like it would satisfy most of what people are looking for in syncing?

What this is lacking is the ability for the server to choose its own subset, but I'm not sure what advantage that would have for the server, and it could be a disadvantage to the client (assuming the client wants to sync). Robert Sayre pointed me to a note about this issue in the DASL work [1]. That spec allows the server to return an arbitrary subset, and for the server to give some metadata about what it's returning and how to get another chunk. It assumes that the subset is taken from some ordering, and relies on positional indices to place the subset within the larger sequence. I think that's more problematic than date ranges, as I tried to illustrate, but I'll be interested in what others think.

Thanks,
Ezra
[1] <http://greenbytes.de/tech/webdav/draft-reschke-webdav-search-latest.html#rfc.issue.result-truncation>