Re: Dealing with large collections [Re: URI constraints]



On Wed, Oct 13, 2004 at 04:37:17PM -0400, Robert Sayre wrote:
> Greg Stein wrote:
> >To put a stake in the ground, I'd suggest that any query only deal with
> >one of the N date fields, ordering of that field, limiting to start/end
> >range of dates, and applying a maximum count limit.
> 
> I like that stake, it looks good there.
> 
> If we keep this super simple, maybe we could skip the DASL-esque 
> approach for basic implementations and rely on a GET with a "Range" 
> header and a custom ranges-specifier. Or maybe some kind of delta 
> request would do.

I would not recommend either of those approaches, though I'll certainly
give you the "creative" tag :-)  My thinking:

* it isn't a range because you're asking for specific ordering. if it
  wasn't for that, it would be darned close. the ordering request is
  effectively a "content negotiation" much like asking for English versus
  French.

* it isn't a delta because you aren't saying "I have X, give me the
  difference described using <this> format."

If we go for the low-road form, then I'd just suggest a new Atom header.
Possibly something like this:

  Atom-Get-Partial: field=publish-date; start=20041001; end=20041013;
    order=asc; limit=20

Each part would be optional, and the field and order parts would take
predefined tokens. The dates are ISO 8601 (and we could restrict the
allowed date forms, similar to http://www.w3.org/TR/NOTE-datetime).
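
A minimal sketch of how a client or server might build and parse such a
header value. The function names are hypothetical, and the syntax is
assumed to be exactly the semicolon-separated name=value form of the
example above; nothing here is a defined Atom API.

```python
# Hypothetical helpers for the proposed Atom-Get-Partial header.
# Assumption: the value is a list of "name=value" parts separated by ";".

PART_NAMES = ("field", "start", "end", "order", "limit")


def build_atom_get_partial(field=None, start=None, end=None,
                           order=None, limit=None):
    """Serialize the given parts; every part is optional and omitted if None."""
    values = (field, start, end, order, limit)
    parts = [f"{name}={value}"
             for name, value in zip(PART_NAMES, values)
             if value is not None]
    return "; ".join(parts)


def parse_atom_get_partial(header_value):
    """Parse a header value into a dict; 'limit' is converted to int."""
    params = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ";" as in the example above
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed part: {part!r}")
        params[name.strip().lower()] = value.strip()
    if "limit" in params:
        params["limit"] = int(params["limit"])
    return params
```

For instance, parsing the example value above yields a dict whose "field"
is "publish-date" and whose "limit" is the integer 20.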

> Perhaps these approaches are considered evil for some reason, but they 
> don't break caches:

It isn't a big deal. The server merely returns:

  Vary: Atom-Get-Partial

That tells the cache that the response was altered based on the value of
the Atom-Get-Partial header. Honestly, though, I'd consider the resource
pretty much non-cacheable. Or at least the server ought to provide an
expiry of "1 minute" or somesuch.
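
To illustrate what Vary buys you, here is a simplified sketch of how a
cache might key stored responses once it has seen "Vary:
Atom-Get-Partial". The function name is hypothetical, and real caches
do more normalization than this.

```python
# Simplified sketch of Vary-based cache keying (not a full HTTP cache).

def cache_key(method, url, request_headers, vary):
    """Combine the URL with the request headers named in the Vary value."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    # A header that is absent from the request is recorded as None.
    selected = tuple((name, lowered.get(name)) for name in names)
    return (method.upper(), url, selected)
```

Two GETs of the same URL with different Atom-Get-Partial values get
different keys, so one partial listing is never served in place of
another.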

Note: if a server does not understand the new header, then you'd end up
fetching *all* of the content, and it would *not* be ordered properly. It
seems like we could end up with some interop issues.
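
One defensive option for a client is to re-apply the same constraints
locally, so the result is right even if the server ignored the header.
This is only a sketch under assumptions not settled in this thread:
entries are plain dicts, dates are YYYYMMDD strings (so string
comparison matches date order), the range is inclusive, and "desc" is
a valid order token.

```python
# Hypothetical client-side fallback: filter, sort, and limit locally.

def apply_partial_locally(entries, field="publish-date", start=None,
                          end=None, order="asc", limit=None):
    """Return the entries the server should have sent for this query."""
    selected = [
        e for e in entries
        if (start is None or e[field] >= start)
        and (end is None or e[field] <= end)
    ]
    selected.sort(key=lambda e: e[field], reverse=(order == "desc"))
    return selected if limit is None else selected[:limit]
```

This does not fix the cost of fetching *all* of the content; it only
keeps the client's view consistent.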

I still like the idea of a teeny search grammar. Each of the parts of the
above example would just be an XML element in a query.

For example:

  SEARCH / HTTP/1.1
  Host: example.org
  Content-Type: application/xml
  Content-Length: xxx

  <?xml version="1.0" encoding="UTF-8"?>
  <D:searchrequest xmlns:D="DAV:" xmlns:A="atom-namespace">
    <A:atom-query>
      <A:query-field><A:publish-date/></A:query-field>
      <A:range-start>20041001</A:range-start>
      <A:range-end>20041013</A:range-end>
      <A:sort-order><A:ascending/></A:sort-order>
      <A:limit>20</A:limit>
    </A:atom-query>
  </D:searchrequest>

Something like that.
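
A rough sketch of generating that request body with Python's standard
xml.etree.ElementTree. The function name is hypothetical,
"atom-namespace" is the same placeholder used in the example, and the
A:descending element is an assumption; only A:ascending appears above.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
ATOM_NS = "atom-namespace"  # placeholder, as in the example above

# Keep the D: and A: prefixes when serializing.
ET.register_namespace("D", DAV_NS)
ET.register_namespace("A", ATOM_NS)


def build_search_body(field, start, end, ascending=True, limit=None):
    """Return the SEARCH request body as UTF-8 bytes."""
    def a(tag):
        return f"{{{ATOM_NS}}}{tag}"

    root = ET.Element(f"{{{DAV_NS}}}searchrequest")
    query = ET.SubElement(root, a("atom-query"))
    ET.SubElement(ET.SubElement(query, a("query-field")), a(field))
    ET.SubElement(query, a("range-start")).text = start
    ET.SubElement(query, a("range-end")).text = end
    # "descending" is an assumed counterpart to the "ascending" token.
    direction = "ascending" if ascending else "descending"
    ET.SubElement(ET.SubElement(query, a("sort-order")), a(direction))
    if limit is not None:
        ET.SubElement(query, a("limit")).text = str(limit)
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

The Content-Length for the request is then just the length of the
returned bytes.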

Cheers,
-g