From: Russ Allbery (rra@stanford.edu)
Date: Mon Mar 06 2000 - 22:24:02 CST
Charles Lindsey <chl@clw.cs.man.ac.uk> writes:
> If dbz is basically a device for mapping Message-IDs into an offset
> within the history file,

It's not. The whole concept of an offset into a text file is going away.
A lot of it is gone already, even in the current development sources, and
the rest of it will go away for at least one history backend if I ever
manage to find the time to get to the history rewrite.

Offsets into a text file are too inefficient, too slow, and cause problems
for in-place updates (which we want to be able to do for all sorts of
reasons, not the least being mvgroup). We're running into the 2GB limit
on text files already; the right way to fix that is to first go to a
binary file (which will shrink the size of the file by a factor of three,
at least, without any real loss given that the history file is, in
practice, binary already) and then add support for multiple buckets of
data so that we only have to worry about the size of the index.
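
To make the fixed-size idea concrete, here is a minimal Python sketch. The record layout (a 16-byte MD5 of the Message-ID, three 32-bit Unix times, an 18-byte token), the names BucketedHistory and msgid_key, and the tiny RECORDS_PER_BUCKET cap are all assumptions for illustration, not INN's actual history format. Because every record is the same length, rewriting one in place never shifts any other record. Because the index stores a (bucket, slot) pair rather than one global byte offset, no single data file has to grow past the bucket cap. In-memory bytearrays stand in for the bucket files.

```python
import hashlib
import struct

# Hypothetical fixed-size history record (NOT INN's real format):
# 16-byte MD5 of the Message-ID, arrived/expires/posted as unsigned
# 32-bit Unix times, and an 18-byte opaque storage token.
RECORD = struct.Struct("<16s3I18s")   # 46 bytes, no alignment padding
RECORDS_PER_BUCKET = 4                # tiny cap so the demo spills into a 2nd bucket


def msgid_key(msgid: str) -> bytes:
    """Hash the Message-ID; using MD5 here is an assumption."""
    return hashlib.md5(msgid.encode("utf-8")).digest()


class BucketedHistory:
    def __init__(self):
        self.buckets = []  # bytearrays standing in for bucket data files
        self.index = {}    # key -> (bucket number, slot within that bucket)

    def add(self, msgid, arrived, expires, posted, token):
        key = msgid_key(msgid)
        full = RECORDS_PER_BUCKET * RECORD.size
        if not self.buckets or len(self.buckets[-1]) >= full:
            self.buckets.append(bytearray())  # start a new bucket file
        bucket = len(self.buckets) - 1
        slot = len(self.buckets[bucket]) // RECORD.size
        self.buckets[bucket] += RECORD.pack(key, arrived, expires, posted, token)
        self.index[key] = (bucket, slot)

    def _locate(self, msgid):
        bucket, slot = self.index[msgid_key(msgid)]
        # Byte offsets are only ever computed within a single bucket.
        return self.buckets[bucket], slot * RECORD.size

    def lookup(self, msgid):
        data, offset = self._locate(msgid)
        _key, arrived, expires, posted, token = RECORD.unpack_from(data, offset)
        return arrived, expires, posted, token

    def update_token(self, msgid, new_token):
        # Same-size overwrite: no other record moves, so no offsets change.
        data, offset = self._locate(msgid)
        key, arrived, expires, posted, _old = RECORD.unpack_from(data, offset)
        RECORD.pack_into(data, offset, key, arrived, expires, posted, new_token)
```

For example, after six add() calls the first four records fill bucket 0 and the last two land in bucket 1; update_token() then replaces one record's token without changing the byte length of either bucket.
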
So in the long run, in-place updates should be possible with at least some
of the history backends, but duplicates likely won't be.

> OK, I think you need to explain to me exactly what this API token is. Do
> I gather that the history file essentially maps (a hash of) the
> Message-ID to an API token (plus an expiry date or whatever)?

Right.

> So presumably this API token is mapped to actual storage in some way,
> whether via article numbers or not. I think I need to understand that
> bit of the process.

It's a black box. The storage API token is passed to one of the backend
storage APIs, and the storage API gives you the corresponding article.
The first few bits tell you what backend to use; beyond that, how the
storage API maps a token to an article is none of INN's business. It can
be black magic.
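
The dispatch step can be sketched in a few lines of Python. The one-byte type prefix (a whole byte rather than "a few bits", for simplicity), the registry dict, and the names register(), retrieve() and MemoryBackend are assumptions for illustration, not INN's storage API. The caller reads only the first byte and hands the rest of the token, unread, to whichever backend owns it.

```python
# Hypothetical token layout: byte 0 names the storage backend; the
# remaining bytes are private to that backend and never parsed here.


class MemoryBackend:
    """Toy stand-in for a storage method; keeps articles in a dict."""

    def __init__(self):
        self.articles = {}

    def retrieve(self, body: bytes) -> bytes:
        # How `body` maps to an article is entirely this class's business.
        return self.articles[body]


BACKENDS = {}  # storage type byte -> backend object


def register(type_id: int, backend) -> None:
    BACKENDS[type_id] = backend


def retrieve(token: bytes) -> bytes:
    backend = BACKENDS[token[0]]        # first byte selects the backend
    return backend.retrieve(token[1:])  # the rest is passed through unread
```

Two backends can then give completely different answers for the same token body, which is why nothing outside the backend should try to interpret it.
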
For traditional spool, it's a representation of the newsgroup and the
article number within that newsgroup, of course. But for CNFS, it's a
cycbuff and an offset in that cycbuff; for timehash it's the hashed
article path, for timecaf it's the time bucket and the article number
within the bucket, etc.
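
To show how differently two backends might use the same space, here is a sketch with two made-up 16-byte layouts in the spirit of CNFS (cycbuff plus offset) and timecaf (time bucket plus article number). The struct formats and function names are hypothetical; the real encodings live inside each storage method and are not reproduced here.

```python
import struct

# Illustrative 16-byte token bodies. The field widths are made up and
# do not match INN's real CNFS or timecaf token encodings.
CNFS_LIKE = struct.Struct(">8sQ")      # cycbuff name, byte offset in that cycbuff
TIMECAF_LIKE = struct.Struct(">II8x")  # time bucket, article number, 8 pad bytes


def cnfs_like_encode(cycbuff: str, offset: int) -> bytes:
    name = cycbuff.encode("ascii")
    if len(name) > 8:
        raise ValueError("cycbuff name longer than 8 bytes")
    return CNFS_LIKE.pack(name, offset)  # short names are NUL-padded


def cnfs_like_decode(body: bytes):
    name, offset = CNFS_LIKE.unpack(body)
    return name.rstrip(b"\0").decode("ascii"), offset


def timecaf_like_encode(bucket: int, artnum: int) -> bytes:
    return TIMECAF_LIKE.pack(bucket, artnum)


def timecaf_like_decode(body: bytes):
    return TIMECAF_LIKE.unpack(body)  # (bucket, artnum); pad bytes skipped
```

In this picture nothing outside a backend would call its decode function; the rest of the server just stores the bytes and passes them along.
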
Mapping article numbers to storage API tokens is done via overview.
Mapping message IDs to storage API tokens is done via history.
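
A toy model of those two lookup paths, assuming plain dicts in place of the real overview and history databases and an MD5 hash of the Message-ID as the history key (both assumptions; ToyServer and its methods are hypothetical names). In this model a crossposted article gets one token that several (newsgroup, article number) pairs point at.

```python
import hashlib


def msgid_hash(msgid: str) -> str:
    # Assumed hash function for the history key.
    return hashlib.md5(msgid.encode("utf-8")).hexdigest()


class ToyServer:
    def __init__(self):
        self.overview = {}  # (newsgroup, article number) -> token
        self.history = {}   # hash of Message-ID -> token
        self.storage = {}   # token -> article; stands in for the storage API

    def accept(self, msgid, placements, token, article):
        """Store one article and index it both ways."""
        self.storage[token] = article
        self.history[msgid_hash(msgid)] = token
        for group, artnum in placements:  # one entry per group it was filed in
            self.overview[(group, artnum)] = token

    def article_by_number(self, group, artnum):
        return self.storage[self.overview[(group, artnum)]]

    def article_by_msgid(self, msgid):
        return self.storage[self.history[msgid_hash(msgid)]]
```

Either path ends at the same token, and only the storage layer turns that token into article text.
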
-- Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>