From: Russ Allbery (rra@stanford.edu)
Date: Mon Sep 11 2000 - 16:00:23 CDT
Charles Lindsey <chl@clw.cs.man.ac.uk> writes:
> Russ Allbery <rra@stanford.edu> writes:
>> Sure. From my current dynamic spam filter, here's an excellent example:
>> NNTP-Posting-Host: !^n=[1k-Y6Rq8'HG]aa3EF<4_ (Encoded at Airnews!)
> Now that is real ugly. Can you give us a breakdown?
Nope, no clue. Have no idea what Airnews does. But it works as an opaque
cookie to filter on, which is all I ask.
> I think that posting-host should be something that is immediately
> recognisable to a human (and also to a machine) - i.e. an FQDN or IP. If
> more specific information is to be conveyed, then I think a
> posting-account is the right thing (that can be in a notation which may
> only be understood by the ISP - just so long as it always contains the
> same string for the same account). But putting it in a separate
> posting-account field means at least that a human can understand what it
> is there for. Then you filter on posting-host, or posting-account, or
> more likely both, according to what you are trying to achieve.
I'd really just like to have *one* field that contains a string I can
filter on, without having to have any idea what the contents of that
string are.
>> The purpose of most NNTP-Posting-Host filtering is to dynamically
>> adjust to and start rejecting spam. The way this is done is by using
>> rate limiting on particular NNTP-Posting-Host content, generally
>> combined with the number of lines in the article to not get false
>> positives from off-line readers and similar bursts of posting.
> OK, are we talking about filtering at a relaying agent or at a serving
> agent?
Relaying agent. I've been using this filter on newsfeed.stanford.edu for
quite some time, and a similar scheme is built into Cleanfeed.
> If the latter, then what happens to the stuff coming in faster than the
> acceptable rate? Is it dropped, or is it delayed in some manner?
It's rejected.
> No, some human must have written the entry in the spam filter, having
> observed that a given site was in the habit of injecting spam, and
> having observed what it was putting in its NNTP-Posting-Host field. Or
> are you saying that the filter observes all posting-hosts on the
> network, discovers which of them have high rates, and constructs the
> filter automatically?
Bingo.
An inferior version of this functionality is even built into Diablo (and
perhaps they've fixed it by now; I don't know).
I don't have time to babysit the filters; they pretty much have to be 99%
automatic. I occasionally add exceptions (localhost.webtv.net, for
example), but by and large it just runs by itself and works quite nicely.
-- Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>