Re: Minutes from San Diego



Jaap,

On Fri, 19 Jan 2001, Jaap Akkerhuis wrote:

>     I know a lot of queries from both ARIN and RIPE are people
>     doing data mining, and I suspect that this is true for all
>     other Whois registries as well.
> 
> We don't like datamining on our whois server. It could get us in
> conflict with the Dutch privacy regulations. It is actually
> something that RIPE should probably think about as well. I've heard
> that some privacy advocates in the EU don't like any whois service
> at all.

Some kinds of data mining are worse than others.  When we spot people
mining for personal data, we block their queries and ask them to define
the reason for their queries.  When we spot people mining for network
data, we usually ask them to download the full dataset or apply to
mirror our database.  

Of course, you have probably noticed the "when we spot..." from the
previous paragraph.  The existing software does not automatically block
people - it has to be done manually.  The new version of our Whois
server, currently in beta, tracks the number of queries from a specific
IP address for either personal or network data, and limits them for
given time periods.  Continued abuse automatically disables access.
It's not perfect, because it's based on IP addresses, and dial-up users
or owners of blocks of data can bypass the blocks, but hopefully
spotting the users who mine data in this fashion will be easier.
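
To make the idea concrete, here is a rough sketch in Python of that kind
of per-IP limiting.  It is not the code in our server: the limits, the
window length, the strike count, and the names (QueryLimiter, allow,
LIMITS, WINDOW, MAX_STRIKES) are all made up for illustration.

```python
import time
from collections import defaultdict, deque

# All values below are invented for illustration, not real settings.
LIMITS = {"personal": 3, "network": 10}  # queries allowed per window
WINDOW = 60.0      # length of the time window, in seconds
MAX_STRIKES = 2    # over-limit attempts before access is disabled


class QueryLimiter:
    """Hypothetical per-IP limiter with separate counts per data kind."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.history = defaultdict(deque)  # (ip, kind) -> query timestamps
        self.strikes = defaultdict(int)    # ip -> over-limit attempts
        self.blocked = set()               # IPs with access disabled

    def allow(self, ip, kind):
        """Return True if this query may proceed, False if refused."""
        if ip in self.blocked:
            return False
        now = self.clock()
        recent = self.history[(ip, kind)]
        # Forget queries that have fallen out of the time window.
        while recent and now - recent[0] >= WINDOW:
            recent.popleft()
        if len(recent) >= LIMITS[kind]:
            # Over the limit: refuse, and disable access on repeated abuse.
            self.strikes[ip] += 1
            if self.strikes[ip] >= MAX_STRIKES:
                self.blocked.add(ip)
            return False
        recent.append(now)
        return True
```

A server loop would call allow(client_ip, "personal") or
allow(client_ip, "network") before answering each query.  Because the
key is the source address, this has the same weakness mentioned above:
a client that changes address starts with a clean history.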

I wonder about existing LDAP servers and their ability to do this kind
of dynamic blocking.  Hmm....

> Another datapoint. We do have more than 500K objects.

As do we, I think by around two orders of magnitude.  However, my hope
is that a reasonable SQL database can scale to this pretty easily.  This
should only be one or two more disk accesses per query.  It may require
quite a bit more RAM, but RAM is cheap, right?  I wouldn't mind having 
an excuse to get a machine with more than a Gbyte or two of RAM.  ;)

Shane