Re: Draft 0.3: References

From: Greg Berigan (gberigan@cse.unl.edu)
Date: Mon Nov 24 1997 - 12:12:07 CST


Kent Landfield <kent@landfield.com> carbon copied:
>gberigan@cse.unl.edu (Greg Berigan) wrote only to USEFOR:
>>Maurizio Codogno <mau@beatles.cselt.it> wrote:

>From: Kent Landfield <kent@landfield.com>
>To: gberigan@cse.unl.edu (Greg Berigan)
>Cc: usenet-format@clari.net

I do subscribe to the list. You do not need to send me extra copies of
your responses.

>>> I thought that there was consensus about the possibility to trim
>>> the References: line, leaving at least the first and the last three
>>> message IDs. Shouldn't it be added here?

> So did I and yes it should.

>> IMO, such trimming should never be necessary. Instead we should demand
>> that all news agents accept arbitrarily long References lists; no more
>> fixed-length arrays. If they have trouble communicating them to servers
>> they can use continuation lines. If a programmer is so inept as not to
>> know how to allocate dynamic character strings then that programmer
>> shouldn't be writing news software.

> Greg we have carried this discussion to great lengths.

Yes we did, but the choice of 1+3 is still arbitrary and has no basis in
how valuable the information is in threading. As yet I've seen no
examination of how many IDs should be maintained based on current dynamics
of news (average expiry times, frequency of thread forking with and without
subject changes, the lengths of linear, non-forking responses, the
influences of archive sites like Deja News). I'd like to see some
empirical studies on the value of each ID in the history. I expect that
the value of the first is greatest, followed by the most recent, and then
dropping for the recent past, but after about a week of age their value
generally increases (with some chaotic fluctuations influenced by forking).
If these get trimmed when they're currently virtually worthless they won't
be around when they again increase in value. I expect they're still of
less value than the most recent and the most ancient, but certainly worth
more than when they were cut.

If any trimming is done, it should only be to the extent required by the
user's agent. As many IDs as possible must be maintained, and IMO any
trimming should be done starting at the second, possibly fourth, possibly
int(sqrt(total_count-1))'th ID forward(*). I'd also like to hear about
whether it is appropriate for more tolerant software to use its
reconstructed thread table to generate the References header for followups
to trimmed articles and how trimmed IDs should be represented in the header
(like if you trim 5 real contiguous IDs out of the header replace them with
the ID "<5@...>") so not all information is lost. Additionally, if someone
trims another 3 IDs before and/or after those 5, those could be compressed
into the ID "<8@...>".
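A rough sketch of that placeholder idea, in Python. The "..." part of the
placeholder ID is left open above, so the domain used here
(trimmed.invalid) and the names PLACEHOLDER_DOMAIN, trimmed_count and
collapse are purely hypothetical, chosen only so the example runs:

```python
import re

# Hypothetical placeholder domain; the post leaves this part unspecified.
PLACEHOLDER_DOMAIN = "trimmed.invalid"
_PLACEHOLDER = re.compile(r"<(\d+)@" + re.escape(PLACEHOLDER_DOMAIN) + r">")


def trimmed_count(msg_id):
    """Return how many real IDs an entry stands for (1 for an ordinary ID)."""
    match = _PLACEHOLDER.fullmatch(msg_id)
    return int(match.group(1)) if match else 1


def collapse(ids, first, last):
    """Replace ids[first:last] with a single counting placeholder.

    Any placeholders already inside the span contribute their own counts,
    so repeated trimming keeps the running total of removed IDs.
    """
    span = ids[first:last]
    if not span:
        return list(ids)
    total = sum(trimmed_count(entry) for entry in span)
    return ids[:first] + [f"<{total}@{PLACEHOLDER_DOMAIN}>"] + ids[last:]
```

Collapsing 5 contiguous IDs gives one "<5@trimmed.invalid>" entry, and
collapsing that entry together with 3 neighbouring IDs gives
"<8@trimmed.invalid>", matching the counts described above.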

> Dynamic storage
> is just one of the issues. Carrying around potentially a great deal of extra
> bulk for minimal gain is the big one. There need to be some realistic
> guidelines documented here.

Reasonable limits aren't. We should not mandate any trimming and what
trimming we do should be limited only by what is functionally necessary.

(*) If there are 5 to 9 IDs, trim starting at ID #2
    If there are 10 to 16 IDs, trim starting at ID #3
    If there are 17 to 25 IDs, trim starting at ID #4, etc.
    If there are 1 to 4 IDs, you must not trim at all.
    Stop when your software's limitations (should have none) are satisfied.
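The footnote's rule can be sketched in Python as below. This is only one
reading of it: the function names trim_start and trim_references and the
max_ids parameter are made up for illustration, and the sketch assumes the
starting position is computed once from the original count and that
trimming proceeds forward from there without touching earlier IDs.

```python
import math


def trim_start(total_count):
    """Return the 1-based position of the first ID eligible for trimming.

    Follows int(sqrt(total_count - 1)); with 1 to 4 IDs nothing may be
    trimmed, so None is returned.
    """
    if total_count <= 4:
        return None
    return math.isqrt(total_count - 1)


def trim_references(ids, max_ids):
    """Drop only as many IDs as needed to fit max_ids (an assumed limit).

    IDs before the starting position are never removed, so the result can
    still exceed max_ids if the limit is smaller than that prefix.
    """
    start = trim_start(len(ids))
    excess = len(ids) - max_ids
    if start is None or excess <= 0:
        return list(ids)
    first = start - 1  # convert to a 0-based index
    return ids[:first] + ids[first + excess:]
```

For example, with 10 IDs and a limit of 6, trimming starts at ID #3 and
removes IDs #3 through #6, keeping the first two and the last four.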



