From: Bruce Lilly (blilly@erols.com)
Date: Tue Mar 23 2004 - 21:09:58 CST
Charles Lindsey wrote:
> In <405E8B63.6030900@erols.com> Bruce Lilly <blilly@erols.com> writes:
>
>
>>For more insight into what In-Reply-To adds, consider
>>http://users.erols.com/blilly/mparse/usefor/ref+irt.png
>>and
>>http://users.erols.com/blilly/mparse/usefor/ref-only.png
>
>
> Your examples are interesting, but your interpretation of them is
> incorrect.
You're being a bit presumptuous, since I came up with the example.
> You cannot say that they were constructed from the References with or
> without the aid if In-Reply-To without first defining how the protocol for
> constructing References headers had been extended to include the case
> where messages are allowed to be permitted to be followups to multiple
> messages (a case not recognixed by either RFC 1036 or RFC 2822).
RFC 2822 does explicitly address the case, but stops short of giving
rules for constructing References. In the example, each message's
References field has the following properties, which are consistent
with the rules for a response to a single message:
1. Each message which is an ancestor is mentioned
2. The last message-id is one of the immediate predecessors
Three methods were investigated for constructing such References
fields: a forward merge, a reverse merge, and a topological sort
of dependencies. All work; the topological sort is efficient but
requires that dependency information is known, the merges are
quadratic complexity but can be derived from the References fields
of the predecessors w/o any supplemental dependency information.
> any attempt to
> show the tree structure is boud to fail, because it isn't a tree.
The References field is only a list of nodes; it does not describe
the connecting (directed) edges. The only edge that can be inferred
from property #2 above is the edge from the last-mentioned message-id
in each message's References field to the message-id of the message
containing that References field.
> However, I mentioned earlier that there was a bug in the Zawinski
> algorithm (essentially, once it had discovered that the parent of C was P,
> it never went back to change that view, even if it found later evidence
> that there was an intermediate article between P and C).
Yes, there are a number of issues with that algorithm. Another is the
fact that multiple message-ids in In-Reply-To are ignored.
> (assuming that
> articles a and q had different Subjects, as is likely)
Since the example was intended to demonstrate the information content
of In-Reply-To fields on dependencies (given availability of all
relevant messages), neither dates nor subjects were specified. With
availability of all messages (or some information in each message
documenting relationships constituting edges, since its References
field already lists all nodes which are ancestors), all information
needed to construct an accurate graph is available.
> If you fix the
> algorithm, then it would not have that problem and, moreover, it would
> linearize the whole thing so as not to appear as a tree when it wasn't.
The dependency graph is a general graph, not a linear list. The
fundamental problems are:
1. References gives no edge information
2. In-Reply-To gives partial edge information (only in-edges for
the message containing that field).
Problem 1 means that some supplementary information is required for
edges. Problem 2 means that multiple messages must be examined to
obtain all of the edge information. That isn't a big problem for
the issue of the draft, where the messages reside on a server which
constructs thread information and returns a representation of that
thread to a client. It is a big problem where the client has to
examine each message itself (e.g. via NNTP).
One solution would be an additional header field to convey the edge
information for the nodes mentioned in the References field. So in
the example given, one might have a message:
Message-ID: <z>
References: <a> <q> <c> <r> <d> <s> <h> <t> <i> <u> <k> <j> <w> <v> <l> <x>
In-Reply-To: <l> <x>
Edges: 1 3 7 11 15; 1 5 9 11; 5 12 15; 2 4 8 14 16; 2 6 10 14; 6 13 16
where each semicolon-separated sequence in Edges lists a path
through directed edges of the corresponding message-ids in the
References field. So a is an immediate predecessor of c, c is
an immediate predecessor of h, etc. That contains all of the
information necessary to construct the entire dependency graph
without the need to examine any other messages.