
Re: #1047 Path field delimiters and syntax - status



On Thu August 18 2005 09:52, Charles Lindsey wrote:
> 
> In <871x4sy5kd.fsf@xxxxxxxxxxxxxxxxxxxxx> Russ Allbery <rra@xxxxxxxxxxxx> writes:
> 
> >Bruce Lilly <blilly@xxxxxxxxx> writes:
> 
> >> No, we have not; there are no "severe interoperability" issues which are
> >> any different than are introduced by MISMATCH, POSTED, "!!", etc. --
> >> those are changes, and there will be some interoperability issues.
> >> Introducing ':' as other than a delimiter (its long-standing use) will
> >> introduce severe interoperability issues.  There are no performance
> >> issues with introducing comments; that is a red herring.
> 
> >There are no performance issues with comments provided that servers are
> >allowed to treat them as Path entries and the ()s as delimiters.  There
> >are (mild) performance issues if servers are expected to ignore them.
> >There are (mostly mild but more annoying) performance issues if the full
> >RFC 2822 CFWS syntax is expected to be supported.
> 
> I don't think it is as simple as that. If someone puts "(demon)" in the
> middle of the Path, then it is quite possible that demon customers will
> never get to see the article.

Iff it's not treated as a comment.  Ditto for !POSTED! vs posted customers,
!MISMATCH! vs mismatch customers, and !dead:beef::cafe! vs dead, beef, and
cafe customers.

> So I think servers would have to ignore 
> them, which means matching pairs of "(...)" inside them.

Yes; trivial.
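Skipping a comment amounts to tracking nesting depth during the same left-to-right scan that finds the entries. Below is a minimal sketch, assuming a simplified Path grammar: '!' and whitespace (including folds) delimit entries, '(' ... ')' comments may nest, and inside a comment a backslash quotes the next character (as with RFC 2822 quoted-pair). The function name `path_entries` is hypothetical, and an unbalanced comment is simply treated as running to the end of the field.

```python
def path_entries(path: str) -> list[str]:
    """Split a Path field body into entries in one left-to-right pass.

    Assumed, simplified grammar (hypothetical helper): '!' and whitespace
    delimit entries; parenthesized comments may nest and are skipped
    entirely; inside a comment a backslash quotes the next character.
    """
    entries: list[str] = []
    current: list[str] = []
    depth = 0  # current comment nesting depth
    i = 0
    while i < len(path):
        ch = path[i]
        if depth > 0:
            # Inside a comment: only nesting and quoted pairs matter.
            if ch == "\\":
                i += 1  # skip the quoted character
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
        elif ch == "(":
            depth = 1
            if current:  # a comment also terminates an entry
                entries.append("".join(current))
                current = []
        elif ch == "!" or ch.isspace():
            if current:  # ignore empty slots such as "!!"
                entries.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        entries.append("".join(current))
    return entries
```

With this treatment "(demon)" never yields an entry, so it cannot be mistaken for a peer named demon; for example, `path_entries("news.example.com!(demon)!relay.example.net!not-for-mail")` returns the three non-comment entries.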
 
> Bearing in mind that every relaying agent is supposed to locate the Path
> header within each article,

Invariant.

> scan it for valid entries (i.e. things between 
> delimiters/WSP/folds,

Invariant.

> but not the <tail-entry>)

Invariant (and also rather pointless, as that entry serves no purpose).

> and check each entry found  
> against the name(s) of the peer it is considering sending it to. Repeat
> for every peer.

Note that there are any number of ways to do so which do not require
parsing the Path field more than once, or other than left-to-right, at
any given server.  It necessarily means comparing M non-comment, non-
diagnostic, non-keyword, non-bogus path entries to N peer names (although
clearly the tests can be short-circuited once a match for a specific
peer is found). For an article with M valid path entries and at a site
with N peers, none of which are listed in the path field, that is a
minimum complexity of O(N*M). 

> any extra work put into
> that loop is going to impact performance

1. skipping from '(' to the matching unquoted ')' is outside of the
   loop; it is one-time field parsing and none of the skipped content
   is checked against any peer names.  Nor does it affect the number
   of non-comment entries to be checked.  O(N*M) still applies.
2. skipping over comments is trivial (compared to e.g. parsing RFC 2231
   parameters or comparing keywords to a (possibly long) list of peer
   names).

> All of which is why we removed <comment>s from the Path quite some time ago.

"we" didn't remove them.  It was a one-person, non-consensus, unilateral
action based on flawed reasoning.