
Re: Accuracy and Precision in dates




On Saturday, August 14, 2004, at 11:24 AM, Sam Ruby wrote:
> In physics class, many years ago, I learned the difference between accuracy and precision. Consistently hitting the target, if not the bullseye, was accuracy.[1] Consistently hitting the same place, even if that place was yards away from the target, was precision. The implication is that if you have to choose between accuracy and precision, accuracy is generally preferred.
Here's my thinking on what Sam wrote: we need to decide 1) which target or targets we want to hit, and 2) how to increase our accuracy in hitting them.

Here are some possible targets for use of date constructs, some of which may not be worth the trouble:

1) "Correct" or at least acceptable sort order (there are a number of possibilities for what a user might consider the best sort order)
2) Meaningful display date
3) Signaling that processing needs to be done ("something has changed since you saw this entry last")


I think those are listed in order of importance, and that we can probably devise a better method for addressing #3 (for example, <id version="3">uri...</id>).
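As a rough illustration of #3, here is a minimal Python sketch of how a consumer might use a version attribute like the one in the example above to flag entries that have changed since it last saw them. Assumptions: the `version` attribute is only this message's illustrative syntax, not part of any spec; XML namespaces are omitted for brevity; `changed_entries` and the `seen` dictionary are hypothetical names.

```python
import xml.etree.ElementTree as ET


def changed_entries(feed_xml: str, seen: dict[str, int]) -> list[str]:
    """Return ids of entries that are new or whose version number went up.

    Hypothetical syntax: each <entry> carries <id version="N">uri</id>.
    Entries without a version attribute are treated as version 0.
    `seen` maps id URI -> highest version seen so far; it is updated in place.
    """
    root = ET.fromstring(feed_xml)
    changed = []
    for entry in root.iter("entry"):
        id_el = entry.find("id")
        if id_el is None or not id_el.text:
            continue  # no usable id, so nothing to track
        uri = id_el.text.strip()
        version = int(id_el.get("version", "0"))
        if uri not in seen or version > seen[uri]:
            changed.append(uri)  # "something has changed since you saw this entry last"
        seen[uri] = max(version, seen.get(uri, version))
    return changed
```

Fetching the same feed twice reports an entry only the first time; a later fetch reports it again only if its version number has increased.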

So how can we be the most accurate on #1, and can we also be accurate on #2? I don't think it's possible to do both with a single date construct.

I'd imagine most users will usually want to sort by either the initial publishing date or the most recent significant update. There are also likely to be cases where people want to sort by the "display date", though those would be rarer. Can we accommodate more than one of these, and how accurately?

> So, how does one increase accuracy? One way is to enumerate the interoperability issues. Dare's RSSBandit does not treat pubDate the same way as NetNewsWire does. Tim's usage of this element in RSS is consistent with the spec, and consistent with the way the tool of his choice implements this spec.[2]

> How does one resolve this interoperability issue? Simply by stating that this value MAY change. Dare agrees.[3] Note that this is not a weak restriction[4] on producers, but effectively a warning to consumers.
Okay, I see now how this is intended as a warning to consumers. I'll admit that I scanned the example in your proposal pretty quickly, and so missed that point. I think making the point, at least briefly, before the example would be helpful. If only a fraction of implementers read specs, the number who read the examples in specs carefully is going to be even smaller, so relying entirely on examples to make important points is likely to lead to misunderstanding.

> Other such warnings can increase the probability that the date given approximates when the entry became available more closely than any other date does. An extreme example to illustrate the point: The Diary of Samuel Pepys[5], republished in blog form 340 years later. How can we discourage people from placing the "first date of formal issuance" of these entries into the atom:d element defined in PaceDateSamRuby? It would seem to me that a note that "Consumers MAY choose to sort based on this value." would discourage such usage.
Explicitly providing a place to put a "display date" would be another way, and would make the point more directly. If we DON'T provide a place for a display date, I'd like to see spec text that points out more explicitly the implications of putting the "first date of formal issuance" in atom:d in such cases.

> Similarly, how do we stop future dates? "Consumers MAY choose not to display entries containing atom:d elements until the date specified." seems like it would be a pretty effective deterrent.
Again, a "display date" would be another possibility.

> Another key element of a number of proposals is a modification or update date. We need to face the very distinct possibility that a number of producers will not distinguish between the two. In fact, some producers may not be able to consistently identify that a modification occurred at all, let alone when it occurred. Again, the solution (if this is deemed vital to be included in Atom 1.0) may be to identify the consequences of various usage patterns for this element.
There's definitely some distance between what is possible, at least with existing systems, and what is desirable. I'd say it would be desirable to know three dates: the date of first issuance, the date of the most recent "significant" update (where "significant" is determined subjectively by the publisher), and a "display date" (i.e., a date associated with the content or meaning of the entry). That would enable all the methods of sorting I would expect to be most preferred, as well as placing an entry in temporal context by displaying a date associated with its content. However, getting all three dates is clearly not always possible.

I have a difficult time letting go of the desire for a "display date", because if we don't support it explicitly, I can't think of a way to express it in a feed other than merging it into the content element or creating a new extension. But if the consensus is that we can drop it, I'll concede.

What would be the consequences of having a date with one meaning appear in a date construct spec'd for a date with a different meaning? For example, what if we have atom:updated (meaning the last "significant" update), but somebody puts the date of first issuance (of the ENTRY, not the entry's content) in it? Will that cause us problems in hitting our target (a good sort order) accurately? What if we have atom:issued (date of first issuance of the entry) and somebody puts the last significant update date in it? Would it be better to spec a date construct with an imprecise meaning ("it could be first issuance, last major change, last change of any kind, etc.") so that it is less likely to be "wrong"? I'm afraid that if we relax the meaning of the date construct too much, the value of processing it "correctly" will be diminished. I'd rather keep the definition somewhat tight to increase the probability that its meaning will be consistent across feeds, and then accept the fact that some people aren't going to follow the spec perfectly. Without having thought too hard about the consequences, I don't think they'd be too terrible in most cases.

I'd prefer to spec the date construct to indicate the last significant update. I'd think that would best accommodate:

1) Feeds that never get significant updates (they never have to change it)
2) Feeds that DO get significant updates (they CAN change it)
3) People who want to sort by the first issuance date (they can cache the first date they see in an entry, and ignore changes to it)
4) People who want to sort by the last significant update date (obviously)
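The caching approach in item 3 might look roughly like this in Python, assuming the single date construct means "last significant update" as proposed. The class, field, and method names are hypothetical. Note that the cached value is only the date the entry carried when this consumer first fetched it, so it approximates first issuance only if the entry was seen before any significant update.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EntryDates:
    """Dates a consumer tracks for one entry (hypothetical structure)."""
    first_value: datetime   # date the entry carried when first seen
    latest_value: datetime  # most recent value, i.e. last significant update


class DateCache:
    """Remembers, per entry id, the first date value seen and the latest one."""

    def __init__(self) -> None:
        self._dates: dict[str, EntryDates] = {}

    def observe(self, entry_id: str, date: datetime) -> None:
        if entry_id in self._dates:
            # Keep the first value; only the "last significant update" moves.
            self._dates[entry_id].latest_value = date
        else:
            self._dates[entry_id] = EntryDates(date, date)

    def sorted_ids(self, by: str = "updated") -> list[str]:
        """Entry ids, newest first, by 'updated' or by 'issued' (first value seen)."""
        attr = {"issued": "first_value", "updated": "latest_value"}.get(by)
        if attr is None:
            raise ValueError(f"unknown sort key: {by!r}")
        return sorted(
            self._dates,
            key=lambda eid: getattr(self._dates[eid], attr),
            reverse=True,
        )
```

Sorting by `"updated"` moves an entry that received a significant update back to the top, while `"issued"` keeps it where it first appeared, covering both item 3 and item 4 from one date construct.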


People who do make significant updates but for whatever reason can't update the date wouldn't be perfectly served, but I'd say that's their own problem: if it causes too much pain, they can improve their publishing tool or move to a better one.

People who can't provide any kind of "objective" date are going to cause reduced accuracy no matter what solution we pick, unless we loosen the spec to make the target difficult to miss. Since that would weaken the format for everyone else, I don't think we should go too far just to enable them to claim to be accurate and feel good about themselves.

[1] http://elchem.kaist.ac.kr/vt/chem-ed/data/graphics/acc-prec.gif
[2] http://www.imc.org/atom-syntax/mail-archive/msg07850.html
[3] http://www.imc.org/atom-syntax/mail-archive/msg08419.html
[4] http://www.imc.org/atom-syntax/mail-archive/msg08396.html
[5] http://www.pepysdiary.com/