
Accuracy and Precision in dates




Contemplate, for a moment, realtime publishing. By that, I mean a world in which weblog entries are created, validated, made available, issued, modified, accepted, copyrighted, and submitted within seconds.


On top of that, consider the constraints of a user polling once an hour or so. In such a system, knowing precisely at which point in a thirty second process the date was assigned amounts to unnecessary precision.

In physics class, many years ago, I learned the difference between accuracy and precision. Consistently hitting the target, if not the bullseye, is accuracy.[1] Consistently hitting the same place, even if that place is yards away from the target, is precision. The implication being that if you have a choice between accuracy and precision, accuracy is generally preferred.

Now, let's look at pubDate in RSS 2.0. It very clearly specifies that this is to be the date that the entry was published. And that it is optional.

Let's see how this applies to Blogger. When you create an entry, at the time you are presented with an entry field, you are also presented with a time & date, which you can change. You can save the entry as a draft and publish it later. The result is a date and time which is *not* the publish time. If you want to be pedantic, the right thing to do is to not provide a pubDate, but to define a new element which contains the proper semantics. That would be the precise thing to do. It would not increase interoperability, though.

This is a case where the RSS 2.0 specification is unnecessarily precise. If interoperability is the goal, precisely specifying the radius and curvature of a round hole is not the best way to handle square pegs.

Nor is the solution to define more round holes. Some of the Atom date proposals have had as many as five date elements. The end result of such precision in the face of lack of consensus is that such portions of the spec will be routinely ignored.

So, how does one increase accuracy? One way is to enumerate the interoperability issues. Dare's RSSBandit does not treat pubDate the same way as NetNewsWire does. Tim's usage of this element in RSS is consistent with the spec, and consistent with the way the tool of his choice implements this spec.[2]

How does one resolve this interoperability issue? Simply to state that this value MAY change. Dare agrees.[3] Note that this is not a weak restriction[4] on producers, but effectively a warning to consumers.

Other such warnings can make it more likely that the date indicated approximates when the entry became available, rather than any other date. An extreme example to illustrate the point: The Diary of Samuel Pepys[5], republished in blog form 340 years later. How can we discourage the "first date of formal issuance" of these entries being placed into the atom:d element defined in PaceDateSamRuby? It would seem to me that a note that "Consumers MAY choose to sort based on this value." would discourage such usage.

Similarly, how do we stop future dates? "Consumers MAY choose not to display entries containing atom:d elements until the date specified." seems like it would be a pretty effective deterrent.
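
To make the deterrent concrete, here is a minimal sketch of a consumer that takes up both of the MAY notes above: it sorts on the date and holds back anything dated in the future. The Entry class and the visible_entries function are hypothetical names made up for illustration, and the d field merely stands in for the atom:d value; none of this comes from any actual aggregator.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Entry:
    """Hypothetical entry record; `d` stands in for the atom:d value."""
    title: str
    d: datetime


def visible_entries(entries, now):
    """Drop entries dated after `now`, then sort the rest newest first."""
    shown = [e for e in entries if e.d <= now]
    return sorted(shown, key=lambda e: e.d, reverse=True)


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2004, 1, 1, tzinfo=utc)
    feed = [
        Entry("Pepys, dated at first issuance", datetime(1660, 1, 1, tzinfo=utc)),
        Entry("Post dated in the future", datetime(2010, 1, 1, tzinfo=utc)),
        Entry("Ordinary post", datetime(2003, 12, 31, tzinfo=utc)),
    ]
    for entry in visible_entries(feed, now):
        print(entry.d.date(), entry.title)
```

With a consumer like this, a Pepys entry dated 1660 sinks to the bottom of the list and a future-dated entry does not appear at all, which is the incentive for producers to supply a date close to when the entry actually became available.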

What have we lost? Well, if it turns out to be important that we capture precisely the formal, informative, dates associated with professional publications, it may make sense to indicate how such information should be recorded in a consistent manner. I can think of nobody who has given that particular topic greater thought than the Dublin Core Metadata Initiative. They have conferences and workshops on the topic.

That being said, I wouldn't have a problem with drawing a distinction between the definitions the DCMI have come up with and the serialization syntax that the RSS-DEV Working Group came up with for these terms. Perhaps it would make sense to standardize an RFC 3339 version of these dates. We can also discuss whether such dates should be in the core or in an extension.
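
For a sense of what an RFC 3339 serialization would look like next to the RFC 822 style used by pubDate, here is a rough sketch using only the Python standard library. The function name pubdate_to_rfc3339 is made up for illustration, and normalizing to UTC (and treating a date with no usable zone as UTC) is an assumption on my part, not something any spec or proposal requires.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def pubdate_to_rfc3339(pub_date: str) -> str:
    """Convert an RFC 822 style date string to an RFC 3339 UTC timestamp."""
    dt = parsedate_to_datetime(pub_date)
    if dt.tzinfo is None:
        # Assumption: a date without a usable zone is treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Normalize to UTC and use the "Z" suffix that RFC 3339 permits.
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


if __name__ == "__main__":
    print(pubdate_to_rfc3339("Sat, 07 Sep 2002 00:00:01 GMT"))
    # prints 2002-09-07T00:00:01Z
    print(pubdate_to_rfc3339("Wed, 02 Oct 2002 15:00:00 +0200"))
    # prints 2002-10-02T13:00:00Z
```

Note that the conversion only changes the syntax. It says nothing about what the date means, which is the separate question of definitions discussed above.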

Another key element of a number of proposals is a modification or update date. We need to face the very distinct possibility that a number of producers will not distinguish between the two. In fact, some producers may not be able to consistently identify that a modification occurred at all, let alone when it occurred. Again, the solution (if this is deemed vital to be included in Atom 1.0) may be to identify the consequences of various usage patterns for this element.

- Sam Ruby

[1] http://elchem.kaist.ac.kr/vt/chem-ed/data/graphics/acc-prec.gif
[2] http://www.imc.org/atom-syntax/mail-archive/msg07850.html
[3] http://www.imc.org/atom-syntax/mail-archive/msg08419.html
[4] http://www.imc.org/atom-syntax/mail-archive/msg08396.html
[5] http://www.pepysdiary.com/