Accuracy and Precision in dates
Contemplate, for a moment, realtime publishing. By that, I mean a world
in which weblog entries are created, validated, made available, issued,
modified, accepted, copyrighted, and submitted within seconds.
On top of that, consider the constraints of a user polling once an hour
or so. In such a system, knowing precisely at which point in a
thirty second process the date was assigned amounts to unnecessary
precision.
In physics class, many years ago, I learned the difference between
accuracy and precision. Consistently hitting the target, if not the
bullseye, is accuracy.[1] Consistently hitting the same place, even if
that place is yards away from the target, is precision. The
implication being that if you have a choice between accuracy and
precision, accuracy is generally preferred.
Now, let's look at pubDate in RSS 2.0. It very clearly specifies that
this is to be the date that the entry was published. And that it is
optional.
Let's see how this applies to Blogger. When you create an entry, at the
same time that you are presented with an entry field, you are also
presented with a time & date, which you can change. You can save it as draft and
publish it later. The result is a date and time which is *not* the
publish time. If you want to be pedantic, the right thing to do is to
not provide a pubDate, but to define a new element which contains the
proper semantics. That would be the precise thing to do. It would not
increase interoperability, though.
This is a case where the RSS 2.0 specification is unnecessarily precise.
If interoperability is the goal, precisely specifying the radius and
curvature of a round hole is not the best way to handle square pegs.
Nor is the solution to define more round holes. Some of the Atom date
proposals have had as many as five. The end result of such precision in
the face of lack of consensus is that such portions of the spec will be
routinely ignored.
So, how does one increase accuracy? One way is to enumerate the
interoperability issues. Dare's RSSBandit does not treat pubDate the
same way as NetNewsWire does. Tim's usage of this element in RSS is
consistent with the spec, and consistent with the way the tool of his
choice implements this spec.[2]
How does one resolve this interoperability issue? Simply by stating that
this value MAY change. Dare agrees.[3] Note that this is not a weak
restriction[4] on producers, but effectively a warning to consumers.
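
As a rough illustration of what heeding that warning might look like on
the consumer side, here is a minimal Python sketch. It assumes each entry
carries some stable identifier (entry_id and merge_poll are hypothetical
names, not taken from any spec or aggregator), so that a changed date is
treated as an update to a known entry rather than as a new entry.

```python
def merge_poll(store, polled):
    """Merge polled (entry_id, date) pairs into store.

    store maps a hypothetical stable entry_id to the last date seen.
    Returns the ids whose date changed since the previous poll.
    """
    changed = []
    for entry_id, date in polled:
        # A known id with a different date is an update, not a new entry.
        if entry_id in store and store[entry_id] != date:
            changed.append(entry_id)
        store[entry_id] = date
    return changed


store = {}
first = merge_poll(store, [("entry-1", "2004-01-01T00:00:00Z")])
second = merge_poll(store, [("entry-1", "2004-01-02T00:00:00Z")])
```

After the second poll the store still holds a single entry, and only the
second call reports entry-1 as changed.
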
Other such warnings can increase the probability that the date indicated
approximates when the entry became available, rather than any
other date. An extreme example to illustrate the point: The Diary of
Samuel Pepys[5], republished in blog form 340 years later. How can we
discourage the "first date of formal issuance" of these entries being
placed into the atom:d element defined in PaceDateSamRuby? It would
seem to me that a note that "Consumers MAY choose to sort based on this
value." would discourage such usage.
Similarly, how do we stop future dates? "Consumers MAY choose not to
display entries containing atom:d elements until the date specified."
seems like it would be a pretty effective deterrent.
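
To make the deterrent concrete, here is a small Python sketch of a
consumer that exercises both options: it sorts on the date and holds back
entries dated in the future. The (title, date) tuples, the function name
visible_entries, and the dates themselves are made up for illustration;
the date stands in for the atom:d value.

```python
from datetime import datetime, timezone


def visible_entries(entries, now):
    """Drop entries dated after `now`, then sort newest first."""
    shown = [entry for entry in entries if entry[1] <= now]
    return sorted(shown, key=lambda entry: entry[1], reverse=True)


# Illustrative data only: (title, date) pairs.
now = datetime(2004, 9, 1, tzinfo=timezone.utc)
entries = [
    ("Diary reissue", datetime(1664, 9, 1, tzinfo=timezone.utc)),
    ("Fresh post", datetime(2004, 8, 31, 12, 0, tzinfo=timezone.utc)),
    ("Post-dated", datetime(2005, 1, 1, tzinfo=timezone.utc)),
]
titles = [title for title, _ in visible_entries(entries, now)]
```

With such a consumer, an entry stamped with its 17th century date sinks to
the bottom of the list, and a post-dated entry is not shown at all. Either
outcome gives the producer a reason to supply a date close to when the
entry actually became available.
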
What have we lost? Well, if it turns out to be important that we
capture precisely the formal, informative dates associated with
professional publications, it may make sense to indicate how such
information should be recorded in a consistent manner. I can think of nobody
who has given that particular topic greater thought than the Dublin Core
Metadata Initiative. They have conferences and workshops on the topic.
That being said, I wouldn't have a problem with drawing a distinction
between the definitions the DCMI have come up with and the serialization
syntax that the RSS-DEV Working Group came up with for these terms. Perhaps
it would make sense to standardize an RFC 3339 version of these dates.
We can also discuss whether such dates should be in the core or in an
extension.
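
For a sense of what an RFC 3339 serialization involves, here is a minimal
Python sketch that converts an RFC 822 style date, as used by pubDate in
RSS 2.0, into an RFC 3339 timestamp in UTC. The function name
rfc822_to_rfc3339 is hypothetical, and treating a date with no usable
zone as UTC is an assumption of this sketch, not something either format
mandates.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def rfc822_to_rfc3339(value):
    """Convert an RFC 822 style date string to RFC 3339 in UTC."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # Assumption: a date without zone information is taken as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # isoformat() emits "+00:00" for UTC; RFC 3339 also allows "Z".
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


converted = rfc822_to_rfc3339("Sat, 07 Sep 2002 09:42:31 -0500")
```

Normalizing to UTC is only one possible choice; RFC 3339 also permits
keeping the original numeric offset.
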
Another key element of a number of proposals is a modification or update
date. We need to face the very distinct possibility that a number of
producers will not distinguish between the two. In fact, some producers
may not be able to consistently identify that a modification occurred at
all, let alone when it occurred. Again, the solution (if including this
in Atom 1.0 is deemed vital) may be to identify the consequences of
various usage patterns for this element.
- Sam Ruby
[1] http://elchem.kaist.ac.kr/vt/chem-ed/data/graphics/acc-prec.gif
[2] http://www.imc.org/atom-syntax/mail-archive/msg07850.html
[3] http://www.imc.org/atom-syntax/mail-archive/msg08419.html
[4] http://www.imc.org/atom-syntax/mail-archive/msg08396.html
[5] http://www.pepysdiary.com/