Re: Accuracy and Precision in dates
On Saturday, August 14, 2004, at 11:24 AM, Sam Ruby wrote:
In physics class, many years ago, I learned the difference between
accuracy and precision. Consistently hitting the target, if not the
bullseye, was accuracy.[1] Consistently hitting the same place, even
if that place was yards away from the target, was precision. The
implication being that if you have a choice between accuracy and
precision, accuracy is generally preferred.
Here's my thinking on what Sam wrote: we need to decide 1) what the
target(s) is/are that we want to hit, and 2) how to increase our
accuracy in hitting that/those target(s).
Here are some possible targets for use of date constructs, some of
which may not be worth the trouble:
1) "Correct" or at least acceptable sort order (there are a number of
possibilities for what a user might consider the best sort order)
2) Meaningful display date
3) Signaling that processing needs to be done ("something has changed
since you saw this entry last")
I think those are listed in order of importance, and that we can
probably devise a better method for addressing #3 (for example, <id
version="3">uri...</id>).
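
To make the idea behind #3 concrete, here is a minimal Python sketch of
how a consumer might use a per-entry version marker instead of a date
to decide whether an entry needs reprocessing. The function name
needs_processing, the seen_versions dictionary, and the sample id
"entry-1" are all hypothetical; the version attribute itself is only
the suggestion floated above, not part of any spec.

```python
def needs_processing(seen_versions, entry_id, version):
    """Return True when this id is new or its version differs from the
    last one recorded, then record the version just seen.

    seen_versions is a plain dict kept by the consumer (hypothetical).
    """
    changed = seen_versions.get(entry_id) != version
    seen_versions[entry_id] = version
    return changed


seen = {}
first = needs_processing(seen, "entry-1", "3")   # new entry
repeat = needs_processing(seen, "entry-1", "3")  # same version again
bumped = needs_processing(seen, "entry-1", "4")  # version changed
```

With something like this, no date comparison is involved in change
detection at all, which leaves the date construct free to serve sorting
or display.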
So how can we be the most accurate on #1, and can we also be accurate
on #2? I don't think it's possible to do both with a single date
construct.
I'd imagine most users are usually going to want to sort based on
either the initial publishing date, or the most recent significant
update. There are likely also to be cases where people want to sort
based on the "display date", though those would be more rare. Can we
accommodate more than one of these, and how accurately?
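
As a rough illustration of the sort orders just described, here is a
minimal Python sketch. The keys "issued", "updated", and "display" are
hypothetical stand-ins for whichever date constructs a feed ends up
carrying, and the fallback order for a missing date is an assumption of
this sketch, not something any proposal specifies.

```python
from datetime import datetime, timezone

# Hypothetical entries; the key names are illustrative only.
entries = [
    {"id": "a", "issued": datetime(2004, 8, 1, tzinfo=timezone.utc),
     "updated": datetime(2004, 8, 14, tzinfo=timezone.utc)},
    {"id": "b", "issued": datetime(2004, 8, 10, tzinfo=timezone.utc),
     "updated": datetime(2004, 8, 10, tzinfo=timezone.utc)},
    {"id": "c", "issued": datetime(2004, 8, 12, tzinfo=timezone.utc),
     "updated": datetime(2004, 8, 12, tzinfo=timezone.utc),
     "display": datetime(1664, 8, 14, tzinfo=timezone.utc)},
]


def sort_entries(entries, key):
    """Return entries newest first by the chosen date.

    When the chosen date is absent, fall back to "updated" and then
    "issued" (an assumption made for this sketch).
    """
    def date_of(entry):
        return entry.get(key) or entry.get("updated") or entry["issued"]
    return sorted(entries, key=date_of, reverse=True)


by_issued = [e["id"] for e in sort_entries(entries, "issued")]
by_updated = [e["id"] for e in sort_entries(entries, "updated")]
by_display = [e["id"] for e in sort_entries(entries, "display")]
```

The three orderings differ for the same entries, which is the crux of
the question: a single date construct can only feed one of them
accurately.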
So, how does one increase accuracy? One way is to enumerate the
interoperability issues. Dare's RSSBandit does not treat pubDate the
same way as NetNewsWire does. Tim's usage of this element in RSS is
consistent with the spec, and consistent with the way the tool of his
choice implements this spec.[2]
How does one resolve this interoperability issue? Simply to state
that this value MAY change. Dare agrees.[3] Note that this is not a
weak restriction[4] on producers, but effectively a warning to
consumers.
Okay, I see now how this is intended as a warning to consumers. I'll
admit that I scanned the example in your proposal pretty quickly, and
thus missed that point. I think making the point at least briefly
before the example would be helpful. If a small number of implementers
read specs, the number who read examples in specs carefully is going to
be even smaller, so depending entirely on them to make important points
is likely to lead to misunderstanding.
Other such warnings can increase the probability that the date
indicated more often closely approximates when the entry became
available than any other date. An extreme example to illustrate the
point: The Diary of Samuel Pepys[5], republished in blog form 340
years later. How can we discourage the "first date of formal
issuance" of these entries being placed into the atom:d element
defined in PaceDateSamRuby? It would seem to me that a note that
"Consumers MAY choose to sort based on this value." would discourage
such usage.
Explicitly providing a place to put a "display date" would be another,
and would make the point more directly. If we DON'T provide a place
for a display date, I'd like to see spec text more explicitly pointing
out the implications of putting the "first date of formal issuance"
there in such cases.
Similarly, how do we stop future dates? "Consumers MAY choose not to
display entries containing atom:d elements until the date specified."
seems like it would be a pretty effective deterrent.
Again, a "display date" would be another possibility.
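
For what it's worth, the consumer behavior that warning describes is
trivial to implement, which is part of why it would work as a
deterrent. A minimal Python sketch, assuming each entry is a dict whose
"d" key holds the value of the atom:d element as a timezone-aware
datetime (the function name visible_entries and the dict layout are
hypothetical):

```python
from datetime import datetime, timezone


def visible_entries(entries, now=None):
    """Keep only entries whose "d" date is not in the future.

    Entries with no "d" value are kept; that choice is an assumption
    of this sketch.
    """
    now = now or datetime.now(timezone.utc)
    return [e for e in entries if e.get("d") is None or e["d"] <= now]


now = datetime(2004, 8, 14, tzinfo=timezone.utc)
entries = [
    {"id": "past", "d": datetime(2004, 8, 1, tzinfo=timezone.utc)},
    {"id": "future", "d": datetime(2005, 1, 1, tzinfo=timezone.utc)},
    {"id": "undated"},
]
shown = [e["id"] for e in visible_entries(entries, now)]
```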
Another key element of a number of proposals is a modification or
update date. We need to face the very distinct possibility that a
number of producers will not distinguish between the two. In fact,
some producers may not be able to consistently identify that a
modification occurred at all, let alone when it occurred. Again, the
solution (if this is deemed vital to be included in Atom 1.0) may be
to identify the consequences of various usage patterns for this
element.
There's definitely some distance between what is possible, at least
with existing systems, and what is desirable. I'd say knowing the date
of first issuance and the date of most recent "significant" update
(where "significant" is subjectively determined by the publisher) and a
"display date" (i.e., a date associated with the content or meaning of
the entry) would be desirable. That would enable all the methods of
sorting I would expect to be most preferred, and the placing of an
entry in temporal context by displaying a date associated with its
content. However, getting all three dates is clearly not always
possible.
I have a difficult time letting go of the desire for a "display date",
because if we don't support it explicitly, I can't think of a way to
express it in a feed other than merging it into the content
element--except by creating a new extension. But if the consensus is
that we can drop it, I'll concede.
What would be the consequences of having a date with one meaning appear
in a date construct spec'd for a date with a different meaning? For
example, what if we have atom:updated (meaning the last "significant"
update), but somebody puts the date of first issuance (of the ENTRY,
not the entry's content) in it? Will that cause us problems in hitting
our target (a good sort order) accurately? What if we have atom:issued
(date of first issuance of the entry) and somebody puts the last
significant update date in it? Would it be better to spec a date
construct with an imprecise meaning ("it could be first issuance, last
major change, last change of any kind, etc.") so that it is less likely
to be "wrong"? I'm afraid that if we relax the meaning of the date
construct too much, the value of processing it "correctly" will be
diminished. I'd rather keep the definition somewhat tight to increase
the probability that its meaning will be consistent across feeds, and
then accept the fact that some people aren't going to follow the spec
perfectly. Without thinking too hard about the consequences, I don't
think they'd be too terrible in most cases.
I'd prefer to spec the date construct to indicate the last significant
update. I'd think that would best accommodate:
1) Feeds that never get significant updates (they never have to change
it)
2) Feeds that DO get significant updates (they CAN change it)
3) People who want to sort by the first issuance date (they can cache
the first date they see in an entry, and ignore changes to it)
4) People who want to sort by the last significant update date
(obviously)
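
Case 3 can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the class name FirstSeenCache
and its observe method are hypothetical, and the first date a consumer
sees only approximates the first issuance date if the consumer fetched
the entry before any significant update.

```python
from datetime import datetime, timezone


class FirstSeenCache:
    """Remember the first date seen for each entry id and ignore any
    later changes to that date."""

    def __init__(self):
        self._first = {}

    def observe(self, entry_id, date):
        # setdefault stores the date only the first time an id appears
        # and returns the stored value on every later call.
        return self._first.setdefault(entry_id, date)


cache = FirstSeenCache()
original = datetime(2004, 8, 1, tzinfo=timezone.utc)
revised = datetime(2004, 8, 14, tzinfo=timezone.utc)

sort_key_1 = cache.observe("entry-1", original)  # first fetch
sort_key_2 = cache.observe("entry-1", revised)   # after an update
```

Both calls return the original date, so a consumer sorting on the
cached value keeps the entry where it first appeared, while a consumer
in case 4 simply uses the date as delivered.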
People who do make significant updates, but for whatever reason can't
update the date wouldn't be perfectly served, but I'd say that's their
own problem--if it causes too much pain, they can improve their
publishing tool or move to a better one.
People who can't provide any kind of "objective" date are going to
cause reduced accuracy no matter what solution we pick, unless we
loosen the spec to make the target difficult to miss. Since that would
weaken the format for everyone else, I don't think we should go too far
just to enable them to claim to be accurate and feel good about
themselves.
[1] http://elchem.kaist.ac.kr/vt/chem-ed/data/graphics/acc-prec.gif
[2] http://www.imc.org/atom-syntax/mail-archive/msg07850.html
[3] http://www.imc.org/atom-syntax/mail-archive/msg08419.html
[4] http://www.imc.org/atom-syntax/mail-archive/msg08396.html
[5] http://www.pepysdiary.com/