
[no subject]



On Fri, 2003-08-08 at 15:49, Jeremy Gray wrote: 
> As long as you write:
>  - created is whatever time the person wants to say the post came from
>  - modified is the "real" modified
>  - issued is the "real" issued time.
> 
> But then even that would blend soft, human-oriented timestamps with hard
> machine-oriented timestamps in ways that might conflict, depending on
> exactly what you mean by "real" as well as on how you choose to interpret
> the word "resource" in context of the DC documentation.
> 
> > In that case, what we've been agreeing upon has been wrong, including the
> > validator; it is *issued* that must have a timezone, and it is *issued*
> > and modified that are redundant when a post has never been modified.
> 
> As long as one ties "modified" to the time at which the modified content was
> published in its modified form, as opposed to the time at which the content
> itself was modified. The latter of the two appears to be the more accurate
> interpretation of the Dublin Core documentation, which ties modified more
> closely to created than to issued, by my interpretation.

Both of those together lead me to believe that using these Dublin Core
terms will lead to confusion.  I'd rather have a spec that had hard and
simple rules defining these values.

> Which one (or combination) is going to be most useful to software processing
> the entries that went up on the web? To the humans working with the results
> of that software's processing? It's a difficult choice. One that cannot be
> clearly made in the face of certain examples.

Let me try again, then, and state what I'm trying to get at below. 
Dublin Core aside, what we are defining is a machine-usable format.  If
it's not useful to machines, we might as well be passing around HTML
instead.

> [snip]... I'm willing to rescind that interpretation if it
> will help move us forward towards a single interpretation that supports the
> most common application scenarios, and let me make the following clear:
> Pepys' Diary is not one of them. At least not in terms of helping us
> establish a standard that doesn't slide down a slippery slope of
> complications surrounding occasional re-publishing scenarios, begging a
> question regarding Pie/Echo/Atom/Feedcast/whatever's scope (regarding
> re-publishing/syndication issues) and usage of the terms currently under
> discussion.

I think we're getting somewhere, here.  To my mind, Pepys' Diary is
exactly the sort of problem we need to address, because it is the same
problem as my going to the mountains for a week:  there is more than one
timestamp related to the post and both are vital for the purposes of
displaying in an aggregator.

> Here's a question that might have the potential to drive at least part of
> the discussion forward: In your mind, using the Pepys' Diary example, would
> ANY of the dates, created, issued, or modified, be in the 1600s?

Yes, of course.  Just in the same way I'd want to include a subject on
my entry; it's metadata, but it's relevant to my entry.

I think the problem here is that most aggregators display feeds
individually so their relative dates don't matter so much.  I'm
accustomed to LiveJournal, where all of your feeds are collected into
effectively a single inbox.

Let me try to explain my thinking by way of an example.

Here are three posts (allow me to fuzz the times a bit to make it more
clear).  Note that I'm intentionally naming the times with something
other than the DC terms to reduce confusion.
- Normal blogger: wrote something in March 2003 and posted it
immediately.
- Pepys: written in 1600, posted June 2003.
- Mountain man: written in March 2003, posted in August 2003.

Now let's look at this from the perspective of an inbox-style
aggregator.

When showing those three feeds aggregated together, what order should
they be in?  Clearly we don't want to sort based on the time they were
written; if we did, Pepys entries would always be at the bottom,
hidden by posts written more recently.  We want to use the post time,
here, and the post times across entries need to be comparable (that is,
including a time zone).

However, we do want to display the 1600 associated with Pepys somewhere;
in fact, I think (and in practice on LiveJournal, this holds true) that
1600 is much more useful information for display purposes than the 2003
associated with his entry.  (You'll also note that Pepys lacks a
timezone for his write times, another pointer towards a human trait.)

So we have two separate times here, and both are useful to different
pieces:
 - the post time, which is a fully-specified, timezone-including
timestamp, is useful to the aggregator for purposes of sorting;
 - the human-provided write time is useful to the human reader for
understanding the setting for the entry. 

If you object to me using Pepys and say it is an exception, consider the
mountain man.  Once he's returned from the mountains, he dumps his posts
from March.  Should his new posts be hidden beneath the old posts by the
normal blogger, who has been posting all the while the mountain man was
away?  I don't think so.

A similar argument could be made for two people blogging back and forth
on opposite sides of the planet: just because one person is eight hours
ahead of the other in terms of timezones doesn't mean their posts
shouldn't intermingle sequentially.
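
To make the inbox case concrete, here is a rough Python sketch.  It is
only an illustration: the Entry class, the inbox_order function, and the
specific days, hours, and UTC offsets are all made up for the example
and come from no implementation.  It assumes the write time is kept as
free-form text for display while the post time is a timezone-aware
timestamp used for sorting.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Entry:
    author: str
    write_time: str      # human-provided and free-form; may lack a timezone
    post_time: datetime  # machine timestamp; must carry a timezone


def inbox_order(entries):
    """Return entries newest-post-first, comparing absolute instants."""
    for entry in entries:
        if entry.post_time.utcoffset() is None:
            raise ValueError(f"post time for {entry.author!r} has no timezone")
    return sorted(entries, key=lambda entry: entry.post_time, reverse=True)


# Days, hours, and UTC offsets below are invented for illustration.
pacific = timezone(timedelta(hours=-8))
entries = [
    Entry("normal blogger", "March 2003",
          datetime(2003, 3, 10, 9, 0, tzinfo=pacific)),
    Entry("Pepys", "1600",
          datetime(2003, 6, 1, 12, 0, tzinfo=timezone.utc)),
    Entry("mountain man", "March 2003",
          datetime(2003, 8, 8, 15, 0, tzinfo=pacific)),
]

for entry in inbox_order(entries):
    # Sort by post time, but show the human write time alongside it.
    print(f"{entry.author}: written {entry.write_time}, "
          f"posted {entry.post_time.isoformat()}")
```

Sorted this way, the mountain man's freshly posted entries land on top
even though they were written in March, and Pepys still gets his 1600
shown next to the entry.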

In the case of a channel-oriented aggregator, the post time is less
useful; I think they generally sort based on write time.  Similarly with
email, which is why we all occasionally get misdated spam that shows up
on the wrong end of our mailbox.


Ok, now let's throw in modification time:  what is it useful for?  Well,
the standard use case is flagging the entry as unread again.  In that
case, you need the modification time to be a real timestamp.  In the
common case, the post time and modified time are the same thing; keeping
them separate allows you to sort based on post time while still flagging
modified entries in the past.
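
As a sketch of that use case (again in Python, with a hypothetical
needs_attention function and invented timestamps), flagging only needs
the modified time and the time the reader last looked at the entry,
assuming both are real, timezone-aware timestamps:

```python
from datetime import datetime, timezone


def needs_attention(modified, last_read):
    """True if the reader never saw the entry, or it changed since then."""
    return last_read is None or modified > last_read


# Invented timestamps: an entry posted in March and edited in August.
posted = datetime(2003, 3, 10, 17, 0, tzinfo=timezone.utc)
modified = datetime(2003, 8, 8, 22, 0, tzinfo=timezone.utc)
last_read = datetime(2003, 3, 11, 8, 0, tzinfo=timezone.utc)

# Unmodified entries carry modified == post time, so they stay read.
print(needs_attention(posted, last_read))
# The August edit is newer than the last read, so the entry is flagged.
print(needs_attention(modified, last_read))
```

The entry keeps its March position in the sort order because the post
time is untouched; only the flag changes.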


Now let's return to the Atom terms.  We currently are tossing about
three times, which have names that are similar to Dublin Core names but
are apparently underspecified enough to cause confusion.  (For example,
the distinction between "time the entry was actually modified" and "time
the modified entry was posted" mentioned above, while potentially
interesting data, is not the sort of ambiguity we want in a spec.)

Now, to (finally :P) answer your question.  In the LJ Atom
implementation, which was based on reading the Wiki and the MT
implementation and on conversations with the validator authors, I am
using these mappings:
 - issued   -> write time
 - created  -> post time
 - modified -> modify time
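
Written out as a lookup table, the mapping looks something like the
Python below.  This is a sketch for illustration, not the LJ code: the
ATOM_FROM_INTERNAL table, the to_atom_dates function, and the internal
field names are hypothetical.

```python
# Internal (neutral) time name -> Atom element name, per the list above.
ATOM_FROM_INTERNAL = {
    "write_time": "issued",
    "post_time": "created",
    "modify_time": "modified",
}


def to_atom_dates(entry):
    """Pick the three times out of an entry dict and rename them."""
    return {
        ATOM_FROM_INTERNAL[name]: value
        for name, value in entry.items()
        if name in ATOM_FROM_INTERNAL
    }


# Invented example entry; the subject is ignored by the mapping.
pepys = {
    "subject": "Diary",
    "write_time": "1600",
    "post_time": "2003-06-01T12:00:00+00:00",
    "modify_time": "2003-06-01T12:00:00+00:00",
}
print(to_atom_dates(pepys))
```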

If these DC terms are really as fuzzy as it seems to me, we might be
better off using different names.

-- 
Evan Martin
martine@xxxxxxxxx
http://neugierig.org