Re: AD Evaluation of draft-ietf-atompub-protocol-11
Some comments... largely written up in terms of what my working
assumptions have been. We'll see how things match up with what others on
the list are thinking.
Lisa Dusseault wrote:
> *High-level comments, summarizing comments*
> - The mechanism for creating a media resource and a media link entry in
> response to a single POST conflicts with at least one statement
> elsewhere in the draft, and has no example. This is one of those cases
> where I personally had some assumptions (that not every media resource
> had its own media link entry if the media resource had been created
> manually) that weren't ever quite cleared up by the spec. If the client
> CAN create a media resource without also creating a media link entry,
> that should be a separate example.
My assumption: a client cannot create a media resource without also
creating a media link entry. When I POST a media resource to a
collection, a media link entry is *always* created.
> - Overall, the responsibility model needs to be slightly better
> defined. E.g. we know the server is responsible for choosing a URL for
> new entries; it's not clear who's responsible for cleaning up linked
> entries if a user ever needs to clean up historical entries. Atom
> sometimes seems to split the responsibility, and those are the most
> complicated cases. More examples below as it's probably more useful to
> discuss specifics.
My assumption: the server is ultimately responsible for everything. If
the server wishes to give me a mechanism for deleting old entries, then
it's the server's responsibility for ensuring that the results of that
operation are consistent and correct.
> - To an outsider or newcomer -- including me even though I've been
> following discussions closely for a while -- there's a part of the Atom
> model that's subtle but important to understand. Consumers of Atom
> feeds are supposed to look at the regular feed document, whereas
> publishers of Atom feeds are supposed to look at other, different
> resources to see how to edit or create posts. Publishers effectively
> look at a different feed than users do, one with extra metadata (the
> rel="edit" links). It's a different model than that of WebDAV or IMAP,
> because rather than have the client specify which metadata it's
> interested in, the server offers two choices with different addresses.
> I believe it would be useful to cover that part of the model upfront in
> addition to the other useful stuff already there.
My assumption: The separation between "subscription feeds" and
"collection feeds" is not always clear. There are at least two deployed
implementations I am aware of that use the same feeds for both and I'm
currently working on a third. In Google's new Blogger Beta, for
instance, the subscription feed is also the collection feed.
I believe that any assumption that the subscription and collection
feeds will always be different is incorrect and dangerous.
> *Creating resources*
> Explicit result of POST, section 4.
> Are there zero, one or more resources created with a POST? There's a
> line at the top of section 4 which says that "POST is used to create a
> new, dynamically-named, resource". However, that implies ONE, whereas
> with media entries, a POST could create TWO resources. I believe a
> successful POST request as described here MUST either result in one or
> two resources, never zero, and never 3 or more (in the absence of
My assumption: A POST could actually create any number of resources.
For instance, when I POST an entry to my weblog, at least two resources
will be created (the editable entry and the HTML "permalink" page) and
several others will be modified (the collection feed, the blog index
page, archive pages, etc). Similarly, when I POST a media resource, I
would expect that multiple resources could be created. How many
resources are created/modified is up to the server implementation. From
the client's point of view, there are only two resources that are of
immediate concern: the atom:entry and the collection feed. At the very
least, a successful POST should result in the creation of the
atom:entry. No other assumptions can be made beyond that.
> What is the expected behavior of seeing a POST to an entry URL (rather
> than a collection URL)? I see that this is currently undefined; it may
> be worth stating that to warn clients. (I'm pretty indifferent on this
> one, as in this case I can't see any obvious harm in different server
> behaviors existing, if un-warned clients try it intentionally without
> knowing the results. The only possible harm is if clients got confused,
> did a POST to an entry URL when a collection URL was intended, and the
> server does a success response which creates new resources or modifies
> existing resources in a way the client did not expect. An error
> response would certainly be harmless for this undefined case but a
> success response could be real interesting.)
My assumption: This has always been left undefined. There are at least
two different implementations that do different things with the POST
operation on an entry. GData, for instance, supports using POST as a
way of working around PUT/DELETE blocking issues. IBM's OpenActivities
uses POST, in part, to support a variety of extended operations such as
undelete. Eventually, conventions for the use of POST on an Edit URI
may emerge. For now, it's best to leave it undefined.
> Creating entries with multiple media resources
> It's never explained how a client would go about creating a feed entry
> with a number of media resources. I imagine that it could be iterative;
> a client could create any of the resources at any time, and at any time
> after creating the feed entry, use PUT to update the feed entry to link
> to new media resources. I assume -- though I didn't see it stated in
> the document -- that it's the client's responsibility in almost all
> cases to put links in the feed entry to point to the media resources,
> otherwise the media resources are unlinked (effectively hidden to
My assumption: The relationship between a media link entry and a media
resource is always 1-to-1. If I want an entry to point to multiple
media resources, then I would create one media link entry for each media
resource, then create a separate entry that points to each of the
individually created media resources.
entry1 --> pic1
entry2 --> pic2
entry3 --> pic3
entry4 --> pic1
Entries 1, 2 and 3 will always point to their respective media resources
> The exception to this general process is if the client first uses POST
> to create both the media resource and the "Media Link Entry" in one go.
> In this case, can the "Media Link Entry" (MLE) be transformed to a
> regular "Member Entry Resource" (MER)? I thought it would be possible,
> but discussed just a bit with Tim today and he says no, so there you
> have two different readings of the spec. I guess a related question is
> what would happen if a client does a PUT of media content to an entry
> resource, or entry content to a media resource.
My assumption: I had originally wanted the ability to change "MLE" into
a "MER" but the general consensus of the WG seemed to be leading in the
other direction. Once a media link entry is created, it cannot be
changed into some other kind of thing. I personally think that's bogus,
Regarding what would happen if a client does attempt to change the type
of resource, that's undefined. In my current implementation, the new
content would be silently discarded, which is bogus. I'll likely change
the behavior so that the request is rejected.
> It's not clear to me whether a linked media entry is always listed in
> the metadata or not.
My assumption: Yes
> - When one or more "edit-media" link relations appear, who has been
> responsible for putting them there?
My assumption: the Server
> - When a media resource is deleted, who is responsible for removing the
> media resource link from the MLE?
My assumption: the Server. When the media resource is deleted, the
associated MLE SHOULD also be deleted.
> - Section 4 says that the MLE contains the metadata for a Media
> Resource, but that seems to only assume a single Media Resource. In the
> case of multiple Media Resources which the user intends to link into a
> single post, it's unclear to me whether there's one MLE for every Media
> Resource, or one MER for all the Media Resources created, or some other
> situation. Again, in quick discussion with Tim, he says there is one
> Media Link Entry per Media Resource. I can see how that would work but
> that was not at all my understanding before the discussion!
> This document would benefit greatly from further examples:
> 1. An example of creating a MLE and MR in a single POST; the request,
> response and the result (resource URLs) described.
Section 9.6.2 already contains such an example.
> 2. An example of modifying a MER to contain a new image or other media
> resource link: the request(s), possibly the responses (if it's
> interesting, it may not be), and definitely the result.
Agree. This would be helpful.
> 3. An example of modifying a MER to change metadata (e.g. category or
> adding a new link relation element or both); possibly a failed request
> example would be even more interesting than a successful one.
> Can a client modify an entry to contain a link relation element in the
> following cases:
> - To result in an "edit" or "edit-media" link relation, where the
> resource represented does not meet the requirements in section 11.1 or 11.2?
> - To result in an "edit" link relation that actually points to a media
> resource, or a "edit-media" link relation that actually points to a MER?
My assumption: the server is responsible for adding the edit and
edit-media links. Clients are not allowed to modify these values.
> - To point to a resource on a different server entirely?
> - To point to a valid media resource or MER that happen to be in a
> different collection than the one normally used for this feed?
> - Will some servers forbid adding a link relation element entirely? Is
> it important for the client to know that that will always be forbidden
> for that server -- can it detect the "always forbidden" case separately
> from the "this particular edit is forbidden" case?
My assumption: the decision is ultimately up to individual server
implementations, but in general clients should be allowed to add/edit
all link relations other than the edit and edit-media links. It is
unlikely, however, that your typical blog implementation of APP would
allow a client to modify the alternate link relation.
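To make the split of responsibility concrete, here is a minimal sketch of a server that drops any client-supplied edit or edit-media links and adds its own edit link, while leaving all other link relations alone. The function name and the URIs are made up for illustration, and a real server might reject the request instead of dropping the links quietly.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
SERVER_MANAGED = {"edit", "edit-media"}  # link relations only the server may set


def apply_server_links(entry_xml, edit_uri):
    """Drop client-supplied edit/edit-media links, then add the server's edit link.

    Hypothetical helper: edit_uri is whatever URI the server chose for the entry.
    """
    entry = ET.fromstring(entry_xml)
    # findall() returns a list, so removing children while looping is safe.
    for link in entry.findall(f"{{{ATOM}}}link"):
        if link.get("rel") in SERVER_MANAGED:
            entry.remove(link)
    ET.SubElement(entry, f"{{{ATOM}}}link", {"rel": "edit", "href": edit_uri})
    return entry
```

Links with any other rel value (related, via, and so on) pass through untouched in this sketch. Whether to also protect the alternate link is left to the implementation.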
> Which of these are errors, and if so how is the error handled? Which of
> these MUST the server allow and handle? I understand there may be some
> need for flexibility here. Perhaps it's just standardized error
> messages required here. For example, if there are some servers which
> allow a given link relation to point to another server, and some servers
> which do not allow, how would the servers which do not allow respond
> with a sufficiently specific error, so that the client can avoid trying
> the same thing again?
My assumption: different servers are going to have different
requirements. This needs to remain as flexible as possible. A standard
way of reporting errors would be helpful, but trying to dictate too much
in this area will just lead to headaches.
> Multiple formats/langs for media resources
> Multiple formats are not sufficiently defined -- e.g. JPG and PNG
> versions of an image resource. Format negotiation is hard. I found
> guidance for how to select among different "edit-media" link relations
> depending on format and language, but I found no guidance on how to
> create multiple versions. If there's no guidance to clients or servers
> how to do it (would the client create multiple resources in different
> could the server automatically do it as variants?
> could the server automatically do it as multiple resources, and would
> be therefore listed?),
> it's probably worth considering whether there's
> possible interoperability harm here.
The behavior is ultimately up to the server. No normative behavior
should be specified (largely because we're not yet sure what all folks
will actually want to do).
> I can imagine clients creating
> alternate-format versions quite successfully because the operations
> would be explicit, but when I imagine how servers would go about it, I
> can easily see ways it could go wrong (e.g. creating new URLs for
> resources that are invisible to the media collection, having multiple
> URLs in locations where clients expect only one).
Again, it's the individual server implementation's responsibility to keep
all this straight. Yes, there is a good possibility that some servers
will screw it up, but that's ok.
> I think there may be a very basic confusion here -- in my head or in the
> document or both -- about what the "edit-media" link relation does and
> is for. When I read the text it seems to offer the possibility for
> multiple formats for a single media resource, as suggested by the text:
> "If a client encounters multiple "edit-media" link relations in an entry
> then it SHOULD choose a link based on the client preferences for type
> and hreflang". However, when I try to think about how a client would
> create a post with a totally independent set of JPG images (e.g. one of
> the Eiffel Tower, one of the Louvre and one of the Arche de Triomphe),
> the "edit-media" link relation also seems to have relevance. Which is it
> or both? (and as always, who is responsible for filling it in or
> removing it when new media resources are created or destroyed?)
My assumption: the edit-media link relation is primarily intended to
allow media resources to be edited independently of their associated
Media link entry. The fact that multiple edit-media links may be
present is secondary and orthogonal to this main purpose.
> Thomas Broyer said in email July 24 that "Having the Content-Location
> value equal to the Location one tells the client that the response body
> is a representation of the newly created resource". This is a subtle
> reading of HTTP and, if it's true, I want to make sure that
> implementors understand this without having to read the mailing list.
> The spec reads " the response from the server SHOULD contain a
> Content-Location header that contains the same character-by-character
> value as the Location header." If the response from the server does not
> contain both headers identical, what should the client conclude? I
> think this is one of those SHOULD recommendations where the consequence
> of it not being a MUST need to be considered. Possibly the spec needs
> to say under what conditions the server would do otherwise; possibly the
> spec should say what the client knows, or does not know, or must do, if
> the server does otherwise.
My assumption: IMHO, this recommendation is no longer necessary given
our recent decision to require that the Location header always
point to the Edit URI of the resource created. I'd prefer that it be
> *Deleting Resources*
> In the case of an entry that points to multiple media resources, can the
> server delete all those media resources and their MLEs? (I think not).
> If a client issues a DELETE to a media resource, is its MLE deleted?
> (The spec covers the opposite case already when a client issues a DELETE
> to an MLE.)
My assumption: If a client DELETEs an MLE, its associated media
resource SHOULD be deleted. If the client DELETEs the media resource,
the MLE SHOULD be deleted. If the client DELETEs an entry that links
to a media resource in any other way (e.g. there is some other kind of
link relation or an img tag in html content, etc) then the "correct"
behavior is up to the server.
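A toy in-memory model of the 1-to-1 pairing may help here. It is only a sketch under my assumptions (POSTing media always creates both resources, and deleting either one removes the other). The class, method names and URI layout are invented, and the SHOULD-level behavior is treated as if it were mandatory for simplicity.

```python
import itertools


class Collection:
    """Toy in-memory media collection (hypothetical, for illustration only)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.media = {}    # media resource URI -> bytes
        self.entries = {}  # media link entry URI -> media resource URI (1-to-1)

    def post_media(self, data):
        """POSTing media always creates both the media resource and its MLE."""
        n = next(self._ids)
        media_uri = f"/media/{n}"
        entry_uri = f"/entries/{n}"
        self.media[media_uri] = data
        self.entries[entry_uri] = media_uri
        return entry_uri, media_uri

    def delete(self, uri):
        """Deleting either half of the pair removes the other half too."""
        if uri in self.entries:
            media_uri = self.entries.pop(uri)
            self.media.pop(media_uri, None)
        elif uri in self.media:
            del self.media[uri]
            for entry_uri, linked in list(self.entries.items()):
                if linked == uri:
                    del self.entries[entry_uri]
        else:
            raise KeyError(uri)
```

Entries that merely mention a media resource (an img tag in HTML content, some other link relation) are not modelled, since that case is left to the server.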
> Can collections be DELETEd? It's fine for servers to allow or no, but
> if servers don't support, what error to use.
My assumption: It's up to the server. 405 Method Not Allowed.
> *Editing resources*
> Overall, the process for editing a resource is not entirely clear. I
> find the description of creating a resource (POST), and what the server
> can accept, ignore or reject, more clear than the description of editing
> a resource (PUT) . For example, there's normative text in section 9.2.1
> (an example) relative to creating resources and handling metadata, but
> that text isn't duplicated for editing resources or obviously apply to
> editing resources. Thus:
> - Can the client change the category? (probably yes; MUST the server
Generally Yes. It's up to the server.
> - Can the client change the atom:id? (probably never)
> - Can the client change the "updated" value to be some time in the
> future? Some time long ago? Or are there only two non-error changes --
> "now" or "the previous value"? MUST the server accept the value if it's
> the same as the previous value? Or can there be servers that always
> ignore "updated" values from clients? (and if so, is it important for
> the client to know that the server does this)
Generally No, but it's up to the server. For the most part, servers
should and will manage the value of atom:updated themselves.
> - Can the client change the set of link relations? (probably yes; but
> does that include "edit" and "edit-media" link relations only or also
> first/previous/next/last link relations?)
> In general the possible edits need to be covered to consider whether the
> server MUST allow these kinds of edits, or MAY, and if refused, what
> error for what reason.
Generally Yes, with the exception of edit and edit-media and possibly
alternate, all of which the server will usually manage.
> The spec says "The value of atom:updated is only changed when the change
> to a member resource is considered significant. " The use of passive
> voice obscures who does what here. When the client doesn't suggest a
> value for "atom:updated", does the server provide one, and if so, how
> does the server know what is "significant"? I thought it would always
> be the client suggesting values, but Tim says that the server controls
> atom:updated which could imply that the client doesn't even need to
> suggest values. See above about whether the server MUST accept certain
> values for "updated", or more likely, MUST NOT accept suggested values
> for "updated" when they're clearly wrong (e.g. this entry was last
> updated on October 16, 1906).
My assumption: Tim is correct. The server controls atom:updated. While
the client is required to provide a valid atom:updated element, the
server can (and in most cases will) ignore that value.
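As a rough illustration of a server that ignores the client's value, the sketch below overwrites whatever atom:updated arrived with the server's own clock. The function name is made up, and the sketch makes no attempt to decide whether a change is "significant".

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"


def stamp_updated(entry_xml, now=None):
    """Replace the client's atom:updated with the server's time (hypothetical helper).

    `now` should be a timezone-aware datetime; it defaults to the current UTC time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    entry = ET.fromstring(entry_xml)
    updated = entry.find(f"{{{ATOM}}}updated")
    if updated is None:
        # The client is required to send one, but be lenient if it did not.
        updated = ET.SubElement(entry, f"{{{ATOM}}}updated")
    updated.text = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return entry
```

With this approach a client-supplied date such as October 16, 1906 never reaches storage, so the server does not need a rule for which suggested values are "clearly wrong".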
> Can a server ever ignore part of an edit and successfully process the
> rest? For example, the server receives a PUT request that tries to edit
> the text of a MER and includes a new category value, the server accepts
> the new text but silently ignores the category value. I suggest the
> answer would be MUST NOT silently ignore suggested changes, particularly
> since there's no way in a PUT response to say "here's what the server
> actually stored". It may be my opinion differs here from that of the
> WG. I find silently ignoring input to be scary.
My assumption: Yes, it's up to the server. In general, servers should
not make a habit of applying partial updates unless the client is aware
that data loss could happen. Current implementations handle this
> I predict that some AtomPub authoring clients will attempt to
> synchronize: to maintain an offline copy of the feed including all its
> MERs and media resources, and to keep that offline copy up-to-date.
> Some will probably even allow offline authoring of new posts, and offer
> to synchronize when the client next goes online -- because of the
> possibility of multiple authors, this may mean at times that the client
> would download new entries created by other authors, upload new entries
> created offline, and reconcile its offline copy of feed documents.
> Because authoring clients will attempt to do this based on Last-Modified
> and ETag -- after all, the functionality is all there in some form or
> another -- the spec needs a little more clarity on how the client can
> rely on this working. Otherwise, some servers may omit features that
> these authoring clients require, or implement them oddly. While I would
> never suggest repeating all the requirements from other specs (in this
> case HTTP), there are cases where clarity and interoperability are
> greatly improved by at least referencing explicitly requirements from
> HTTP. It's also possible to add new requirements based on features in
> HTTP, that apply to Atom servers alone.
I would be very happy to see some discussion of this in the spec,
especially if it normatively required the use of ETags for offline
> of HTTP (calendaring) than the general case. If HTTP synchronization in
> authoring cases were clearly defined and had not lead to years of
> arguments since the last HTTP update, I would probably feel differently
> about just silently relying on the mechanisms in HTTP.
> In any case, I have very specific brief suggestions to cover
> synchronization so that it's implemented more successfully than not.
> - Consider adding a brief section offering clients non-normative
> guidelines on synchronization. It doesn't have to limit server behavior
> so much as point out with green and red lights where the fairway is
> (mixing transportation and golfing metaphors in my head)
> - Make a few requirements of servers to avoid some of those HTTP
> ambiguities. For example:
> "The ETag or Last-Modified values for a member resource MUST change when
> any metadata in the resource changes, as well as text/content, and this
> includes "next" and "last" link relation values. The ETag or
> Last-Modified values of a member resource MUST NOT change solely because
> an associated other resource (e.g. the media resource being an
> associated resource to the media link entry resource) changed. "
+0.5. I would add that the ETag/Last-Modified values of a resource
SHOULD NOT change solely because of non-significant changes to the
infoset serialization of an entry (e.g. a different namespace prefix is
used, or whitespace is added or removed, etc). I would also prefer that
the ETag for an MLE change when its associated media resource is updated.
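One way a server could keep the ETag stable across non-significant serialization changes is to hash a canonical form of the entry rather than the raw bytes. The sketch below is an assumption on my part, not something from the draft: it uses Python's C14N 2.0 support (xml.etree.ElementTree.canonicalize, Python 3.8 or later), and the function name is invented.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def entry_etag(entry_xml):
    """Derive an ETag from a canonical form of the entry (hypothetical helper).

    rewrite_prefixes=True makes the namespace prefix choice irrelevant;
    strip_text=True drops whitespace around text content.
    """
    canonical = canonicalize(entry_xml, strip_text=True, rewrite_prefixes=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f'"{digest[:16]}"'
```

Note that strip_text also removes leading and trailing whitespace inside text content, so a server that treats such whitespace as significant would need a different canonical form. Making the ETag of a media link entry change when its media resource changes would require mixing the media resource's own validator into the hash, which this sketch does not do.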
> More open questions that might be related to synch or might have
> relevance even for clients that don't do full synch:
> - What is the relationship, if any, between the "atom:updated" value and
> the HTTP "Last-Modified" value. Can the "atom:updated" value ever be
> later (greater) than the "Last-Modified" value? I believe it can be the
> same or earlier, but the spec doesn't disallow the broken case.
My assumption: There is no relationship. atom:updated should be treated
as being entirely independent of Last-Modified. It might be reasonable,
however, to define a relationship between Last-Modified and app:edited.
> - Is it clear whether the client MUST GET the entry after modifying it
> in order to have an accurate offline cache? (this was mentioned in a
> post by Broyer Jul 13, but not in the document). I believe this is made
> clear already for the cases of getting the feed and also for
> POST/create, but not for PUT/modify.
My assumption: Yes.
> - Am I correct that the general assumption is that id's are there to see
> what entries are new, and URLs are there to see where to get them? That
> may mean that URLs could change, for a given ID -- Perhaps a feature to
> change the slug name of a image after attaching it. Is that
> theoretically possible?
My assumption: Yes. IDs also help to detect changes so that old entries
are not flagged as new when insignificant updates are made.
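A client-side cache keyed on atom:id could look roughly like the sketch below. The class and its return values are invented for illustration. The point is only that the URI is stored as data and may change, while the id decides whether an entry is new.

```python
class EntryCache:
    """Toy offline cache keyed by atom:id (hypothetical, for illustration only)."""

    def __init__(self):
        self._by_id = {}  # atom:id -> (edit URI, atom:updated)

    def observe(self, atom_id, edit_uri, updated):
        """Return 'new', 'updated' or 'unchanged' for an entry seen in a feed."""
        previous = self._by_id.get(atom_id)
        # Always remember the latest URI; it may change for a given id.
        self._by_id[atom_id] = (edit_uri, updated)
        if previous is None:
            return "new"
        if previous[1] != updated:
            return "updated"
        return "unchanged"

    def uri_for(self, atom_id):
        """Where to fetch or edit the entry right now."""
        return self._by_id[atom_id][0]
```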
> There are also efficiency considerations.
> - The spec could require that servers MUST return either the ETag, or
> the Last-Modified value, in any successful POST or PUT response. I
> personally favour this so that clients can rely on it, though obviously
> other opinions are valid here.
I'd prefer this. My current implementation always returns the ETag on
the POST response.
> - I really liked the idea of putting ETag in the author's feed, as
> discussed on the list but not appearing in the document, again for
Are you referring to the app:etag element?
> How are categories compared? Case-sensitive, insensitive, according to
> which language? Would the categories "donné" and "donne" map to the same
> category as "Donne" and "DONNE"? I believe it's currently up to the
> server, which means unpredictable behavior from the point of view of
This is a separate issue that impacts RFC4287 in general more than it
does APP specifically. My assumption has always been that categories
are compared using case-sensitive, character-by-character analysis of
the scheme and the term, regardless of language. Your four examples
would each evaluate to different categories.
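Spelled out as code, that comparison is nothing more than equality on the (scheme, term) pair, with no case folding and no Unicode normalization. The helper names are made up.

```python
def category_key(scheme, term):
    """Identity of a category under plain character-by-character comparison."""
    return (scheme, term)


def same_category(a, b):
    """a and b are (scheme, term) pairs; no case folding, no normalization."""
    return category_key(*a) == category_key(*b)


# The four terms from the question end up as four distinct categories.
terms = ["donné", "donne", "Donne", "DONNE"]
distinct = {category_key(None, t) for t in terms}
```

One consequence of comparing this way is that a precomposed "é" and an "e" followed by a combining accent would also count as different terms.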
> See http://www.ietf.org/internet-drafts/draft-newman-i18n-comparator-14.txt,
> which has passed IESG Evaluation except for some IANA actions. This is
> a danger area for any draft going before the IESG which looks carefully
> at i18n these days.
> How do lang tags inside the document relate to Content-Language
> information in headers? Does the most granular override the other
> possible values? What about when client provides to server? Does the
> server ignore or handle?
My assumption: yes, the most granular overrides. It's up to the server
to ignore or handle, with the preference being towards handling.
> The requirements for the link relations "next", "previous", "first" and
> "last" aren't as rigorous as for "edit" and "edit-media" link
> relations. Also they're defined quite separately -- I kind of thought
> that all the link relation types could be usefully defined in one
> section but if the editors prefer a different organization that's fine.
> But; is it OK if the resource pointed to by one of these link relations
> is on another server, in another feed, is a different kind of resource
> than you might normally expect, etc... ? I think the normal cases for
> these link relations are well-understood but not necessarily what a
> client should do if it encounters abnormal cases.
I've always been bugged by the underspecification of this section.
Given that there is a separate Feed Paging spec that is nearing
completion, I'd be very happy if paging were removed completely from the
> Discovering feed reading URL
> A very minor feature request for the introspection document: it SHOULD
> contain the public or published read-only feed URL of the blog (Tim
> suggests using link rel="alternate" type="application/atom+xml",
> although I'm not sure that makes it sufficiently clear what it's for).
> This so that my blog editing tool can show me not only all the entries
> and media resources (all discoverable from the introspection doc
> already) but also where the blog is published, so that I can copy that
> link to my friends when telling them about my blog.
Using alternate links for feed discovery is a sufficiently well-known
common practice. It would likely be fine.
> When the client puts extension elements in a MER, MUST the server store
> those unrecognized extension elements? I think the answer to this is
> actually that servers often do not and should not be required to do so.
> That makes it hard for clients to extend AtomPub's syntax in ways that
> other clients will understand but servers don't care about. Consider
> the consequences: when some enterprising client developer decides to do
> something cool and useful and encounters servers that don't store their
> metadata in the obvious place, the client developer is going to quickly
> work around that by storing in some unobvious place. For example in
> HTML comments in the atom entry content, or microformats, etc. Is that
> all cool?
It's up to the server. Google's GData, for instance, will reject
entries that use unknown foreign markup. IBM's OpenActivities will
silently drop unknown markup. The implementation I'm working on
currently will retain all unknown foreign markup. I'd prefer to leave
> What are workspaces? I would like to see a definition. I believe I
> understand that basically, a workspace corresponds to a single published
> feed; that a workspace contains the collections with the content
> authored for that feed. I know the WG discussed this so maybe I can
> suggest wording at some point or simply register my vote for saying what
> it *is*.
No, a workspace is a purely organizational construct that will vary
depending on context. For instance, services like GData could offer a
service document with one workspace per service (e.g. Blogger, Calendar,
Base, etc) or, an implementation could offer a service document with
different workspaces segmented along other lines (e.g. "Private
Collections" "Shared Collections", "Team Collections", etc). The
intention is purely to provide a means of grouping related collections
in a logical way. There is no correlation between a workspace and the
number of available feeds.
> Besides the definition, I also wonder about workspace titles. That
> seems redundant with the title of the entry collection and possibly also
> the title of the feed (inside the main feed document). Is there any
> understanding of some of these values being identical, or any
> understanding of what different purpose they serve if they're not identical?
Again, there is no correlation between a "Workspace" and a feed. A
workspace title may be something like "James' Private Collections",
which may consist of twenty-six individual feeds each titled with a
letter of the English alphabet.
> OPTIONS response
> HTTP is unclear about where PUT and POST show up in Allow headers.
> WebDAV ran into this as an interoperability problem -- some clients
> assumed that if they didn't see PUT in the Allow header for a
> collection, they couldn't write to that collection (the client might be
> checking for permissions or policy, having already established that the
> server was a WebDAV server but not certain if PUT would be allowed to
> this particular place). Some servers had PUT in the Allow header value
> for a collection, some servers didn't, based on the literal reading that
> you couldn't actually PUT straight to a collection URL. Clients had to
> end up with the OPTIONS Allow: header response being useless in this
> case. With somebody else's hindsight, Atom doesn't have to leave this
> ambiguous for the special kinds of resources it defines...
In my impl, the server will always report all of the operations that can
be applied to a resource, regardless of whether the current user is
allowed to use those methods. A collection, for instance, will always
specify (at a minimum) the GET and POST methods. An Edit URI will
always specify (at a minimum) the GET, PUT and DELETE methods.
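Roughly, that policy amounts to a fixed table per kind of resource, independent of who is asking. The sketch below is not my actual code; the names are invented, and only the minimum method sets mentioned above are listed.

```python
# Minimum method sets per kind of resource (hypothetical table).
ALLOWED_METHODS = {
    "collection": ("GET", "POST"),
    "edit": ("GET", "PUT", "DELETE"),
}


def allow_header(kind):
    """Value for the Allow header, the same for every user."""
    return ", ".join(ALLOWED_METHODS[kind])


def reject_if_not_allowed(kind, method):
    """Return None if the method applies to this kind of resource,
    otherwise a (status, headers) pair for a 405 response."""
    if method in ALLOWED_METHODS[kind]:
        return None
    # HTTP requires a 405 response to carry an Allow header.
    return 405, {"Allow": allow_header(kind)}
```

For example, a DELETE sent to a collection on a server that does not support it would get 405 with "Allow: GET, POST". Permission checks for the current user would happen separately and would produce 401 or 403 rather than changing the Allow value.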
> Cookie support, sessions, authentication
> Is there an assumption that clients MUST support cookies? without such
> a requirement explicitly stated, some clients won't, for reasonable
> security concerns. Instead, is there an assumption that clients MUST
> repeat authentication headers with each request? Or will servers
> effectively end up constantly "reminding" clients (through 401 errors)
> to authenticate? This might seem obvious but it definitely differs from
> regular HTTP practice where clients authenticate once and then stop
> sending authentication information automatically and it just works
> problem in WebDAV interoperability tests where some server implementors
> insisted that certain WebDAV clients were completely broken in not
> supporting cookies.
My assumption: the client should generally assume that auth headers
should be sent with each request and that cookies are likely not
supported. However, this is another case where the behavior is
completely up to the server.
> Are there assumptions that sessions will be maintained through
> persistent connections? I believe there should be none. That is, if
> you're a client implementor thinking that the first request will contain
> authorization and subsequent requests on the same connection have no
> authorization, think again.
My assumption: clients should not assume that connections will be
persistent or that sessions will be maintained (or that a session even
exists). Each request is independent of all others.
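On the client side, treating every request as independent can be as simple as attaching credentials each time a request is built. The sketch below assumes HTTP Basic authentication purely for illustration (the server may require something else). The function name, URL and credentials are made up, and nothing is sent over the network.

```python
import base64
import urllib.request


def authed_request(url, user, password, method="GET", data=None):
    """Build a request that carries its own credentials (hypothetical helper).

    No cookie jar and no reliance on an earlier request on the same connection.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Basic {token}")
    return req


# Every request gets the header, whether or not one was sent before.
first = authed_request("http://example.org/collection", "james", "secret")
second = authed_request("http://example.org/collection", "james", "secret",
                        method="POST", data=b"<entry/>")
```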
Excellent feedback. Thanks for taking the time.