-1 on MIME-in-XML (was Re: PaceSimpleContentType)



Greg Stein <gstein@xxxxxxxxxx> writes:

> HTTP has a notion of a content type, and a content encoding. These
> are specified by the (wait for it...) Content-Type and
> Content-Encoding headers. Encodings are simply transformations that
> have been applied to the base content (one or more, and ordered).
> 
> It seems that your "mode" is akin to the C-E concept from HTTP. It
> seems advisable to run with the thought that way: there is an
> underlying content-type (a standard MIME type), and then you have a
> particular encoding for that content to enable its insertion into
> the Atom document.
> 
> This would allow feed sources to specify their original content as
> plain text, HTML, XML, or even a Word .doc file. And then there is
> another bit which says how you jammed that thing into the feed.

Greg has provided the context for me to explain why I think
reproducing the efforts of HTTP or MIME in XML is a bad, very bad,
idea for Atom to take on.

First, let me preface this discussion by saying that I was one of the
principal champions of the <content> element model and did most of the
"speccish" text on the 'content' wiki page.  That page is not
accessible at the moment, so I've created a temporary link to it at:

  http://intertwingly.net/wiki/pie/ContentTemp

It links to related pages, and people would do well to also read much
of the discussion there: ContentDiscussion, ContentAndPermalink,
MultipleContentDiscussion, MimeContent, EscapedHtmlDiscussion,
ContentProblems, ComponentBlog, AdaptiveBlogosphere.  Take particular
note of the discussions of resources and fragments.

If I can humbly say this, I think I did a very convincing job of
championing this model.  Since I'm one of the progenitors of this
"baby" and am well familiar with its "bathwater" (MIME-in-XML), I
believe I am in a position to say that I was wrong *and* that there
are now credible alternatives that keep the "baby" (media type
representations) while discarding the "bathwater" (MIME-in-XML).  I
would particularly like to thank Tim Bray for pushing on "multipart
content" which set this house of cards to falling, as my previous
efforts to correct it have not overcome the original momentum behind
the content model.

Note that we are not the first to attempt MIME-in-XML, and will
likely not be the last to realize what a tremendous effort it would be.

!  My main point here will be that Atom is not the right place to take
!  on this task -- it is seemingly out of scope, it is Atom's most
!  complex feature, and we have seemingly simpler alternatives on the
!  table (naked HTTP and real MIME).

Now to Greg's setting of the context, which has also been touched on
in previous messages:

Atom's original <content> model had a very simplistic view of
Content-Type and Content-Transfer-Encoding (Transfer-Encoding in
HTTP).  So simplistic, in fact, that we discarded most of it: we kept
only the media-type identifier (the familiar type/subtype, without
parameters) and limited the encodings to a single transform, base64
(also without parameters).  Reread Greg's comment:

> Encodings are simply transformations that have been applied to the
> base content (one or more, and ordered).

Atom doesn't have that.  HTTP also has a distinction between
Content-Encoding (how the sender received or kept the representation)
and Transfer-Encoding (what the sender did to send it to you), which
Atom doesn't have.  Atom also lacks both the other headers and the
header extensibility often associated with content transfers.
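
To make "one or more, and ordered" concrete, here is a rough sketch of
what an ordered chain of encodings implies for a receiver.  The
registry, the function names, and the choice of gzip followed by
base64 are all assumptions of mine for illustration; nothing here
comes from the Atom drafts or from any HTTP implementation.

```python
import base64
import gzip

# Hypothetical registry of transforms; the names are illustrative only.
ENCODERS = {"gzip": gzip.compress, "base64": base64.b64encode}
DECODERS = {"gzip": gzip.decompress, "base64": base64.b64decode}


def apply_encodings(octets, encodings):
    """Apply each named encoding to the base content, in the order listed."""
    for name in encodings:
        octets = ENCODERS[name](octets)
    return octets


def remove_encodings(octets, encodings):
    """Undo the encodings in reverse order to recover the base content."""
    for name in reversed(encodings):
        octets = DECODERS[name](octets)
    return octets


original = b"<p>Hello, Atom</p>"
chain = ["gzip", "base64"]  # order matters: compress first, then armor
encoded = apply_encodings(original, chain)
decoded = remove_encodings(encoded, chain)
```

The point is only that a receiver needs the full ordered list to get
back to the base content; a single fixed base64 transform has no room
for that.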

Atom's <content> model is also a bit schizophrenic.  When HTTP and
MIME discuss resource representations, they do so in terms of octets
(bytes) and registered media type definitions, the latter of which are
generally geared towards "complete" representations.  Unless you use
Atom's base64 mode, Atom is using XML Characters (not octets) and
conventionally not passing representations as defined by their
respective media type registrations.  The original wiki model of
<content> at least allowed for a @rel parameter to indicate that a
complete representation was not being used -- *but* with or without
@rel Atom *must* define specific "profiles" of any media type we wish
to "adopt" and transfer as either a) XML Characters, or b) fragments.

Very few people really like using base64 so there's always a lot of
push to use "plain text", XML characters, or XML fragments associated
with a media type.  I've no problems with XML characters, plain text,
and I particularly like XML fragments of XHTML and non-XHTML document
types, but the association with a registered media type just seems
wrong -- especially when we have to have our own clear profile to go
with them anyway.
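
The octets-versus-characters distinction can be sketched in a few
lines.  This is only an illustration under my own assumptions (the
sample string and the two character encodings are arbitrary): the
same characters map to different octet sequences depending on the
encoding, while a base64 payload pins down one exact octet sequence.

```python
import base64

text = "caf\u00e9"  # the content seen as characters

# The same characters become different octets under different encodings,
# so "XML Characters" alone do not identify a media type's octet stream.
utf8_octets = text.encode("utf-8")
latin1_octets = text.encode("iso-8859-1")

# A base64 payload carries one specific octet sequence unchanged.
payload = base64.b64encode(utf8_octets).decode("ascii")
recovered = base64.b64decode(payload)
```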

Mark Nottingham wrote earlier:
> Separate from this concern (multiple entries) is the way we identify
> different content types and encodings. Whether we choose the W3C or
> IETF, we'll have to have a very good reason to define yet another
> type system for atom content, when there's already a media type
> system. The only hiccup I see is HTML encoding, but I think we can
> address that without throwing out the media type system.

I'm not at all suggesting we have our own type system.  Once we have
reduced Atom's inlinable content types to non-schizophrenic portions,
we're basically left with XML characters and XML fragments.  While XML
fragments are still rather fuzzy at the W3C, they're being used by
convention in several places.  We will be specifically profiling an
XHTML fragment, for example.  The types of other fragments are also
extensible by their namespace, which is common practice if not yet a
published recommendation.  Note also that media types and XML formats
have not always gotten along well; most XML formats are transferred
under the generic application/xml media type.

The place where registered media types are appropriate is on links,
where just the type/subtype is used and is purely advisory.  The
complete Content-Type, Transfer-Encoding, and other appropriate
headers and negotiation are covered by other specifications.

We have two proposed solutions for linking to primary content, when
the primary content can't be inlined as XML characters or an XML
fragment:

  <content src="URL" type="type/subtype" ... />

and

  <link rel="content" type="type/subtype" ... />

(Note current discussion on <link> as appropriate.)
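
As a sketch of how a consumer might treat the two forms alike, the
following reads either one and returns the location and the advisory
type.  The URL, the media type, the function name, and the use of an
href attribute on <link> are assumptions of mine for illustration;
neither proposal spells out its remaining attributes here.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments; the URL, type, and href attribute are made up.
BY_CONTENT = '<content src="http://example.org/photo.png" type="image/png"/>'
BY_LINK = ('<link rel="content" href="http://example.org/photo.png" '
           'type="image/png"/>')


def external_content(element_xml):
    """Return (url, media_type) for either form, or None if unrecognized."""
    el = ET.fromstring(element_xml)
    if el.tag == "content" and "src" in el.attrib:
        return el.get("src"), el.get("type")
    if el.tag == "link" and el.get("rel") == "content":
        return el.get("href"), el.get("type")
    return None
```

In both cases the type is only a hint; the authoritative Content-Type
would come from whatever protocol is used to fetch the URL.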

!  My preferred direction for Atom 1.0 is to reduce the Content
!  construct to its simplest form, and adopt an alternate method for
!  referencing raw media type representations for the <content>
!  element in particular.

!  Secondarily, if we decide that multiple <content>s are necessary
!  within a single entry (as opposed to alternate entries, such as for
!  a separate language feed), we may also need to apply the same rules
!  for Content construct elements in general.

If raw representations do go fully external, then
PaceSimpleContentType is dependent on the current discussion of
naked HTTP or real MIME.

If consensus is to keep base64 and simplistic media types, I would
definitely like us to clear up this confusion about resource
representations and de-facto profiling when media types are used with
non-base64 content (unless we somehow decide only to apply media types
to raw base64 content).

Sorry for the long post, but I hope that it covers all of the
background and detail that wasn't fit to include in the Rationale for
PaceSimpleContentType.  Please let me know if and where this helps (or
doesn't help) us move forward with shaping PaceSimpleContentType and
alternate content transfer techniques.

  -- Ken