RE: MIME types and fragment identifiers in HTML and XML



One more reference to include: the XPointer draft claims to
define what fragment identifiers for XML content-type are supposed
to be, but it isn't referenced in your analysis.

http://www.w3.org/1999/07/WD-xptr-19990709 says:

  XPointer defines the meaning of the "selector" or "fragment identifier"
  portion of URIs that locate resources of MIME media types "text/xml" and
  "application/xml".


> 1. URI and URI reference
>
> A URI does not have a fragment identifier, but a URI reference (as defined by
> RFC 2396) may have a fragment identifier.  (Note: HTML and XML very often
> say "URI" when it should actually say "URI reference".).

To be fair, the terminology has changed over time; the term "URI" is
used ambiguously in various documents.
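
For concreteness, here is a minimal sketch of the distinction in the quoted
text, using Python's standard library. The example.org address is a
hypothetical URI reference chosen only for illustration.

```python
from urllib.parse import urldefrag

# Hypothetical URI reference, used only for illustration.
reference = "http://example.org/doc.xml#section2"

# urldefrag separates the URI proper from the fragment identifier.
uri, fragment = urldefrag(reference)

print(uri)       # the part that is actually sent to the protocol
print(fragment)  # the part left to the interpreter for the media type
```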

> 1) URI
>
> An entity is returned by some protocol  (here, the word "entity" is used as
> in RFC 2616).  The protocol should provide some mechanism for transmitting or
> inferring media type. In HTTP and email, this is done explicitly with the
> 'content-type' header.

Not all resources identified by a URI have a way of obtaining an
'entity'. HTTP does, and so "http:" URLs can talk about entities
being returned, but "mailto:" has no corresponding entity. It isn't
necessary for there to be a 'protocol' that 'returns' entities, either;
for example, the "cid" URL scheme makes reference to content of
MIME messages without a specified protocol, but it would be possible
to use a fragment identifier with a "cid" URL.

Those URIs which don't have a way of obtaining entities also don't allow
fragment identifiers.

Your use of 'email' here is somewhat confusing, because there is often no
appropriate URI to use for an email message. However, other protocols
use MIME content-type for content identification, including IMAP and
IPP.

> 2) URI reference
>
> A URI is first constructed.

I'm not sure "constructed" is the right word here, but I'm not
sure what you meant.

> An entity is returned or accessed interactively
> by some protocol.  The protocol should indicate the media type of the
> entity.
> Then, the user agent for this media type may extract or locate some fragment
> of this entity by using the fragment identifier.

I don't think "user agent" is the right term here; it is the
"interpreter for this media type"; in some cases, the interpreter
is part of a user agent, and in others, it's part of some other
function.


> The protocol does not indicate the media type for that fragment.

You're saying that fragments don't have media types themselves, I think.

>  Thus, it
> does not have content types, unless the fragment contains some other way of
> specifying media types. (For example, RFC 2397, "data" URL scheme, provides
> a way of including MIME content-type along with encoded data.)

I think you're trying to argue that fragments don't, in general, have
media types, even when there are some cases where compound objects have
embedded data which *does* have a media type. The data URL scheme is
certainly one kind of counterexample, but I'm not sure I would recommend
it.

> 2.  Media types specified by HTML or XML language constructs
>
> HTML and XML provides many constructs which specifies both an URI reference
> and a media type.  The HTML and XML specifications are rather silent about
> the intended semantics.

> 1) URI
>
> One could argue that the specified media type is used when the protocol does
> not indicate the content type of the entity.  One could even argue that
> the specified media type always override the content type indicated by the
> protocol.  (Note: Many implementations fail to indicate media types
> correctly.)
>
> One could also argue that the specified media type is used to predict or
> restrict the content type of the desired entity.  That is, if an A
> link contains a 'type' attribute and the resulting URI returns an
> entity with a different content-type, then an error has occurred.

> This isn't so different as getting a '404 not found'.  Something
> happened which wasn't expected. There are various ways of recovering,
> but any attempt to override one piece of MIME data with something
>  that's "fresher" and more authoritative seems wrong.

I think I sent something like this earlier, but it came out wrong.
The problem when you have conflicting sources of information ("what
is the MIME type of this data") is that you *do* want to select the
one that is fresher and more authoritative, but there are cases
where the data associated with the URI itself is likely to be more
authoritative (e.g., with some FTP sources) and other cases where
the data associated with the entity is more authoritative (e.g., with
a recently maintained HTTP server).

This is a design issue with HTML and (I suppose) XLink. Perhaps
there needs to be more than one way of associating a content-type
with a URI, one of which says "override content-type" and another
of which says "default content-type".
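
As a rough sketch of what two such modes might look like, assuming a
hypothetical function effective_type and mode names "override" and
"default" (neither is defined by HTML or XLink):

```python
from typing import Optional

def effective_type(link_type: Optional[str],
                   protocol_type: Optional[str],
                   mode: str) -> Optional[str]:
    """Pick a content type from the link and from the protocol.

    mode is a hypothetical switch, not something any specification defines:
      "override": the type given on the link wins when it is present.
      "default":  the type given on the link is used only when the
                  protocol supplied none.
    """
    if mode == "override":
        return link_type or protocol_type
    if mode == "default":
        return protocol_type or link_type
    raise ValueError("unknown mode: " + mode)
```

With a link that says text/xml and a server that says text/html, "override"
would yield text/xml and "default" would yield text/html.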

Note that even MIME-compliant protocols that normally associate
content-type with data can disclaim responsibility for it by,
say, using content-type: application/octet-stream. We might have
some theory of overriding, where text/xml would override
application/octet-stream and text/html would override text/xml
(specific overrides generic).
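
One possible reading of "specific overrides generic", as a sketch only. The
numeric ranking is an assumption made up for this example and covers just
the three types named above; the name more_specific is hypothetical.

```python
# Hypothetical ranking: a higher number means a more specific type.
SPECIFICITY = {
    "application/octet-stream": 0,
    "text/xml": 1,
    "text/html": 2,
}

def more_specific(a: str, b: str) -> str:
    """Return whichever of two content types ranks as more specific.

    Content types are case-insensitive, so both are lowercased first.
    A type missing from SPECIFICITY raises KeyError in this sketch.
    """
    a, b = a.lower(), b.lower()
    return a if SPECIFICITY[a] >= SPECIFICITY[b] else b
```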

> 2) URI reference
>
> If a construct in HTML or XML specifies a URI reference containing a fragment
> identifier, the construct also specifies a media type, and the protocol
> indicates the content type of the entity, what will happen?
>
> One could argue that the specified media type is used for the desired
> fragment, unless the fragment contains some other way of specifying media
> types.

I don't understand this case ('the fragment contains some other way...')

> One could also argue that the fragment must indicate the media type and that
> it must coincide with the media type specified by the HTML or XML
> construct (fragment).

This is unreasonable, since it isn't compatible with normal usage
where fragments are used without an explicit media type.

> If the fragment does not explicitly specify the same media type, an error has
> occurred.

Perhaps you mean "if the fragment explicitly specifies a media type but
it isn't the same", since not specifying a media type shouldn't be an error.

I believe that there is a generic form of a "fragment" which is
"an uninterpreted name", and that we should expect that most media
types that allow fragments have a way of looking up named components.
This would correspond to <A NAME=..> names in HTML and IDs in XML.
We should expect that other media types that have fragments will
define named components, too.
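
To illustrate the "uninterpreted name" idea for the HTML case only, here is
a minimal sketch that looks up an <A NAME=..> anchor with Python's standard
html.parser module. The names AnchorFinder and find_html_fragment are
hypothetical, and an interpreter for XML would need its own lookup for IDs.

```python
from html.parser import HTMLParser

class AnchorFinder(HTMLParser):
    """Record where the first <A NAME=...> matching a given name starts."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.position = None  # (line, column) of the match, or None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a" and self.position is None:
            if dict(attrs).get("name") == self.name:
                self.position = self.getpos()

def find_html_fragment(document, fragment):
    """Treat the fragment as an opaque name and look it up in HTML text."""
    finder = AnchorFinder(fragment)
    finder.feed(document)
    finder.close()
    return finder.position

page = "<P>Preface</P>\n<A NAME=intro>Introduction</A>"
print(find_html_fragment(page, "intro"))
print(find_html_fragment(page, "missing"))
```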

I'd like this definition of fragment identifiers to be more specific
about encoding, though; currently it is common practice to use
spaces in fragment identifiers, for example, rather than %20 encoding
them.
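
A small sketch of the encoding in question, using Python's standard
urllib.parse; the fragment text "section two" is a made-up example.

```python
from urllib.parse import quote, unquote

# Made-up fragment identifier containing a space.
fragment = "section two"

# Percent-encode everything that is not an unreserved character.
encoded = quote(fragment, safe="")
print(encoded)

# Decoding recovers the original name.
decoded = unquote(encoded)
print(decoded)
```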


> [1] HTML 4.0
>
> (http://www.w3.org/TR/html40/types.html#h-6.7)
>
> > 6.7 Content types (MIME types)
> >
> > Note. A "media type" (defined in [RFC2045] and [RFC2046]) specifies
> > the nature of a linked resource. This specification employs the term
> > "content type" rather than "media type" in accordance with current
> > usage. Furthermore, in this specification, "media type" may refer to
> > the media where a user agent renders a document.
> >
> > This type is represented in the DTD by %ContentType;.
> >
> > Content types are case-insensitive.
> >
> > Examples of content types include "text/html", "image/png",
> > "image/gif", "video/mpeg", "audio/basic", "text/tcl",
> > "text/javascript", and "text/vbscript". For the current list of
> > registered MIME types, please consult [MIMETYPES].
> > Note. The content type "text/css", while not currently registered with
> > IANA, should be used when the linked resource is a [CSS1] style sheet.
> http://www.w3.org/TR/REC-html40/present/styles.html#h-14.2.3

I've been trying to deal with issues around this paragraph in the
HTML working group; certainly, since "text/css" has been registered,
this paragraph should change.