Fwd: Text/xml vs application/xml
In the W3C XML SIG, Kurt Conrad and I wrote this summary for the
discussion of XML media types.
Kurt Conrad wrote...
Proposal:
This RFC will introduce both text/xml and application/xml.
Text/xml is recommended for entities that would be meaningful
to a human being without XML processing. (Thus, text/xml is
always appropriate for external DTD subsets and external
parameter entities.) Application/xml is recommended for all
others.
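The proposed rule can be read as a small decision function. The sketch below is only an illustration, not part of the proposal: the `EntityKind` enum, the function name, and the boolean flag are hypothetical. The flag stands in for the human judgment of whether an entity "would be meaningful to a human being without XML processing".

```python
from enum import Enum


class EntityKind(Enum):
    """Hypothetical classification of XML entities, for illustration only."""
    DOCUMENT = "document"
    EXTERNAL_DTD_SUBSET = "external DTD subset"
    EXTERNAL_PARAMETER_ENTITY = "external parameter entity"


def choose_xml_media_type(kind: EntityKind, readable_without_xml_processing: bool) -> str:
    """Apply the proposed rule: text/xml for entities meaningful to a human
    without XML processing, application/xml for all others."""
    # The proposal treats external DTD subsets and external parameter
    # entities as always human-meaningful, so they always get text/xml.
    if kind in (EntityKind.EXTERNAL_DTD_SUBSET, EntityKind.EXTERNAL_PARAMETER_ENTITY):
        return "text/xml"
    # For any other entity the choice rests on the author's judgment,
    # modelled here as a boolean flag.
    return "text/xml" if readable_without_xml_processing else "application/xml"


print(choose_xml_media_type(EntityKind.EXTERNAL_DTD_SUBSET, False))  # text/xml
print(choose_xml_media_type(EntityKind.DOCUMENT, False))             # application/xml
```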
Transmission of XML documents encoded in UTF-16 or UCS-2 via
SMTP is a special case. For this purpose, we cannot use
text/xml because of the line termination rule of MIME;
application/xml is recommended instead. (Note that the XML
PR needs slight revision if this proposed decision is
accepted.)
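A minimal sketch of the SMTP special case, using Python's standard `email` package. Assumptions: the function name and Subject header are hypothetical, and base64 transfer encoding is one reasonable choice rather than something the proposal prescribes. The UTF-16 octets are labelled application/xml, so the MIME line-break rule for "text" bodies does not apply, and base64 carries the two-octet code units intact over the line-oriented mail transport.

```python
from email.message import EmailMessage


def build_utf16_xml_message(xml_text: str) -> EmailMessage:
    """Package an XML document encoded in UTF-16 for SMTP delivery."""
    # Python's "utf-16" codec prepends a byte order mark.
    body = xml_text.encode("utf-16")
    msg = EmailMessage()
    msg["Subject"] = "XML document"  # hypothetical header, for illustration
    # Bytes content needs an explicit maintype/subtype; base64 keeps
    # every octet intact regardless of CR/LF values inside the data.
    msg.set_content(body, maintype="application", subtype="xml", cte="base64")
    return msg


msg = build_utf16_xml_message('<?xml version="1.0" encoding="UTF-16"?>\r\n<doc/>')
print(msg.get_content_type())  # application/xml
```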
Criteria:
RFC 2046 provides the definition of top-level media
types "text" and "application". The definition of
"text" is as below:
>3. Overview Of The Initial Top-Level Media Types [RFC 2046]
> The five discrete top-level media types are:
> (1) text -- textual information. The subtype "plain" in
> particular indicates plain text containing no
> formatting commands or directives of any sort. Plain
> text is intended to be displayed "as-is". No special
> software is required to get the full meaning of the
> text, aside from support for the indicated character
> set. Other subtypes are to be used for enriched text in
> forms where application software may enhance the
> appearance of the text, but such software must not be
> required in order to get the general idea of the
> content. Possible subtypes of "text" thus include any
> word processor format that can be read without
> resorting to software that understands the format. In
> particular, formats that employ embedded binary
> formatting information are not considered directly
> readable. A very simple and portable subtype,
> "richtext", was defined in RFC 1341, with a further
> revision in RFC 1896 under the name "enriched".
[snip]
>4.1. Text Media Type [RFC 2046]
>
> The "text" media type is intended for sending material which is
> principally textual in form. A "charset" parameter may be used to
> indicate the character set of the body text for "text" subtypes,
> notably including the subtype "text/plain", which is a generic
> subtype for plain text. Plain text does not provide for or allow
> formatting commands, font attribute specifications, processing
> instructions, interpretation directives, or content markup. Plain
> text is seen simply as a linear sequence of characters, possibly
> interrupted by line breaks or page breaks. Plain text may allow the
> stacking of several characters in the same position in the text.
> Plain text in scripts like Arabic and Hebrew may also include
> facilities that allow the arbitrary mixing of text segments with
> opposite writing directions.
>
> Beyond plain text, there are many formats for representing what might
> be known as "rich text". An interesting characteristic of many such
> representations is that they are to some extent readable even without
> the software that interprets them. It is useful, then, to
> distinguish them, at the highest level, from such unreadable data as
> images, audio, or text represented in an unreadable form. In the
> absence of appropriate interpretation software, it is reasonable to
> show subtypes of "text" to the user, while it is not reasonable to do
> so with most nontextual data. Such formatted textual data should be
> represented using subtypes of "text".
>
>4.1.1. Representation of Line Breaks [RFC 2046]
>
>[snip]
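It is quite clear that most XML documents belong to the
"text" type: they are principally textual, and a reader can
get the general idea of the content without XML software.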
Meanwhile, the top-level type "application" is defined as
below:
>3. Overview Of The Initial Top-Level Media Types [RFC 2046]
>[snip]
>(5) application -- some other kind of data, typically
> either uninterpreted binary data or information to be
> processed by an application. The subtype
> "octet-stream" is to be used in the case of uninterpreted
> binary data, in which case the simplest recommended
> action is to offer to write the information into a file
> for the user. The "PostScript" subtype is also defined
> for the transport of PostScript material. Other
> expected uses for "application" include spreadsheets,
> data for mail-based scheduling systems, and languages
> for "active" (computational) messaging, and word
> processing formats that are not directly readable.
> Note that security considerations may exist for some
> types of application data, most notably
> "application/PostScript" and any form of active
> messaging. These issues are discussed later in this
> document.
[snip]
>4.5. Application Media Type [RFC 2046]
> The "application" media type is to be used for discrete data which do
> not fit in any of the other categories, and particularly for data to
> be processed by some type of application program. This is
> information which must be processed by an application before it is
> viewable or usable by a user. Expected uses for the "application"
> media type include file transfer, spreadsheets, data for mail-based
> scheduling systems, and languages for "active" (computational)
> material. (The latter, in particular, can pose security problems
> which must be understood by implementors, and are considered in
> detail in the discussion of the "application/PostScript" media type.)
> For example, a meeting scheduler might define a standard
> representation for information about proposed meeting dates. An
> intelligent user agent would use this information to conduct a dialog
> with the user, and might then send additional material based on that
> dialog. More generally, there have been several "active" messaging
> languages developed in which programs in a suitably specialized
> language are transported to a remote location and automatically run
> in the recipient's environment.
> Such applications may be defined as subtypes of the "application"
> media type. This document defines two subtypes:
> octet-stream, and PostScript.
> The subtype of "application" will often be either the name or include
> part of the name of the application for which the data are intended.
> This does not mean, however, that any application program name may be
> used freely as a subtype of "application".
Some XML data probably belong to this class: data that must
be processed by an application before they are viewable or
usable by a human. This is one reason to introduce
application/xml.
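The practical difference shows up in user agents without XML support. RFC 2046 also states, in sections not quoted above, that unrecognized "text" subtypes should be treated as text/plain, and unrecognized "application" subtypes as application/octet-stream. A rough sketch of that default behaviour follows; the function name and the returned action strings are hypothetical.

```python
def fallback_action(content_type: str, recognized: frozenset[str] = frozenset()) -> str:
    """Rough sketch of a MIME user agent's handling of a body, following
    RFC 2046's defaults for unrecognized subtypes."""
    # Drop any parameters such as '; charset=utf-8' and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in recognized:
        return "use registered handler"
    top_level = media_type.split("/", 1)[0]
    if top_level == "text":
        # Unrecognized text subtypes are treated as text/plain and shown
        # as-is. (RFC 2046 also requires the charset to be understood;
        # this sketch does not check that.)
        return "display as plain text"
    if top_level == "application":
        # Unrecognized application subtypes are treated as
        # application/octet-stream: offer to write the data to a file.
        return "offer to save to file"
    return "not handled in this sketch"


for ct in ("text/xml; charset=utf-8", "application/xml"):
    print(ct, "->", fallback_action(ct))
```

A reader without XML software would thus see a text/xml document directly, but would only be offered to save an application/xml one.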
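Another reason for application/xml is the delivery of XML
documents in UTF-16 via SMTP. RFC 2046 has a very strict
rule for line termination in "text" types: a line break must
be represented as CRLF, and CR and LF must not occur
anywhere else. In UTF-16 each character occupies two octets
(CR, for example, is 0x00 0x0D in big-endian byte order), so
this rule cannot be met. Although HTTP loosens this rule,
SMTP does not. Thus, the only choice is application/xml.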
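The line-termination problem can be checked mechanically. The sketch below tests only the octet-level part of the rule (the octets CR, 0x0D, and LF, 0x0A, may occur only as the pair CR LF); the function name and sample document are illustrative assumptions, not part of the proposal.

```python
def meets_mime_text_line_rule(body: bytes) -> bool:
    """Return True if CR and LF octets occur only as CRLF pairs,
    as RFC 2046 requires for "text" bodies in canonical form."""
    for i, octet in enumerate(body):
        # A CR must be immediately followed by LF.
        if octet == 0x0D and (i + 1 >= len(body) or body[i + 1] != 0x0A):
            return False
        # An LF must be immediately preceded by CR.
        if octet == 0x0A and (i == 0 or body[i - 1] != 0x0D):
            return False
    return True


doc = '<?xml version="1.0"?>\r\n<greeting>Hello</greeting>\r\n'
print(meets_mime_text_line_rule(doc.encode("utf-8")))      # True
print(meets_mime_text_line_rule(doc.encode("utf-16-le")))  # False: CR becomes 0x0D 0x00
```

The same document passes in UTF-8 but fails in either UTF-16 byte order, because every CR and LF is split from its partner by a 0x00 octet.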
References:
RFC 1896
http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1896.txt
RFC 1341
http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1341.txt
RFC 2046
http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2046.txt
----
MURATA Makoto muraw3c@xxxxxxxxxxxxx