
Fwd: Text/xml vs application/xml



In the W3C XML SIG, Kurt Conrad and I wrote this summary for the 
discussion of XML media types.


Kurt Conrad wrote...
Proposal:

This RFC will introduce both text/xml and application/xml.
Text/xml is recommended for entities that remain meaningful
to a human reader without XML processing.  (Thus, text/xml is
always appropriate for external DTD subsets and external
parameter entities.)  Application/xml is recommended for all
other entities.

Transmission of XML documents encoded in UTF-16 or UCS-2 via
SMTP is a special case.  Here we cannot use text/xml, because
of the MIME line termination rule for "text" types.
Application/xml is recommended instead.  (Note that the XML PR
needs slight revision if this proposed decision is accepted.)
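
The proposal above amounts to a small decision rule.  A minimal
sketch in Python follows; the function name, its parameters and
the list of charset labels are hypothetical and chosen only for
illustration, they are not part of the proposal.

```python
# Hypothetical helper illustrating the proposed rule; not part of any RFC.
# The set of charset labels below is an assumption made for this sketch.
WIDE_CHARSETS = {"utf-16", "utf-16le", "utf-16be", "ucs-2", "iso-10646-ucs-2"}


def choose_xml_media_type(human_readable, charset="utf-8", transport="http"):
    """Return the media type the proposal recommends for an XML entity."""
    label = charset.lower().replace("_", "-")
    # Special case: UTF-16 or UCS-2 over SMTP cannot be labelled text/xml.
    if transport.lower() == "smtp" and label in WIDE_CHARSETS:
        return "application/xml"
    # General rule: text/xml only if meaningful without XML processing.
    return "text/xml" if human_readable else "application/xml"


# An external DTD subset in UTF-8 stays text/xml; the same entity
# in UTF-16 sent by mail falls under the special case.
print(choose_xml_media_type(True))
print(choose_xml_media_type(True, charset="UTF-16", transport="SMTP"))
```

Whether an entity is "meaningful to a human being" is a judgment
left to the sender, so the sketch takes it as an input rather
than trying to compute it.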


Criteria:

RFC 2046 defines the top-level media types "text" and 
"application".  Its definition of "text" is as follows:

>3.  Overview Of The Initial Top-Level Media Types [RFC 2046]
>   The five discrete top-level media types are:
>    (1)   text -- textual information.  The subtype "plain" in
>          particular indicates plain text containing no
>          formatting commands or directives of any sort. Plain
>          text is intended to be displayed "as-is". No special
>          software is required to get the full meaning of the
>          text, aside from support for the indicated character
>          set. Other subtypes are to be used for enriched text in
>          forms where application software may enhance the
>          appearance of the text, but such software must not be
>          required in order to get the general idea of the
>          content.  Possible subtypes of "text" thus include any
>          word processor format that can be read without
>          resorting to software that understands the format.  In
>          particular, formats that employ embedded binary
>          formatting information are not considered directly
>          readable. A very simple and portable subtype,
>          "richtext", was defined in RFC 1341, with a further
>          revision in RFC 1896 under the name "enriched".

[snip]
>4.1.  Text Media Type [RFC 2046]
>
>   The "text" media type is intended for sending material which is
>   principally textual in form.  A "charset" parameter may be used to
>   indicate the character set of the body text for "text" subtypes,
>   notably including the subtype "text/plain", which is a generic
>   subtype for plain text.  Plain text does not provide for or allow
>   formatting commands, font attribute specifications, processing
>   instructions, interpretation directives, or content markup.  Plain
>   text is seen simply as a linear sequence of characters, possibly
>   interrupted by line breaks or page breaks.  Plain text may allow the
>   stacking of several characters in the same position in the text.
>   Plain text in scripts like Arabic and Hebrew may also include
>   facilities that allow the arbitrary mixing of text segments with
>   opposite writing directions.
>
>   Beyond plain text, there are many formats for representing what might
>   be known as "rich text".  An interesting characteristic of many such
>   representations is that they are to some extent readable even without
>   the software that interprets them.  It is useful, then, to
>   distinguish them, at the highest level, from such unreadable data as
>   images, audio, or text represented in an unreadable form. In the
>   absence of appropriate interpretation software, it is reasonable to
>   show subtypes of "text" to the user, while it is not reasonable to do
>   so with most nontextual data. Such formatted textual data should be
>   represented using subtypes of "text".
>
>4.1.1.  Representation of Line Breaks [RFC 2046]
>
>[snip]


By these definitions, most XML documents clearly belong to 
the "text" type: a reader can get the general idea of the 
content without software that understands XML.

The top-level type "application", in turn, is defined as
follows:

>3.  Overview Of The Initial Top-Level Media Types [RFC 2046]
[snip]
>    (5)   application -- some other kind of data, typically
>          either uninterpreted binary data or information to be
>          processed by an application.  The subtype "octet-
>          stream" is to be used in the case of uninterpreted
>          binary data, in which case the simplest recommended
>          action is to offer to write the information into a file
>          for the user.  The "PostScript" subtype is also defined
>          for the transport of PostScript material.  Other
>          expected uses for "application" include spreadsheets,
>          data for mail-based scheduling systems, and languages
>          for "active" (computational) messaging, and word
>          processing formats that are not directly readable.
>          Note that security considerations may exist for some
>          types of application data, most notably
>          "application/PostScript" and any form of active
>          messaging.  These issues are discussed later in this
>          document.
[snip]
>4.5.  Application Media Type [RFC 2046]
>   The "application" media type is to be used for discrete data which do
>   not fit in any of the other categories, and particularly for data to
>   be processed by some type of application program.  This is
>   information which must be processed by an application before it is
>   viewable or usable by a user.  Expected uses for the "application"
>   media type include file transfer, spreadsheets, data for mail-based
>   scheduling systems, and languages for "active" (computational)
>   material.  (The latter, in particular, can pose security problems
>   which must be understood by implementors, and are considered in
>   detail in the discussion of the "application/PostScript" media type.)
>   For example, a meeting scheduler might define a standard
>   representation for information about proposed meeting dates.  An
>   intelligent user agent would use this information to conduct a dialog
>   with the user, and might then send additional material based on that
>   dialog.  More generally, there have been several "active" messaging
>   languages developed in which programs in a suitably specialized
>   language are transported to a remote location and automatically run
>   in the recipient's environment.
>   Such applications may be defined as subtypes of the "application"
>   media type. This document defines two subtypes:
>   octet-stream, and PostScript.
>   The subtype of "application" will often be either the name or include
>   part of the name of the application for which the data are intended.
>   This does not mean, however, that any application program name may be
>   used freely as a subtype of "application".


Some XML data probably belong to this class, namely data that 
must be processed by an application before it is usable.  This 
is one reason to introduce application/xml.

Another reason for application/xml is the delivery of XML 
documents in UTF-16 via SMTP.  RFC 2046 has a very strict 
rule for line termination in "text" subtypes: a line break 
must be represented as the CRLF octet sequence.  UTF-16 
cannot satisfy this rule, because it encodes CR and LF as 
two octets each.  Although HTTP loosens this rule, SMTP 
does not.  Thus, the only choice is application/xml.
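
The line termination problem can be checked directly.  The
following Python sketch (the helper name and the sample string
are made up for illustration) tests whether the CRLF octet pair
appears in the encoded form of a short XML fragment.

```python
def has_crlf_octets(text, encoding):
    """Report whether the encoded octets contain the CR LF pair (0D 0A)."""
    return b"\r\n" in text.encode(encoding)


# Hypothetical two-line sample with a CRLF line break.
sample = "<doc>\r\n</doc>"

for encoding in ("utf-8", "utf-16-be", "utf-16-le"):
    print(encoding, has_crlf_octets(sample, encoding))
```

On this sample the pair is found only in the UTF-8 form.  In
both UTF-16 byte orders a zero octet separates 0D from 0A, so
a transport that looks for the CRLF octet pair does not see the
line break.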


References:

RFC 1896
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1896.txt

RFC 1341
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1341.txt

RFC 2046
   http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2046.txt



----
MURATA Makoto  muraw3c@xxxxxxxxxxxxx