
Re: Requiring charset parameter on +xml types?



Hello Mark,

As far as I understand, for text/...+xml, the charset parameter
is very close to being required (by using the general MIME default
of us-ascii, which overrides any internal information).
For application/...+xml (or others such as image/...+xml), the
internal information is relevant if there is no charset parameter.

To solve the problems, the following things should help:

- Help make system administrators, webmasters, and so on aware
  of the fact that they have to allow charset settings and
  have to make them.
- Maybe define a type application/html+xml for use by those
  people who want to just ship files around.
- For text/html+xml (and that would work for all types, actually),
  beef up the servers so that they can use the information in the
  file to set the charset parameter correctly.
  [Sorry to be a bit lengthy here, but I just almost got excited
   about this idea.]
  This would work about as follows:
  - Server detects that content type of outgoing stuff is +xml.
  - Server verifies that there is no setting for the charset
    parameter in the configuration.
  - Server peeks into the data to check for a BOM and/or for an
    encoding declaration.
  - Server sets charset parameter.

  Please note the following:
  - If this feature is introduced into a server without any
    configuration options, the only potentially undesirable
    effect is to label files that were intended to go out
    as defaulted to us-ascii as utf-8 explicitly. But this
    is in no way a problem, because every us-ascii file is
    a utf-8 file (this means no wrong labels), and because
    the chance that a receiving XML processor understands
    utf-8 is higher than for us-ascii (utf-8 is required,
    us-ascii is not).
  - Such a feature has been the idea behind <meta http-equiv,
    but it turned out that the lookahead length and processing
    effort for that is much too high. For the xml encoding
    declaration, things look much better.
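
The server-side steps listed above could be sketched roughly as follows.
This is only an illustration in Python, not the behaviour of any existing
server: the function names (sniff_xml_charset, content_type_header) and the
idea of passing the configured charset as an argument are assumptions made
for the example. It only looks at the first bytes of the file, and it only
recognizes encoding declarations written in an ASCII-compatible encoding.

```python
import re

# Byte order marks; the 4-byte UTF-32 marks are checked first because the
# little-endian one begins with the same two bytes as the UTF-16 mark.
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32"),
    (b"\xff\xfe\x00\x00", "utf-32"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16"),
    (b"\xfe\xff", "utf-16"),
]

# Encoding declaration inside an XML declaration at the very start of the data.
_ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_charset(head):
    """Return a charset name found in the leading bytes, or None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    match = _ENCODING_DECL.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    return None


def content_type_header(media_type, configured_charset, head):
    """Build a Content-Type value, peeking into the data only when needed."""
    # An explicit setting in the configuration always wins.
    if configured_charset is not None:
        return f"{media_type}; charset={configured_charset}"
    # Only +xml types are inspected.
    if not media_type.endswith("+xml"):
        return media_type
    # No BOM and no declaration: label as utf-8, which is also a correct
    # label for any pure us-ascii file.
    charset = sniff_xml_charset(head) or "utf-8"
    return f"{media_type}; charset={charset}"
```

For example, a file starting with <?xml version="1.0" encoding="ISO-8859-1"?>
and served as text/html+xml with nothing configured would go out as
"text/html+xml; charset=iso-8859-1", while a file with neither BOM nor
declaration would be labelled utf-8.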

Regards, Martin.

At 00/10/13 13:52 -0400, Mark Baker wrote:
Greetings,

At the HTML WG f2f yesterday, it was proposed that we should require the
use of the charset parameter on the XHTML media type.  This came up in
the discussion over whether the media type should use application/ or
text/.

Would requiring this parameter help solve the problem of those broken
web servers that I've heard about that mislabel non-US-ASCII encoded
text/* content as US-ASCII?

Are there any other issues/pros/cons we should be aware of for requiring
this parameter?

Thanks.

BTW, for W3C members that are interested (i.e. it's not required reading
to grok this message - mustUnderstand = 0 :-), the IRC log of the
conversation is available at:

http://lists.w3.org/Archives/Member/w3c-html-wg/2000OctDec/0104.html

MB