
Re: Some text that may be useful for the update of RFC 2376

MURATA Makoto wrote:
> In message "Re: Some text that may be useful for the update of RFC 2376",
> Martin J. Duerst wrote...
>  >CSS is served as text/css. XSL is XML. VBScript and JavaScript may be
>  >served as application/... If they don't have a 'charset' parameter,
>  >and they don't have any internal way to indicate the encoding,
>  >that's the problem of these registrations, not our problem.
> Are you saying that each format should invent their own rules for
> indicating the charset?  My understanding was (and still is) that
> you as an I18n guy at W3C are promoting a single generalized solution
> for all textual formats.

Are you saying that each transport protocol (which formally includes direct
filesystem access) should have its own, sometimes contradictory,
overrides and defaults and assumptions? Or that we should take the current,
lowest-common-denominator, fails-far-more-often-than-it-works charset
parameter of two particular protocols (each of which has a different
default, and neither of which is implemented consistently) and attempt to
stretch this, making loose and woolly the current, fairly good state of
encoding declaration in XML files?

There is an English colloquial expression: "throwing out the baby with the
bathwater".

Several people have pointed out that I am focussing on XML here. I would
refer them to the name and scope of the mailing list.

Incidentally, XML is probably not best described as a textual format. It is
a data format, which can among other things be used to describe
international text. I am aware that the text/* media types have some
historical requirements regarding 'character set'; this is sufficient that
my opinion is that text/* should not be used for XML in general.
Application/xml has no such problems (though it seems that people propose
to propagate these problems there).

Several people have described dumb, content-unaware charset transcoders as
the most important thing that they are concerned with, and expressed
surprise that such converters should be required to know something about
the payload whose bytes they are merrily altering. In reply, I say - of
course. Otherwise, data corruption will clearly occur.

It is possible for example to take a payload of image/svg-xml and alter it
from UTF-16 to ISO-8859-15 (this would entail rewriting the encoding
declaration and insertion of NCRs for any characters outside the repertoire
of 8859-15). I would be most upset, as would every decoder on the planet,
if the same conversion were performed on image/png.
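
To make the XML case concrete, here is a minimal sketch of such a
content-aware conversion in Python. The function name transcode_xml and its
defaults are hypothetical, chosen only for illustration. It assumes the
payload is well-formed XML, and that every character missing from the target
repertoire occurs in character data or an attribute value, the only places
where an NCR is legal. A real transcoder would have to parse the document to
check that, rather than rely on a blanket error handler as this does.

```python
import re

# XML declaration at the very start of the decoded document, if any.
_XML_DECL = re.compile(r"^<\?xml\s.*?\?>", re.DOTALL)
# The encoding pseudo-attribute inside that declaration.
_ENCODING = re.compile(r"""encoding\s*=\s*(["'])[A-Za-z][A-Za-z0-9._-]*\1""")
# The version pseudo-attribute; encoding must come directly after it.
_VERSION = re.compile(r"""version\s*=\s*(["'])[^"']*\1""")


def transcode_xml(data: bytes, source: str = "utf-16",
                  target: str = "iso-8859-15") -> bytes:
    """Re-encode an XML payload and keep its encoding declaration truthful.

    Illustrative sketch only: it does not check where in the document the
    unencodable characters occur.
    """
    text = data.decode(source)  # the utf-16 codec consumes a leading BOM
    new_attr = f'encoding="{target.upper()}"'

    match = _XML_DECL.match(text)
    if match is None:
        # No declaration at all: add one, since the target is not UTF-8/16.
        text = f'<?xml version="1.0" {new_attr}?>\n' + text
    else:
        decl = match.group(0)
        if _ENCODING.search(decl):
            # Rewrite the existing encoding declaration.
            decl = _ENCODING.sub(lambda _m: new_attr, decl, count=1)
        else:
            # Declaration without encoding: insert it after version.
            decl = _VERSION.sub(lambda m: f"{m.group(0)} {new_attr}",
                                decl, count=1)
        text = decl + text[match.end():]

    # Characters outside the target repertoire become decimal NCRs (&#N;).
    return text.encode(target, errors="xmlcharrefreplace")
```

Given a UTF-16 document containing a CJK ideograph, the ideograph comes out
as an NCR while characters that 8859-15 does have (the euro sign, say) are
written as single bytes. Hand the same function the bytes of a PNG and the
decode step either fails or produces rubbish, which is the point.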

But that is, after all, the point of the -xml suffix? To flag that the
content is encoded in XML, so that any processor which feels like fiddling
with the bytes therein can judge whether it is competent to do so?
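
The check such a processor would make can be sketched in a few lines of
Python. The function name is_xml_media_type is hypothetical, and the rule it
applies (text/xml, application/xml, or any type whose name ends in "-xml") is
only my reading of the suffix convention discussed in this thread, not
something taken from a registration document.

```python
def is_xml_media_type(media_type: str) -> bool:
    """Return True if the media type advertises an XML payload.

    Assumed rule: text/xml, application/xml, or a "-xml" suffix.
    """
    # Drop parameters such as "; charset=..." and normalise case.
    essence = media_type.split(";", 1)[0].strip().lower()
    return essence in {"text/xml", "application/xml"} or essence.endswith("-xml")
```

A transcoder that gets False back (for image/png, say) should leave the
bytes alone. One that gets True knows there is an encoding declaration to
respect and rewrite.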