Re: Some text that may be useful for the update of RFC 2376
On Fri, 17 Mar 2000, MURATA Makoto wrote:
> Rick Jelliffe wrote...
> >We need a way to ensure end-to-end integrity.
> I do not agree. Why?
By "we" I mean my employers and me, not "we" as in ietf-xml-mime.
We need to be able to send and receive Big5-encoded XML where we have
no control over the behaviour of intermediate systems.
> >(and, in any case,
> >there is no mechanism currently for an XML parser to feed information
> >about which encodings it accepts to the HTTP system to set up the
> >preferences in the first place.)
> HTTP already has the accept-charset field. I do not understand your claim.
If I am using a DOM parser, I cannot ask it "what encodings do you
support?" If I am using SAX in Java, I can assume that the encodings
underlying Java are the ones available. I don't recall any C or C++ XML
parser that exposes this information: I don't think Expat does, for
example.
If my browser cannot ask its XML processor "what encodings do you
support?" in order to perform content negotiation for XML, then either the
poor user must configure it themselves or the HTTP code has to take on the
responsibility for providing transcoding services itself (perhaps not a
bad thing for the future). And configuration has to be done
application-by-application: for example James Clark's vanilla Expat did
not accept Big5, so every XML application built on it was pretty unusable
for Traditional Chinese here.
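To make the negotiation step concrete, here is a minimal sketch in Python of
what the HTTP side could do if it did have a list of the encodings its XML
processor accepts. The function name accept_charset and the input list are
hypothetical: as noted above, parsers generally do not expose such a list, so
in practice it would have to come from user configuration or from testing.
The q-value scheme (preference falling by 0.1 per position) is likewise just
an assumption for illustration.

```python
def accept_charset(supported):
    """Format an Accept-Charset header value from a preference-ordered list.

    `supported` is a hypothetical list of encoding names that the XML
    processor is believed to accept, most preferred first.
    """
    parts = []
    for rank, name in enumerate(supported):
        # Assumed scheme: first entry gets q=1 (left implicit), later
        # entries drop by 0.1 each, never going below 0.1.
        q = max(0.1, 1.0 - 0.1 * rank)
        parts.append(name if q == 1.0 else "%s;q=%.1f" % (name, q))
    return ", ".join(parts)


if __name__ == "__main__":
    # The list here is made up; a real browser would need to obtain it
    # from its XML processor somehow.
    print("Accept-Charset: " + accept_charset(["big5", "utf-8"]))
```

The formatting is the trivial part. The hard part is filling in the list,
which is exactly what the parser APIs do not help with.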
Even if you are using DOM (or
SAX) it is quite possible that the system integrator has chosen to use a
different implementation from the one which the software developer used.
So you cannot ask DOM, and the programmer cannot be sure which
implementation is being used.
So it seems to me that content negotiation of character encoding for XML
is a bit of a myth: instead of being transparent, it requires the user to
test each application. That is an unreasonable and unworkable
requirement. At the moment, the browser has to guess which encodings are
available, or the poor user has to test if the local encoding is
supported. (I suppose systems could also have some automated system which
requested a Big5 XML file and then tried to parse it. Not really
elegant. Presumably the XML file would have to be sourced internally.)
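A rough sketch of such an automated probe, in Python against its bundled
Expat binding (xml.parsers.expat). The function name parser_accepts and the
sample text are my own choices for illustration. The test document is built
in memory rather than fetched, and the probe only says whether this one
parser round-trips this one sample, nothing more.

```python
import xml.parsers.expat

# Two CJK characters that Big5 can represent; any representable text would do.
SAMPLE = "\u4e2d\u6587"


def parser_accepts(encoding, sample=SAMPLE):
    """Return True if the local Expat-based parser round-trips `sample`
    from a tiny XML document encoded and declared as `encoding`."""
    try:
        doc = '<?xml version="1.0" encoding="%s"?><t>%s</t>' % (encoding, sample)
        data = doc.encode(encoding)
    except (LookupError, UnicodeEncodeError):
        # We cannot even build the test document in this encoding.
        return False

    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Character data may arrive in several pieces, so collect and join.
    parser.CharacterDataHandler = chunks.append
    try:
        parser.Parse(data, True)
    except Exception:
        # Any failure (parse error, unsupported encoding) counts as "no".
        return False
    return "".join(chunks) == sample


if __name__ == "__main__":
    for name in ("utf-8", "big5"):
        print(name, parser_accepts(name))
```

The answer for Big5 depends on the parser build installed locally, so the
probe would have to be run on each system and for each application's parser.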
For content negotiation of MIME types, a browser knows which content-types
have handlers. But it doesn't know this information for character-encoding
for the XML applications it has. That is why I