
Re: Some text that may be useful for the update of RFC 2376

In message "Re: Some text that may be useful for the update of RFC 2376",
Rick Jelliffe wrote...

 >> HTTP already has the accept-charset field.  I do not understand your claim.
 >If I am using a DOM parser, I cannot ask it "what encodings do you
 >support?" If I am using SAX in Java, I can assume that the encodings
 >underlying Java are the ones available. I don't recall any C or C++ XML
 >parser that exposes this information: I don't think Expat does, for

Although such information is described in their manuals, I do not 
think that they have any APIs for "what encodings do you
support?".  I agree.
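
Lacking such an API, about the only thing an application can do is the automated probe Rick mentions further down: feed the processor a tiny in-memory document that declares a given encoding and see whether it is accepted. A minimal sketch using Python's bundled Expat binding follows. The function name parser_accepts is hypothetical. The probe body is ASCII, so it is only meaningful for ASCII-compatible encodings (a UTF-16 declaration over ASCII bytes would be rejected even though Expat supports UTF-16).

```python
import xml.parsers.expat


def parser_accepts(charset: str) -> bool:
    """Hypothetical probe: parse a tiny in-memory document that declares
    `charset` and report whether this Expat binding accepts it.
    Only meaningful for ASCII-compatible encodings."""
    try:
        doc = f'<?xml version="1.0" encoding="{charset}"?><probe/>'.encode("ascii")
        parser = xml.parsers.expat.ParserCreate()  # no override: honor the declaration
        parser.Parse(doc, True)
        return True
    except Exception:
        # Depending on the binding, an unsupported encoding may surface as
        # ExpatError, LookupError or ValueError; all mean "not accepted".
        return False
```

As Rick says, this is hardly elegant, and it tells you only about the processor you happen to probe, not the one a system integrator may later substitute.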

 >If my browser cannot ask its XML processor "what encodings do you
 >support?" in order to perform content negotiation for XML, then either the
 >poor user must configure it themselves or the HTTP code has to take on the
 >responsibility for providing transcoding services itself (perhaps not a
 >bad thing for the future).  And configuration has to be done
 >application-by-application: for example James Clark's vanilla Expat did
 >not accept Big5, so every XML application built on it was pretty unusable
 >for Traditional Chinese here.  

Ideally, XML processors should silently provide the accept-charset field.  

On top of document entities, an XML processor may silently fetch external 
parsed entities, external parameter entities, and external DTD subsets.  
(I know expat doesn't, but other parsers do.)  Since application 
programmers cannot control such fetching, the best solution is to hardcode 
the accept-charset field in the XML processor.  Certainly, the person who 
writes the XML processor knows which encodings are supported.  (It would be 
great if we could register callback routines for unsupported charsets.)
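
To make the idea concrete, here is a minimal sketch, assuming a hypothetical EntityFetcher class inside an XML processor (neither the class nor its methods are part of Expat or any real parser). Every request it builds, whether for the document entity, an external parsed entity, an external parameter entity, or an external DTD subset, silently carries the same Accept-Charset field. An application may register a callback for a charset the processor does not handle natively, and that charset is then advertised as well. SUPPORTED_CHARSETS is an illustrative assumption.

```python
import codecs
import urllib.request
from typing import Callable, Dict, Optional

# Hypothetical: encodings this processor decodes natively. The processor's
# author knows this list when the processor is built.
SUPPORTED_CHARSETS = ("utf-8", "utf-16", "iso-8859-1")

# A charset callback receives raw bytes and returns text, or None on failure.
Decoder = Callable[[bytes], Optional[str]]


class EntityFetcher:
    """Hypothetical fetcher used for every entity the processor retrieves."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Decoder] = {}

    def register_charset_handler(self, charset: str, handler: Decoder) -> None:
        # Lets an application add a charset (e.g. Big5) the processor lacks.
        self._handlers[charset.lower()] = handler

    def accept_charset(self) -> str:
        # Native encodings first, then any added through callbacks.
        extra = [c for c in self._handlers if c not in SUPPORTED_CHARSETS]
        return ", ".join(list(SUPPORTED_CHARSETS) + extra)

    def build_request(self, url: str) -> urllib.request.Request:
        # The field is set here, so application code never has to touch it.
        return urllib.request.Request(
            url, headers={"Accept-Charset": self.accept_charset()})

    def decode(self, data: bytes, charset: str) -> str:
        name = charset.lower()
        if name in SUPPORTED_CHARSETS:
            return codecs.decode(data, name)
        handler = self._handlers.get(name)
        if handler is not None:
            text = handler(data)
            if text is not None:
                return text
        raise ValueError(f"unsupported charset: {charset}")
```

For example, after `fetcher.register_charset_handler("big5", lambda b: b.decode("big5"))`, every request built by `fetcher` advertises Big5 as well, without any change to the application that merely asks for a document.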

 >Even if you are using DOM (or
 >SAX) it is quite possible that the system integrator has chosen to use a
 >different implementation from the one which the software developer used.
 >So you cannot ask DOM, and the programmer cannot be sure which
 >implementation is being used. 

Right.  But we can always assume that the same UCS characters will be 

 >So it seems to me that content negotiation of character encoding for XML
 >is a bit of a myth: it requires that the user test applications rather
 >than it being transparent. 

If content negotiation is hardcoded in XML processors, application programmers 
do not have to worry.

 >That is an unreasonable and unworkable
 >requirement. At the moment, the browser has to guess which encodings are
 >available, or the poor user has to test if the local encoding is
 >supported. (I suppose systems could also have some automated system which
 >requested a big5 XML file and then tried to parse it. Not really
 >elegant. Presumably the XML file would have to be sourced internally. )

I am afraid that I do not fully understand your claim.  Could you 
try again?

 >For content negotiation of MIME types, a browser knows which content-types
 >have handlers. But it doesn't know this information for character-encoding
 >for the XML applications it has. That is why I 

You probably sent this mail before you finished the last sentence.  

MURATA Makoto  muraw3c@xxxxxxxxxxxxx