
Re: Some text that may be useful for the update of RFC 2376



In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
 >

 >> I think that such a transcoder is very helpful because it works for
 >> all textual formats and also because it is very efficient.
 >
 >No, it is not helpful, because it makes life a lot more difficult for
 >everyone else and leads to data corruption.

Apparently, Martin and I do not agree with you.

 >Incidentally I don't see an answer to my question about what such an
 >XML-unaware transcoder would do when converting down from UTF-8 or UTF-16
 >to some 8-bit charset with all the unrepresentable characters. Since it
 >doesn't know XML it can't use NCRs. What does it do, silently replace these
 >characters with question marks? And that is somehow OK?

In the case of XML, conversion from Unicode to legacy encodings is not very 
useful.  Even when such a conversion is requested, a transcoder can simply 
give up and report an error when it encounters an unrepresentable character, 
rather than silently substituting question marks.
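To make the point about unrepresentable characters concrete, here is a minimal Python sketch. It is not taken from any existing transcoder: the function names are hypothetical, and ISO-8859-1 is only an example target charset. The XML-unaware version refuses to transcode when a character cannot be represented; the XML-aware version falls back to numeric character references using Python's built-in `xmlcharrefreplace` error handler.

```python
from typing import Optional

def unaware_transcode(text: str, charset: str) -> Optional[bytes]:
    """Hypothetical XML-unaware transcoder: give up (return None) rather than substitute '?'."""
    try:
        return text.encode(charset)  # strict mode raises on unrepresentable characters
    except UnicodeEncodeError:
        return None

def xml_aware_transcode(text: str, charset: str) -> bytes:
    """Hypothetical XML-aware transcoder: write unrepresentable characters as NCRs."""
    return text.encode(charset, errors="xmlcharrefreplace")

doc = "<p>caf\u00e9 \u65e5\u672c</p>"
print(unaware_transcode(doc, "iso-8859-1"))    # None
print(xml_aware_transcode(doc, "iso-8859-1"))  # b'<p>caf\xe9 &#26085;&#26412;</p>'
```

NCRs are only legal in character data and attribute values, so even the XML-aware fallback would have to fail if an unrepresentable character appeared in an element or attribute name.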


 >>  >> The charset parameter is such a solution.
 >>  >
 >>  >It is one such solution. There are better ones, and indeed a much better
 >>  >one in the XML specification.
 >> 
 >> It works only for XML.  It is not bad when the MIME header is not available.
 >> But when it is available, we must rely on the charset parameter.
 >
 >For text/*, yes, we have to. Luckily there is application/* and model/* and
 >image/* and so forth for people using XML who care about data integrity and
 >don't want cheap text processing tools playing fast and loose with their
 >data.

I believe that "always the charset parameter" is the recommendation given 
in RFC 2130 and on the public page of the W3C I18N WG.

In my message "History of the charset issue", I tried to summarize 
my understanding of the history.  I am unable to ignore the consensus 
of the W3C I18N WG, the W3C XML Syntax WG, and the W3C XML SIG&WG, or the 
recommendation given in RFC 2130.

 >> You are advocating different in-band encoding signatures for different
 >> formats.  I think that this is a significant burden on users and specification
 >> developers.
 >
 >You are advocating different out-of-band or in-band or mixed signatures for
 >different protocols.

The long-term goal is to make operating-system file systems aware of the 
charset parameter.  Editors know the charset, and they store the charset 
information in the file system.  This information is then passed to WWW servers 
and from there to WWW browsers.  The charset information is completely hidden 
from users and everything is automatic.  There will be no data corruption.

For now, we need in-band signatures and some mechanism to keep the out-of-band 
and in-band signatures consistent.
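One possible consistency check, sketched in Python under stated assumptions: the hypothetical `labels_consistent` compares the MIME charset parameter with the XML encoding declaration and treats two names as equal when Python's `codecs.lookup` resolves them to the same codec. When there is no declaration it assumes the XML default of UTF-8; BOM-based UTF-16 detection is deliberately ignored, and only ASCII-compatible encodings are handled.

```python
import codecs
import re
from typing import Optional

# Encoding declaration inside the XML declaration, read from raw bytes.
# Assumes an ASCII-compatible encoding (UTF-16 input is not handled).
_ENC_DECL = re.compile(rb'^<\?xml\s[^>]*?\sencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1')

def in_band_encoding(data: bytes) -> Optional[str]:
    """Return the declared encoding name, or None if there is no declaration."""
    m = _ENC_DECL.match(data)
    return m.group(2).decode("ascii") if m else None

def labels_consistent(mime_charset: str, data: bytes) -> bool:
    """Hypothetical check: do the out-of-band and in-band labels agree?"""
    declared = in_band_encoding(data) or "utf-8"  # assumed XML default; BOMs ignored
    try:
        return codecs.lookup(mime_charset).name == codecs.lookup(declared).name
    except LookupError:  # an unknown charset name cannot be confirmed consistent
        return False
```

With `encoding="Shift_JIS"` in the document, a header of `charset=shift_jis` passes and `charset=utf-8` fails; the second case is the kind of mismatch that arises when bytes are re-encoded without the declaration being updated.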

Most modern WWW servers provide the charset parameter.  We only have to 
encourage them without repeating old arguments.

 > A solution that requires every "save as" of an XML
 >file to rewrite the (incorrect, but overridden by a MIME charset parameter)
 >encoding declaration, which was only incorrect because one of your "I know
 >how to fiddle with all text files" transcoders silently broke it in the
 >first place. This places, as you say, an intolerable burden on users.

You are confusing XML-unaware transcoders and XML-aware programs which 
save XML documents into files.
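The difference can be sketched as follows: an XML-aware program that saves a document knows where the encoding declaration is and can rewrite it to match the encoding it actually writes, so the in-band label does not go stale. The Python below is a hypothetical, regex-based illustration rather than a real XML serializer; it assumes the input has no leading BOM and a well-formed XML declaration, and it encodes strictly so that unrepresentable characters raise an error instead of being replaced.

```python
import re

# Hypothetical helpers: a regex sketch, not a conforming XML parser.
_DECL = re.compile(r'^<\?xml\s[^>]*?\?>')                  # the XML declaration
_ENC = re.compile(r'(\sencoding\s*=\s*)(["\'])[^"\']*\2')  # encoding="..."
_VER = re.compile(r'(\sversion\s*=\s*(["\'])[^"\']*\2)')   # version="..."

def xml_save_as(text: str, encoding: str) -> bytes:
    """Encode `text` as `encoding`, making the encoding declaration agree."""
    m = _DECL.match(text)
    if m:
        decl = m.group(0)
        if _ENC.search(decl):
            # Replace the existing label, keeping its quote style.
            decl = _ENC.sub(lambda e: e.group(1) + e.group(2) + encoding + e.group(2),
                            decl, count=1)
        else:
            # No label yet: it must follow version and precede standalone.
            decl = _VER.sub(lambda v: v.group(1) + f' encoding="{encoding}"',
                            decl, count=1)
        text = decl + text[m.end():]
    else:
        text = f'<?xml version="1.0" encoding="{encoding}"?>' + text
    # Strict encoding: unrepresentable characters raise UnicodeEncodeError.
    return text.encode(encoding)
```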

 >One of the things about XML, which differs from HTML, is typical patterns
 >of use. XML transmitted over HTTP is likely to be extensively manipulated
 >from the filesystem of both the server and the client, a common operation
 >which your proposal makes much more difficult, just to allow people who
 >write simple text processing tools to not add XML support. As a trade-off,
 >I hope it is obvious to everyone else why this is such a bad idea.

I agree with the first and second sentences and completely disagree with 
the last one.


----
MURATA Makoto  muraw3c@xxxxxxxxxxxxx