
Re: Some text that may be useful for the update of RFC 2376

In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...

 >> Unfortunately, we do not have "fairly good state of encoding declaration
 >> of XML files".  People generate XML documents by XSLT or their own programs,
 >> and fail to specify the correct charset.
 >That is not a problem. Such files will not be well formed and thus, will
 >fail to parse.

You are saying that the omission of the charset parameter is a problem, 
and that incorrect encoding PIs are not problems.  I do not know why.  

Many Japanese users have failed to specify correct encoding PIs, and many   
Japanese programmers have failed to generate them correctly.  I also heard 
that users in developing countries copy ISO-8859-1 HTML files and mistakenly 
put incorrect meta tags.  The same thing will happen to XML.  In-band encoding 
is not free from errors.

 >> I think that you are not paying attention to other textual formats.
 >Oh I am, but not on this list where it is off topic.

In RFC 2318 (text/css), which you co-authored, the charset parameter is 
described as follows:

       The syntax of CSS is expressed in US-ASCII, but a CSS file can
       contain strings which may use any Unicode character. Any charset
       that is a superset of US-ASCII may be used; US-ASCII, iso-8859-X
       and utf-8 are recommended.

RFC 2616 (HTTP) is a draft standard and defines the default as follows:

   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value. See
   section 3.4.1 for compatibility problems.

Thus, the default value of the charset parameter of text/css, when it is 
received via HTTP, is ISO-8859-1.  I know that the CSS recommendations say 
otherwise.  But in the realm of the IETF, the default value is ISO-8859-1.
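
To make the RFC 2616 rule quoted above concrete, here is a minimal sketch 
in Python.  The function name effective_charset is hypothetical, and the 
sketch covers only the HTTP default for "text" subtypes; it says nothing 
about what the CSS recommendations do.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset implied by a Content-Type value received via HTTP.

    Hypothetical helper: it applies only the RFC 2616 default quoted above.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # An explicit charset parameter always wins (returned in lower case).
    explicit = msg.get_content_charset()
    if explicit is not None:
        return explicit
    # No charset parameter: "text" subtypes default to ISO-8859-1 over HTTP.
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"
    # Other media types have no such default in this sketch.
    return None
```

Under that rule, effective_charset("text/css") yields "iso-8859-1", while 
an explicit "text/css; charset=UTF-8" yields "utf-8".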

 >> I would
 >> like XML to be a good citizen of the WWW and to establish a good practise
 >As would I. I don't consider the propagation of known faults to be "good

Sorry, but I have to trust the W3C I18N WG, etc. 

 >>  The charset parameter
 >> is not a historical requirement.  Rather, it is the right solution,
 >> which is just about to take off.  I think that we are wasting our
 >> limited resources by repeating old discussion rather than doing more
 >> implementations.
 >You consistently fail to address the issue of file system processing of
 >XML, and instead characterise all opposition to your proposal as "time
 >wasting". I will be happy to characterise it as that once you have given a
 >satisfactory response to the questions I pose.

The long-term goal is to have the file systems of operating systems 
provide the charset parameter.  Encoding declarations are tentative 

I am not insisting on my proposal.  I am insisting on the rough 
consensus achieved in the past.  Since the I18N WG asked the XML Syntax WG 
not to change the precedence of the charset parameter, I am extremely 
reluctant to make such changes.  Up to now, the only change I can support 
is to mandate the charset parameter of text/xml.

 >> Since XML processors support UTF-8 and UTF-16, transcoding from Unicode to
 >> legacy encodings does not look very attractive. 
 >I agree that such transcoding is unattractive, but you seem to want to bias
 >the XML MIME specification to supporting such transcoding whatever the cost
 >to other sorts of processing.

The only "other sorts of processing" I can imagine is providing the charset 
parameter.  I understand that this is not very easy at present, but WWW 
servers are getting better.  You think that the cost of developing and using 
XML-aware transcoders, and the cost of inventing a different in-band encoding 
mechanism for each textual format, is not a big deal.  I do not agree.

 >However, something that converts an XML file from 8859-1 to UTF-8 and
 >leaves the encoding declaration saying 8859-1 is not useful. It has not
 >generated XML. It has made a thing which will fail to parse.

Since the charset parameter is now authoritative, such documents will parse.
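
As an illustration of that precedence, here is a minimal Python sketch.  
The names declared_encoding and choose_encoding are hypothetical, BOM 
detection is omitted, and the UTF-8 fallback used when neither a charset 
parameter nor an encoding declaration is present is an assumption made 
only for this example.

```python
import re
from typing import Optional

# Matches an encoding declaration at the very start of an ASCII-compatible body.
_DECL = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def declared_encoding(body: bytes) -> Optional[str]:
    """Return the in-band encoding name, or None if there is none."""
    m = _DECL.match(body)
    return m.group(1).decode("ascii") if m else None


def choose_encoding(body: bytes, charset_param: Optional[str]) -> str:
    """Pick the encoding, treating the charset parameter as authoritative."""
    if charset_param:
        # Out-of-band label wins, even over a stale encoding declaration.
        return charset_param
    # Assumption for this sketch: fall back to the declaration, then UTF-8.
    return declared_encoding(body) or "utf-8"


# A document written as ISO-8859-1, then transcoded to UTF-8 by a tool
# that left the encoding declaration untouched.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'
transcoded = text.encode("utf-8")

# With charset=utf-8 supplied, the stale declaration is ignored.
decoded = transcoded.decode(choose_encoding(transcoded, "utf-8"))
```

Without the charset parameter, the same bytes would be read according to 
the stale declaration and the non-ASCII character would be misdecoded.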

MURATA Makoto  muraw3c@xxxxxxxxxxxxx