Re: Some text that may be useful for the update of RFC 2376
In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...
>> Unfortunately, we do not have "fairly good state of encoding declaration
>> of XML files". People generate XML documents by XSLT or their own programs,
>> and fail to specify the correct charset.
>That is not a problem. Such files will not be well formed and thus, will
You are saying that the omission of the charset parameter is a problem,
and that incorrect encoding PIs are not problems. I do not know why.
Many Japanese users have failed to specify correct encoding PIs, and many
Japanese programmers have failed to generate them correctly. I have also heard
that users in developing countries copy ISO-8859-1 HTML files and mistakenly
include incorrect meta tags. The same thing will happen to XML. In-band encoding
is not free from errors.
>> I think that you are not paying attention to other textual formats.
>Oh I am, but not on this list where it is off topic.
In RFC 2318 (text/css), which you co-authored, the charset parameter is described
as follows:
The syntax of CSS is expressed in US-ASCII, but a CSS file can
contain strings which may use any Unicode character. Any charset
that is a superset of US-ASCII may be used; US-ASCII, iso-8859-X
and utf-8 are recommended.
RFC 2616 (HTTP) is a draft standard and defines the default as follows:
The "charset" parameter is used with some media types to define the
character set (section 3.4) of the data. When no explicit charset
parameter is provided by the sender, media subtypes of the "text"
type are defined to have a default charset value of "ISO-8859-1" when
received via HTTP. Data in character sets other than "ISO-8859-1" or
its subsets MUST be labeled with an appropriate charset value. See
section 3.4.1 for compatibility problems.
Thus, the default value of the charset parameter of text/css is
ISO-8859-1. I know that the CSS recommendations say otherwise. But
in the realm of the IETF, the default value is ISO-8859-1.
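
The RFC 2616 rule quoted above can be sketched in a few lines of Python. This
is only an illustration of the defaulting rule, not code from any RFC or
server; the function name effective_charset is hypothetical, and it assumes
the Content-Type value was received via HTTP.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset implied by a Content-Type header value.

    Hypothetical helper: an explicit charset parameter wins; otherwise
    subtypes of "text" fall back to ISO-8859-1, as RFC 2616 specifies
    for data received via HTTP.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased value, or None
    if charset is not None:
        return charset
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"  # RFC 2616 default for text/* over HTTP
    return None  # no default is defined for other top-level types here
```

Under this sketch, a text/css response without a charset parameter is treated
as ISO-8859-1, whatever the style sheet itself says.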
>> I would
>> like XML to be a good citizen of the WWW and to establish a good practise
>As would I. I don't consider the propagation of known faults to be "good
Sorry, but I have to trust the W3C I18N WG, etc.
>> The charset parameter
>> is not a historical requirement. Rather, it is the right solution,
>> which is just about to take off. I think that we are wasting our
>> limited resources by repeating old discussion rather than doing more
>You consistently fail to address the issue of file system processing of
>XML, and instead characterise all opposition to your proposal as "time
>wasting". I will be happy to characterise it as that once you have given a
>satisfactory response to the questions I pose.
The long-term goal is to have the file systems of operating systems
provide the charset parameter. Encoding declarations are tentative
I am not insisting on my proposal. I am insisting on the rough
consensus achieved in the past. Since the I18N WG asked the XML Syntax WG
not to change the precedence of the charset parameter, I am extremely
reluctant to make such a change. Up to now, the only change I can support
is to mandate the charset parameter for text/xml.
>> Since XML processors support UTF-8 and UTF-16, transcoding from Unicode to
>> legacy encodings does not look very attractive.
>I agree that such transcoding is unattractive, but you seem to want to bias
>the XML MIME specification to supporting such transcoding whatever the cost
>to other sorts of processing.
The only "other sort of processing" I can imagine is providing the charset
parameter. I understand that this is not very easy at present, but WWW servers
are getting better. You think that the cost of developing and using XML-aware
transcoders, and the cost of inventing a different in-band encoding mechanism
for each textual format, is not a big deal. I do not agree.
>However, something that converts an XML file from 8859-1 to UTF-8 and
>leaves the encoding declaration saying 8859-1 is not useful. It has not
>generated XML. It has made a thing which will fail to parse.
Since the charset parameter is now authoritative, such documents will parse.
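
To make the precedence concrete, here is a rough Python sketch of a receiver
that lets the charset parameter override the encoding declaration. The names
declared_encoding and decode_xml are hypothetical, the UTF-8 fallback is an
assumption of this sketch, and the regular expression only handles
ASCII-compatible encodings (no UTF-16 or BOM detection).

```python
import re
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the entity. Assumption: the bytes are ASCII-compatible.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the in-band encoding declaration, if one is present."""
    match = _ENCODING_DECL.match(data)
    return match.group(1).decode("ascii") if match else None


def decode_xml(data: bytes, charset_param: Optional[str] = None) -> str:
    """Decode an XML entity, treating the charset parameter as authoritative.

    Order used in this sketch: charset parameter, then the encoding
    declaration, then UTF-8.
    """
    encoding = charset_param or declared_encoding(data) or "utf-8"
    return data.decode(encoding)


# A document transcoded from ISO-8859-1 to UTF-8 whose declaration was
# left untouched, as in the scenario quoted above.
stale = '<?xml version="1.0" encoding="ISO-8859-1"?><p>\u00e9</p>'.encode("utf-8")
```

With charset_param="utf-8" the stale declaration is ignored and the text
decodes correctly; without the parameter, the same bytes are read as
ISO-8859-1 and the character is corrupted.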
MURATA Makoto muraw3c@xxxxxxxxxxxxx