
Re: Some text that may be useful for the update of RFC 2376




MURATA Makoto wrote:
> 
> In message "Re: Some text that may be useful for the update of RFC 2376",
> Chris Lilley wrote...
>  >Yes - such behaviour is clearly broken. Since a transcoder is changing many
>  >or all the other bytes in the file, expecting it to also correctly update
>  >the encoding declaration rather than leaving it broken is not asking too
>  >much.
> 
> I think that such a transcoder is very helpful because it works for
> all textual formats and also because it is very efficient.

No, it is not helpful, because it makes life a lot more difficult for
everyone else and leads to data corruption.

Incidentally, I don't see an answer to my question about what such an
XML-unaware transcoder would do when converting down from UTF-8 or UTF-16
to some 8-bit charset with all the unrepresentable characters. Since it
doesn't know XML, it can't use NCRs. What does it do, silently replace these
characters with question marks? And that is somehow OK?
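To make the contrast concrete, here is a minimal sketch in Python (the language is an assumption; the thread names none). Python's built-in codecs happen to offer both behaviours: errors="replace" substitutes "?" for characters the target charset lacks, while errors="xmlcharrefreplace" writes numeric character references. The helper name downconvert is hypothetical. Note that NCRs are only recognised in character data and attribute values, not in element names, comments, processing instructions or CDATA sections, so a genuinely XML-aware transcoder has to track where each character sits; this sketch does not.

```python
def downconvert(text: str, target: str, xml_aware: bool) -> bytes:
    """Encode `text` into `target`, handling unrepresentable characters.

    Hypothetical helper. With xml_aware=False it behaves like a generic
    text transcoder and silently substitutes '?'; with xml_aware=True it
    writes numeric character references (NCRs) instead.
    """
    errors = "xmlcharrefreplace" if xml_aware else "replace"
    return text.encode(target, errors=errors)


sample = "price: 100\u20ac, caf\u00e9"  # U+20AC EURO SIGN is not in ISO-8859-1

print(downconvert(sample, "iso-8859-1", xml_aware=False))
# b'price: 100?, caf\xe9'   (the euro sign is silently lost)
print(downconvert(sample, "iso-8859-1", xml_aware=True))
# b'price: 100&#8364;, caf\xe9'   (an XML parser recovers U+20AC)
```

The "?" output cannot be undone by anyone downstream; the NCR output decodes back to the original characters.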

What if this makes the file no longer well-formed? Is that OK too?
And all to save half an hour of developer time in a transcoder, at the
expense of silently corrupted data and lots more work for developers of
tools and for users, to patch up the errors that such tools introduce. This
is a really bad idea!
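A second minimal sketch, again in Python and assuming the standard-library xml.etree.ElementTree parser as the well-formedness check (is_well_formed is a hypothetical helper), shows two ways a byte-level, XML-unaware transcoder can produce output that no longer parses: it leaves a stale encoding="UTF-8" declaration in front of ISO-8859-1 bytes, and its "?" substitution can land inside an element name, where "?" is not a legal name character.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Hypothetical helper: True if the stdlib parser accepts `data`."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Case 1: stale declaration. The bytes become ISO-8859-1, but the
# declaration still claims UTF-8, so byte 0xE9 is an invalid UTF-8 sequence.
original = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'
stale = original.encode("iso-8859-1", errors="replace")
print(is_well_formed(original.encode("utf-8")))  # True
print(is_well_formed(stale))                      # False

# Case 2: '?' substituted inside an element name while converting to ASCII.
named = '<?xml version="1.0" encoding="UTF-8"?><caf\u00e9/>'
broken = named.encode("ascii", errors="replace")  # ends with b'<caf?/>'
print(is_well_formed(broken))                     # False
```

In both cases the transcoder reports success; the damage only surfaces later, in someone else's XML tool.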

>  >> The charset parameter is such a solution.
>  >
>  >It is one such solution. There are better ones, and indeed a much better
>  >one in the XML specification.
> 
> It works only for XML.  It is not bad, when the MIME header is not available.
> But when it is available, we must rely on the charset parameter.

For text/*, yes, we have to. Luckily there is application/* and model/* and
image/* and so forth for people using XML who care about data integrity and
don't want cheap text processing tools playing fast and loose with their
data.


>  >> We should not try to bend
>  >> specifications only to invent an ad-hoc solution for a particular format.
>  >
>  >I can only agree with that sentence by replacing "format" with "protocol".
> 
> You are advocating different in-band encoding signatures for different
> formats.  I think that this is a significant burden to users and specification
> developers.

You are advocating different out-of-band, in-band, or mixed signatures for
different protocols: a solution that requires every "save as" of an XML
file to rewrite the encoding declaration (incorrect, but overridden by a
MIME charset parameter), which was only incorrect because one of your "I know
how to fiddle with all text files" transcoders silently broke it in the
first place. This places, as you say, an intolerable burden on users.
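For comparison, the XML awareness being asked of a transcoder is small. The following is a minimal sketch in Python, not a reference implementation: transcode_xml is a hypothetical name, it takes an already-decoded document (no byte order mark), rewrites the encoding declaration to match the output, writes NCRs for unrepresentable characters, and re-parses the result with the stdlib parser as a check. It assumes the target encoding is spelled the same way for Python's codecs and for XML (true for ISO-8859-1, US-ASCII and UTF-8), ignores any MIME charset parameter, and assumes unrepresentable characters occur only in character data or attribute values.

```python
import re
import xml.etree.ElementTree as ET

# Encoding pseudo-attribute of an XML declaration at the very start.
_DECL_ENCODING = re.compile(r"^(<\?xml\s[^>]*?encoding\s*=\s*)([\"'])[^\"']*\2")
_HAS_DECL = re.compile(r"^<\?xml\s")


def transcode_xml(text: str, target: str) -> bytes:
    """Hypothetical XML-aware transcoder for an already-decoded document."""
    # Point the existing declaration at the new encoding, keeping its quotes.
    new_text, count = _DECL_ENCODING.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{target}{m.group(2)}", text, count=1
    )
    if count == 0:
        if _HAS_DECL.match(text):
            raise ValueError("declaration without encoding= is not handled here")
        # No declaration at all: add one naming the target encoding.
        new_text = f'<?xml version="1.0" encoding="{target}"?>' + text
    # Characters the target lacks become NCRs rather than '?'.
    data = new_text.encode(target, errors="xmlcharrefreplace")
    ET.fromstring(data)  # raises ET.ParseError if the result is not well-formed
    return data


doc = '<?xml version="1.0" encoding="UTF-8"?><p>100\u20ac caf\u00e9</p>'
out = transcode_xml(doc, "ISO-8859-1")
print(out)
# b'<?xml version="1.0" encoding="ISO-8859-1"?><p>100&#8364; caf\xe9</p>'
```

With the declaration kept consistent, a later "save as" from the filesystem has nothing to repair.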

One of the things about XML that differs from HTML is its typical patterns
of use. XML transmitted over HTTP is likely to be extensively manipulated
in the filesystems of both the server and the client, a common operation
which your proposal makes much more difficult, just to spare people who
write simple text processing tools from adding XML support. As a trade-off,
I hope it is obvious to everyone else why this is such a bad idea.

--
Chris