Comments on mime-respect
These are my comments on http://www.w3.org/2001/tag/doc/mime-respect.html,
various issues mixed a bit, sorry.
[I have cross-posted ietf-xml-mime@xxxxxxx because some of them are relevant
to the recent discussion about the charset parameter on Content-Type.]
- Headings: Is this a completed finding, or a draft finding?
- "HTTP/1.1 a a response": word duplication
- Overall, it seems difficult to identify what is general architecture,
and what is described that way simply because that is how things
(mostly) are today.
- My understanding is that one origin of the 'charset' parameter was
that it was useful to invoke different applications for different
values. That was definitely the case 10 years or so ago when MIME
was designed. I remember reading my email that way. This has gone away.
It may happen that in a somewhat similar way, a lot of what we now
see as different XML types, in need of different applications, may
go away in a few years.
- Section 4: "The Unicode encoding of a message body (XML document) is
inconsistent with the value of the charset parameter in the message
- Please replace 'Unicode encoding' with 'character encoding'.
It would be strange to e.g. call iso-8859-1 a 'Unicode encoding'.
- Please remove, or reword "XML document", to not give the impression
that message bodies are always XML documents.
- I'm not clear why this is in section 4, entitled "Why user agent
behavior that misrepresents the user is harmful". This is a
server problem, the user is not in any way misrepresented.
- The big problem with wrong encoding information for XML and other
documents is not in a server-user context (where the user has
to be able to read the document, such problems are usually
discovered very quickly), but with XML sent between machines.
This probably should be noted.
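To make the machine-to-machine case concrete, here is a minimal sketch of
how a receiver might notice that the charset parameter and the XML encoding
declaration disagree. The function names are made up for illustration, and
the regular expression assumes an ASCII-compatible encoding (it will not
find the declaration in UTF-16, for example).

```python
import codecs
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the body. Assumption: the bytes are ASCII-compatible.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def _normalize(name):
    """Map aliases (latin1, ISO-8859-1, ...) to one canonical codec name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    return _normalize(charset) if charset else None


def declared_encoding(body):
    """Return the encoding named in the XML declaration, or None."""
    match = _XML_DECL.match(body)
    return _normalize(match.group(1).decode("ascii")) if match else None


def charset_conflict(content_type, body):
    """True only when both sources name an encoding and they differ."""
    from_header = header_charset(content_type)
    from_body = declared_encoding(body)
    return (
        from_header is not None
        and from_body is not None
        and from_header != from_body
    )
```

A person reading a page notices such a conflict as garbled text; a program
consuming the XML has to run a check of this kind explicitly, or it will
silently use whichever source it trusts.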
- The structure of sections 3 and 4 should be improved. It is good
style to have an introductory paragraph or two before the first subsection.
It is confusing to have a few paragraphs in the first subsection
of the section after a lot of text that is not in subsections.
- "For this reason, servers should only supply a character encoding
header when there is complete certainty as to the encoding in use.
Otherwise, an error will cause a perfectly usable representation
to be rejected by an architecturally sound client."
Why doesn't the document say e.g. that a MIME type should only be
supplied when there is complete certainty that this type is
appropriate? Why does this text assume that the XML is 'perfectly
usable'? It might not be valid, it might be the wrong MIME type,
or it might not have the right 'encoding' attribute.
- "Servers which generate representations MUST NOT generate the charset
parameter unless there is certainty that the headers are correct.
When correct, this information can be used by non-XML processors
to determine authoritatively the character encoding of the XML MIME
How is a server ever going to know, or going to be able to check,
what the right character encoding is? Making this a requirement
on the server itself seems inadequate.
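As a small illustration of why the server cannot simply check this, the
sketch below (the helper name is hypothetical) tests whether a body decodes
under a given charset. A successful decode proves very little: every byte
sequence is valid iso-8859-1, so UTF-8 data passes that test too.

```python
def decodes_cleanly(body: bytes, charset: str) -> bool:
    """Return True if the bytes decode under the charset without error."""
    try:
        body.decode(charset)
        return True
    except (UnicodeDecodeError, LookupError):
        return False


# "café" encoded as UTF-8; the e-acute becomes the two bytes C3 A9.
utf8_body = "<p>caf\u00e9</p>".encode("utf-8")

print(decodes_cleanly(utf8_body, "utf-8"))       # True
print(decodes_cleanly(utf8_body, "iso-8859-1"))  # True, but the text is wrong
print(decodes_cleanly(utf8_body, "ascii"))       # False
```

So a check of this kind can rule some labels out, but it cannot tell the
server which label is right.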
- Section 5: "For instance, the http-equiv attribute of the HTML meta
element is intended for servers (not clients)."
Please change 'is' to 'was'. In particular with respect to character
encoding, current practice is that it's used on the client. If you
think that this should change, you should say so.
- SMIL 2.0 is "outmoded": I would prefer a different word here.
I strongly agree that what SMIL 2.0 is saying on content types
is a very bad idea, and I have said so to the SMIL WG (and more
recently the Voice browser WG, I think). But given the 2001
date, I don't think 'outmoded' is the right word, because it was
never in fashion in the first place.
- Section 6: There is advice to server managers and authors. But
I think we need to go one more step back, to server implementers
and the default settings when servers are shipped.
For example, some servers have an easy way to explore configurations
and check settings. Others don't. Some servers come with default
configurations that may be suboptimal. For example (not picking on
it, just because that's the one I know), Apache at
says: "AddDefaultCharset On enables Apache's internal default charset
of iso-8859-1 as required by the directive."
Also, the default configuration file contains this:
# Specify a default charset for all pages sent out. This is
# always a good idea and opens the door for future internationalisation
# of your web site, should you ever want it. Specifying it as
# a default does little harm; as the standard dictates that a page
# is in iso-8859-1 (latin1) unless specified otherwise i.e. you
# are merely stating the obvious. There are also some security
# which encourage you to always set a default char set.
This seems to be 180 degrees opposite to what the TAG is saying.
It is more about text/html,... than about application/...+xml, but
there is considerable potential for harm here, too, in particular
when combined with the default setting that Apache comes with that
does not allow people managing a directory to override file info.