Re: Well-formedness statistics
Mark,
Thank you very much for providing very useful information!
Let me ask some questions. When documents are not well-formed
because they contain non-US-ASCII characters but do not have the
charset parameter, what is their encoding declaration and
what is the actual encoding? I can imagine that there are
at least six patterns:
1) encoded in iso-8859-1 and encoding="iso-8859-1"
2) encoded in iso-8859-1 and either no encoding declaration or encoding="utf-8"
3) encoded in utf-8 and encoding="iso-8859-1"
4) encoded in utf-8 and either no encoding declaration or encoding="utf-8"
where 1) and 4) would be well-formed if the documents were labelled as
application/xml, but 2) and 3) would never become well-formed.
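
To make the four numbered cases concrete, here is a rough Python sketch
of how a document might be sorted into them. The function names are
hypothetical, and the check is only a heuristic: every byte sequence
decodes under iso-8859-1, so the sketch guesses the actual encoding by
testing whether the bytes are valid utf-8. It assumes the document
contains at least one non-US-ASCII character and is in one of the two
encodings discussed here.

```python
import re

# Hypothetical helpers for illustration; this is a heuristic, not an XML parser.
# Matches the encoding pseudo-attribute of an XML declaration at the start of the data.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes):
    """Return the lower-cased declared encoding, or None if there is no declaration."""
    m = _DECL.match(data)
    return m.group(1).decode("ascii").lower() if m else None


def guess_actual_encoding(data: bytes) -> str:
    """Guess utf-8 if the bytes are valid UTF-8, otherwise assume iso-8859-1."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


def classify(data: bytes) -> int:
    """Map a document onto pattern 1-4 from the list above."""
    # With no encoding declaration, treat the document as declaring utf-8
    # (patterns 2 and 4 group these two sub-cases together).
    declared = declared_encoding(data) or "utf-8"
    if declared not in ("utf-8", "iso-8859-1"):
        raise ValueError(f"declared encoding outside this sketch: {declared}")
    actual = guess_actual_encoding(data)
    if actual == "iso-8859-1":
        return 1 if declared == "iso-8859-1" else 2
    return 3 if declared == "iso-8859-1" else 4
```

For example, classify() applied to the bytes of a document that has no
encoding declaration and is stored in iso-8859-1 would report pattern 2.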
Cheers,
Makoto
--
MURATA Makoto (FAMILY Given) <EB2M-MRT@xxxxxxxxxxxxxxx>