On Thu, 24 Jun 2004 08:06:06 +0900, MURATA Makoto (FAMILY Given)
<eb2m-mrt@xxxxxxxxxxxxxxx> wrote:
[quotes from relevant specifications snipped]
Thanks, that's very helpful.
> So, I think that MIME entities labelled as text/plain are not well-formed
> XML documents.
OK, then my sample breaks down like this:
5096 total feeds
3241 (63.60%) are well-formed
917 (17.99%) are not well-formed due to non-XML media type (e.g. text/plain)
798 (15.66%) are not well-formed due to text/xml encoding mismatch
  (served as text/xml with no charset parameter, so the encoding defaults
  to us-ascii, but contains characters outside us-ascii)
25 (0.49%) are not well-formed due to other encoding mismatch
  (declared as some encoding but contains characters outside that
  encoding)
115 (2.26%) are not well-formed for other reasons (e.g. malformed markup)
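
For anyone who wants to reproduce this kind of breakdown, here is a rough sketch of the bucketing logic in Python. It is not the script that produced the numbers above. The function name classify, the category strings, and the simplified list of XML media types are all my own assumptions for illustration, and it only covers the rules discussed in this thread (text/* defaults to us-ascii when no charset parameter is given; an explicit charset parameter wins over the XML declaration).

```python
from email.message import Message
from xml.parsers import expat

# Simplified assumption: only these types (plus any "+xml" suffix) count as XML.
XML_TYPES = {"text/xml", "application/xml"}


def classify(content_type: str, body: bytes) -> str:
    """Bucket one feed by its Content-Type header and raw body bytes."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()      # lowercased, e.g. "text/xml"
    charset = msg.get_content_charset()      # lowercased, or None if absent

    if media_type not in XML_TYPES and not media_type.endswith("+xml"):
        return "non-XML media type"

    # text/* with no charset parameter defaults to us-ascii.
    defaulted = charset is None and media_type.startswith("text/")
    encoding = "us-ascii" if defaulted else charset

    data = body
    if encoding is not None:
        try:
            # Decode ourselves so the HTTP-level encoding is authoritative.
            data = body.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            if defaulted:
                return "text/xml encoding mismatch"
            return "other encoding mismatch"

    # With str input the parser ignores the encoding in the XML declaration;
    # with bytes input (application/xml, no charset) it autodetects.
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except expat.ExpatError:
        return "other"
    return "well-formed"
```

Calling classify("text/xml", body) on a UTF-8 feed that contains any non-ASCII character lands in the text/xml mismatch bucket, which is the 15.66% case above. Counting the returned strings over the whole sample gives the table.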