Re: Well-formedness statistics
On Thu, 24 Jun 2004 08:06:06 +0900, MURATA Makoto (FAMILY Given)
<eb2m-mrt@xxxxxxxxxxxxxxx> wrote:
> [quotes from relevant specifications snipped]
Thanks, that's very helpful.
> So, I think that MIME entities labelled as text/plain are not well-formed
> XML documents.
OK, then my sample breaks down like this:
5096 total feeds
3241 (63.60%) are well-formed
 917 (17.99%) are not well-formed due to non-XML media type (e.g. text/plain)
 798 (15.66%) are not well-formed due to text/xml encoding mismatch
              (declared as text/xml but contains characters outside us-ascii)
  25 (0.49%)  are not well-formed due to other encoding mismatch
              (declared as some encoding but contains characters outside
              that encoding)
 115 (2.26%)  are not well-formed for other reasons (e.g. malformed markup)
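For anyone who wants to reproduce this kind of breakdown, here is a rough sketch of how a single feed could be put into one of the five categories above. It is not the script used for the survey. The function name classify, the set of accepted media types, and the decision to count an unknown charset label as an encoding mismatch are all assumptions made for illustration. It also simplifies things: when no charset parameter is given on a type other than text/xml, the bytes are handed straight to the parser.

```python
import xml.etree.ElementTree as ET

# Assumption: these two types plus any "+xml" suffix count as XML media types.
XML_TYPES = {"text/xml", "application/xml"}

def classify(media_type, charset, body):
    """Hypothetical helper: return the category for one feed.

    media_type and charset come from the Content-Type header
    (charset is None when the parameter is absent); body is bytes.
    """
    mt = media_type.lower()
    if mt not in XML_TYPES and not mt.endswith("+xml"):
        return "non-XML media type"

    # text/xml without a charset parameter is treated as us-ascii.
    defaulted = charset is None and mt == "text/xml"
    encoding = "us-ascii" if defaulted else charset

    text = body
    if encoding is not None:
        try:
            text = body.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Assumption: an unknown charset label is lumped in with mismatches.
            if defaulted:
                return "text/xml encoding mismatch"
            return "other encoding mismatch"

    try:
        ET.fromstring(text)
    except ET.ParseError:
        return "other reasons"
    return "well-formed"
```

Each feed gets exactly one category because the checks run in a fixed order and return at the first failure, which matches the counting described in this message.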
Note that the "other reasons" category is now significantly smaller.
Some of the feeds previously counted in that category were also served
as text/plain, so they are now counted in the new "non-XML media type"
category. Each feed is counted only once. Amazingly, the percentages
add up to exactly 100%, despite rounding.
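The arithmetic is easy to check. A small sketch, using the counts from this message (the dictionary keys and variable names are just labels chosen here), works in hundredths of a percent so that floating-point sums do not get in the way:

```python
# Counts as reported in this message.
counts = {
    "well-formed": 3241,
    "non-XML media type": 917,
    "text/xml encoding mismatch": 798,
    "other encoding mismatch": 25,
    "other reasons": 115,
}

total = sum(counts.values())

# Each share rounded to two decimal places, stored as integer hundredths
# of a percent (e.g. 6360 means 63.60%).
hundredths = {name: round(10000 * n / total) for name, n in counts.items()}

for name, h in hundredths.items():
    print(f"{counts[name]:5d} ({h / 100:.2f}%) {name}")

print("total feeds:", total)
print("sum of rounded percentages:", sum(hundredths.values()) / 100)
```

The rounded shares only sum to 100.00 because the rounding errors happen to cancel for these particular counts; that is not guaranteed in general.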
--
Cheers,
-Mark