Re: Well-formedness statistics




Mark Pilgrim wrote:
On Thu, 24 Jun 2004 08:06:06 +0900, MURATA Makoto (FAMILY Given)
<eb2m-mrt@xxxxxxxxxxxxxxx> wrote:

[quotes from relevant specifications snipped]


Thanks, that's very helpful.


So, I think that MIME entities labelled as text/plain are not well-formed
XML documents.


OK, then my sample breaks down like this:

5096 total feeds

3241 (63.60%) are well-formed
917 (17.99%) are not well-formed due to non-XML media type (e.g. text/plain)
798 (15.66%) are not well-formed due to text/xml encoding mismatch
(declared as text/xml but contains characters outside us-ascii)
25 (0.49%) are not well-formed due to other encoding mismatch
(declared as some encoding but contains characters outside that
encoding)
115 (2.26%) are not well-formed for other reasons (e.g. malformed markup)
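A minimal Python sketch of how feeds could be sorted into these buckets, assuming only the standard library. The bucket labels, the names classify and breakdown, and the set of media types treated as XML are hypothetical; this is not Mark's actual harness. It encodes the two rules the thread relies on. First, a text/plain (or other non-XML) media type means the entity is not an XML document at all. Second, text/xml with no charset parameter is us-ascii (RFC 3023), whatever the XML declaration says.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from email.message import Message

# Hypothetical bucket labels mirroring the breakdown above.
WELL_FORMED = "well-formed"
NON_XML = "non-XML media type"
TEXT_XML_MISMATCH = "text/xml encoding mismatch"
OTHER_MISMATCH = "other encoding mismatch"
MALFORMED = "malformed markup"


def parse_content_type(header: str) -> tuple[str, str | None]:
    """Split a Content-Type header into (lowercased media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header
    return msg.get_content_type(), msg.get_param("charset")


def is_xml_media_type(mime: str) -> bool:
    # Assumption: treat the two generic XML types and any +xml suffix as XML.
    return mime in ("text/xml", "application/xml") or mime.endswith("+xml")


def classify(content_type: str, body: bytes) -> str:
    mime, charset = parse_content_type(content_type)
    if not is_xml_media_type(mime):
        # e.g. text/plain: not an XML MIME entity, so not a well-formed XML document.
        return NON_XML
    # RFC 3023: text/xml without a charset parameter is us-ascii,
    # regardless of any encoding named in the XML declaration.
    defaulted = mime == "text/xml" and charset is None
    if defaulted:
        charset = "us-ascii"
    try:
        if charset is not None:
            # The HTTP charset wins, so decode first and hand the parser text.
            ET.fromstring(body.decode(charset))
        else:
            # No charset on an application/* type: let expat read the BOM or
            # XML declaration. Simplification: byte-level encoding errors here
            # surface as ParseError and are counted as MALFORMED.
            ET.fromstring(body)
    except (UnicodeDecodeError, LookupError):
        return TEXT_XML_MISMATCH if defaulted else OTHER_MISMATCH
    except ET.ParseError:
        return MALFORMED
    return WELL_FORMED


def breakdown(counts: dict[str, int]) -> dict[str, str]:
    """Turn per-bucket counts into percentage strings with two decimals."""
    total = sum(counts.values())
    return {label: f"{100 * n / total:.2f}%" for label, n in counts.items()}
```

Feeding the counts above to breakdown reproduces the quoted percentages (3241 gives 63.60%, 917 gives 17.99%, and so on), and the five counts sum to the 5096 total.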

Of the ones with malformed markup, what percentage were due to extra content at the end of the document? Not that it matters for well-formedness; I'm just curious.
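One hypothetical way to answer this, sketched with Python's standard library: expat stops at the first error, and content after the root element's end tag surfaces as its "junk after document element" error code, so counting that code over the malformed feeds gives the share. The names has_trailing_junk and trailing_junk_share are illustrative only. A feed whose first reported error is something else is not counted, even if it also has trailing content.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from xml.parsers.expat import errors

# Numeric expat code for "junk after document element".
JUNK_AFTER_ROOT = errors.codes[errors.XML_ERROR_JUNK_AFTER_DOC_ELEMENT]


def has_trailing_junk(body: bytes) -> bool:
    """True if the first error expat reports is content after the root element."""
    try:
        ET.fromstring(body)
    except ET.ParseError as exc:
        return exc.code == JUNK_AFTER_ROOT
    return False


def trailing_junk_share(malformed: list[bytes]) -> float:
    """Percentage of the given malformed documents that fail on trailing content."""
    if not malformed:
        return 0.0
    hits = sum(has_trailing_junk(body) for body in malformed)
    return 100 * hits / len(malformed)
```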

-joe

--
http://BitWorking.org
http://WellFormedWeb.org