
Re: Well-formedness statistics



Mark,

Thank you very much for providing very useful information!

Let me ask some questions.  When documents are not well-formed 
because they contain non-US-ASCII characters but do not have the 
charset parameter, what is their encoding declaration and 
what is the actual encoding?  I can imagine that there are 
at least four patterns:

1) encoded in iso-8859-1 and encoding="iso-8859-1"
2) encoded in iso-8859-1 and either no encoding dcl or encoding="utf-8"
3) encoded in utf-8 and encoding="iso-8859-1"
4) encoded in utf-8 and either no encoding dcl or encoding="utf-8"

where 1) and 4) would be well-formed if they were labelled as 
application/xml, but 2) and 3) can never be well-formed.
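
To make the four patterns concrete, here is a rough Python sketch of how 
a document could be sorted into them.  The function names are my own and 
purely hypothetical, and the test for the actual encoding is only a 
heuristic: any byte sequence is valid iso-8859-1, so I assume a document 
is utf-8 whenever it decodes cleanly as utf-8 and iso-8859-1 otherwise.

```python
import re

# Hypothetical helper: matches the encoding pseudo-attribute of an XML
# declaration at the very start of the document.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes) -> str:
    """Return the declared encoding, or utf-8 when there is no declaration."""
    m = _DECL.match(data)
    return m.group(1).decode("ascii").lower() if m else "utf-8"


def actual_encoding(data: bytes) -> str:
    """Heuristic guess (an assumption): utf-8 if it decodes, else iso-8859-1."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


def classify(data: bytes) -> int:
    """Return the pattern number 1-4, or 0 for any other declared encoding."""
    declared = declared_encoding(data)
    if declared not in ("utf-8", "iso-8859-1"):
        return 0
    if actual_encoding(data) == "iso-8859-1":
        return 1 if declared == "iso-8859-1" else 2
    return 3 if declared == "iso-8859-1" else 4


def consistent_as_application_xml(pattern: int) -> bool:
    """Patterns 1 and 4 are the ones whose declaration matches the bytes."""
    return pattern in (1, 4)
```

For example, classify() applied to a document that declares 
encoding="iso-8859-1" but is actually stored as utf-8 gives pattern 3, 
which is one of the two cases that cannot be repaired by relabelling.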

Cheers,

Makoto

-- 
MURATA Makoto (FAMILY Given) <EB2M-MRT@xxxxxxxxxxxxxxx>