Re: Can/Does/Should the FeedValidator catch improperly escaped XHTML?




M. David Peterson wrote:
via http://www.oreillynet.com/xml/blog/2006/08/and_the_winner_of_the_best_ind_1.html#comment-75533

    There's one other small problem though: they put XHTML as CDATA in
    "html" text constructs, while they're supposed to contain HTML 4.
    And since it's XHTML, they should embed it directly in "xhtml"
    constructs...

Anthony brings up a good point (http://www.oreillynet.com/xml/blog/2006/08/and_the_winner_of_the_best_ind_1.html#comment-75822):

    Odd that the validator isn't saying anything about this.

Should it, or is this an edge case that can be difficult, at best, to catch?

At the moment, the HTML content is passed through the following:

http://docs.python.org/lib/module-HTMLParser.html

Note that this parser includes a handle_startendtag method for XHTML-style empty tags such as <br />, a syntax that is not part of the HTML standard. Given the rather loose nature of HTML, this pass only tends to catch things like unmatched angle brackets and quotes.
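
To make the handle_startendtag point concrete, here is a minimal sketch, not the FeedValidator's actual code. It assumes Python 3, where the module is named html.parser rather than HTMLParser; EmptyTagSpotter and xhtml_style_empty_tags are hypothetical names invented for this illustration. It only shows that the parser reports XHTML-style empty tags through a separate callback instead of rejecting them.

```python
from html.parser import HTMLParser  # Python 3 name; the module linked above is the older HTMLParser


class EmptyTagSpotter(HTMLParser):
    """Hypothetical helper: remembers every XHTML-style empty tag it sees."""

    def __init__(self):
        super().__init__()
        self.empty_tags = []

    def handle_startendtag(self, tag, attrs):
        # Called for tags written as <br /> or <img ... />, not for plain <br>.
        self.empty_tags.append(tag)


def xhtml_style_empty_tags(markup):
    """Return the names of tags in `markup` that use the XHTML empty-tag form."""
    spotter = EmptyTagSpotter()
    spotter.feed(markup)
    spotter.close()
    return spotter.empty_tags
```

A non-empty result for content labelled as "html" would be a hint, not proof, that the content is really XHTML; whether such a hint deserves a warning is the open question.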

Also, a number of tools attempt to produce well-formed XHTML, but don't do so consistently enough for their output to be dropped directly into an Atom feed as an "xhtml" text construct.
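
As a rough sketch of the consistency problem, a publisher could test each fragment for well-formedness before deciding how to embed it. This assumes Python 3's xml.etree.ElementTree; is_well_formed_fragment is a hypothetical name, and the check covers XML well-formedness only, not XHTML validity. The fragment is wrapped in a single div in the XHTML namespace because that is the shape Atom requires for "xhtml" content.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed_fragment(fragment):
    """Return True if `fragment` parses as XML once wrapped in an XHTML div."""
    wrapped = '<div xmlns="%s">%s</div>' % (XHTML_NS, fragment)
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        # Unclosed tags, stray ampersands and similar slips end up here.
        return False
    return True
```

A fragment that fails this check would have to fall back to an escaped "html" construct.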

- Sam Ruby