
Re: Resources for AtomPub parser validation




On Jun 30, 2008, at 9:10 AM, Daniel Jalkut wrote:

In particular, I continue to have mind-bending problems interpreting the ins and outs of how particular content needs to be treated in the Atom format that is used by AtomPub. Right now, for instance, I'm trying to learn definitively whether an escaped "&" (that is, "&amp;") inside the div of a type="xhtml" content element should be left alone or converted to a literal ampersand in the parsed content output.

If it says <content type="xhtml">, then you just let the XML parser keep running and take what it gives you. Take the example from RFC 4287 (a title rather than content, but that shouldn't make a difference):

<title type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:div>
    Less: <xhtml:em> &lt; </xhtml:em>
  </xhtml:div>
</title>

The XML parser will tell you that the content of xhtml:em is " < ". Obviously, if you're going to spit that out again as HTML or anything XML-based, you're going to have to re-encode it.
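
As a rough illustration of the point above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The choice of parser and the variable names are assumptions made for the example, not something taken from the RFC or from any particular AtomPub implementation. It parses the RFC 4287 title, reads the unescaped text the parser reports, and escapes it again before it would be written back out as markup. The last lines apply the same reasoning to an escaped ampersand.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The title example from RFC 4287, as quoted above.
ATOM_TITLE = """<title type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:div>
    Less: <xhtml:em> &lt; </xhtml:em>
  </xhtml:div>
</title>"""

# ElementTree spells namespaced tags as "{namespace-uri}localname".
XHTML = "{http://www.w3.org/1999/xhtml}"

root = ET.fromstring(ATOM_TITLE)
em = root.find(f"{XHTML}div/{XHTML}em")

# The parser has already resolved the entity reference.
parsed_text = em.text            # " < "

# Escape again before emitting the text as HTML or XML.
reencoded = escape(parsed_text)  # " &lt; "

# The same holds for an escaped ampersand (hypothetical sample input).
amp_text = ET.fromstring("<p>AT&amp;T</p>").text  # "AT&T"
amp_reencoded = escape(amp_text)                  # "AT&amp;T"
```

Under these assumptions the parsed value is the literal character, and the escaping only comes back at serialization time.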

I'm a little concerned that this is bothering people who I know are smart, experienced implementors. Maybe you've stumbled on a corner case?

 -T




It would be a great aid if there were some "reference" feed that exercised many of the common mistakes parsers make, and at the same time offered definitive advice on how these entities should be handled.

Anybody know of such a resource?

Daniel