Re: Resources for AtomPub parser validation
On Jun 30, 2008, at 9:10 AM, Daniel Jalkut wrote:
In particular, I continue to have mind-bending problems
interpreting the ins and outs of how particular content needs to be
treated in the Atom format that is used by AtomPub. Right now, for
instance, I'm trying to learn definitively whether an escaped
"&amp;" inside a content type xhtml div should be left alone or
converted to an ampersand in the parsed content output.
If it says <content type="xhtml"> then you just let the XML parser
keep running and take what it gives you. In the example from RFC4287
(title not content, but shouldn't make a difference):
<title type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<xhtml:div>
Less: <xhtml:em> &lt; </xhtml:em>
</xhtml:div>
</title>
The XML parser will tell you that the content of xhtml:em is " < ".
Obviously, if you're going to spit that out again to html or anything
xml-based, you're going to have to re-encode it.
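
For anyone who wants to check this against their own parser, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The variable names and the "AT&amp;T" sample are illustrative assumptions, not taken from any reference feed; the title markup is the RFC 4287 example quoted above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The RFC 4287 example, with the "<" written as the entity &lt; on the wire.
ATOM_TITLE = """<title type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:div>
    Less: <xhtml:em> &lt; </xhtml:em>
  </xhtml:div>
</title>"""

root = ET.fromstring(ATOM_TITLE)
em = root.find(f".//{{{XHTML_NS}}}em")

# The parser hands back the decoded character, not the entity.
parsed_text = em.text

# Writing the text back out to HTML or XML means escaping it again.
reencoded = escape(parsed_text)

# Hypothetical sample for the "&amp;" case: same rule, the parser decodes it.
amp_div = ET.fromstring(f'<div xmlns="{XHTML_NS}">AT&amp;T</div>')
amp_text = amp_div.text

print(repr(parsed_text))  # ' < '
print(repr(reencoded))    # ' &lt; '
print(repr(amp_text))     # 'AT&T'
```

So the parsed value holds a bare "&" or "<", and the entity form only comes back when you serialize.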
I'm a little concerned that this is bothering people who I know are
smart, experienced implementors. Maybe you've stumbled on a corner
case?
-T
It would be a great aid if there was some "reference" feed that
exercised many of the common mistakes parsers make, and at the same
time expressed definitive advice for how these entities should be
handled.
Anybody know of such a resource?
Daniel