
Re: Resources for AtomPub parser validation




Thanks, everybody, for trying to explain this to me. Dave, using schema validation on your XML was a cool idea, but it seems to apply more to generating XML than to parsing it. Or am I misunderstanding something?

Tim, thanks for your comments and for linking the RFC. It's starting to sink in, but I'm still cloudy-minded.

On Jun 30, 2008, at 1:16 PM, Tim Bray wrote:

I'm a little concerned that this is bothering people who I know are smart, experienced implementors. Maybe you've stumbled on a corner case?

I think you might be overestimating both my smarts and experience ;) Before taking over MarsEdit I had little real-world experience with XML, so I find myself tackling many of "the finer points" without the aid of any historical perspective. I've only recently started going over this with a finer-toothed comb because I'm simultaneously changing parsers and getting more serious about the generic AtomPub support in MarsEdit.

I find "HTML in an XHTML in an XML" to be inherently confusing. But it's probably made worse by my lack of conviction about what format the content "is in" when it's being edited by my users. While MarsEdit is "mostly an HTML editor," it is also sort of an agnostic text editor. So ... to boil down the points of confusion, here's a specific example being cited by a customer whose AtomPub implementation I'm testing MarsEdit against. This is what comes over the wire:

<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml" xml:space="preserve">
Some test HTML with an escaped ampersand in it:
<img src="foo?bar=1&amp;baz=2"/>
</div></content>

What happens is that the editor ends up showing that &amp; as just a "&". That sounds right, given what you're saying: the xhtml content gets unescaped by the XML parser. Right?
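
To make the unescaping concrete, here is a minimal sketch that feeds the sample above to an XML parser and reads back the src attribute. It assumes Python's standard xml.etree.ElementTree as a stand-in; it is not the parser MarsEdit uses, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The sample from this message, as it would arrive over the wire.
ATOM_CONTENT = (
    '<content type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml" xml:space="preserve">\n'
    'Some test HTML with an escaped ampersand in it:\n'
    '<img src="foo?bar=1&amp;baz=2"/>\n'
    '</div></content>'
)

# ElementTree spells namespaced tags as "{namespace}localname".
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

root = ET.fromstring(ATOM_CONTENT)
img = root.find(f"{XHTML_NS}div/{XHTML_NS}img")

# The parser has already resolved the &amp; entity reference,
# so the attribute value holds a bare ampersand.
src = img.get("src")
print(src)  # foo?bar=1&baz=2
```

Any conforming XML parser should hand back the same attribute value, since &amp; is only the serialized spelling of "&" inside XML.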

But the customer's contention is that the &amp; should remain escaped in the HTML source, because that's how he typed it, and that's how it exists in the database on his server.

I guess it boils down to whether I should be presenting data "in HTML" (re-escaped?) or "literally." Perhaps this is not a question that the Atom specification needs to or can answer :)
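
The two presentations can be sketched side by side. This is only an illustration in Python, assuming the standard xml.sax.saxutils.escape helper for the re-escaping step; the names literal_view and html_source_view are hypothetical and do not come from MarsEdit or the Atom specification.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# What an XML parser hands back for the src attribute in the sample.
parsed_src = "foo?bar=1&baz=2"

# "Literally": show the parsed value unchanged.
literal_view = parsed_src

# "In HTML": re-escape markup-significant characters (&, <, >)
# so the editor shows what the author would type in HTML source.
html_source_view = escape(parsed_src)

print(literal_view)      # foo?bar=1&baz=2
print(html_source_view)  # foo?bar=1&amp;baz=2

# Both spellings denote the same value: parsing the re-escaped
# form yields the literal form again.
round_tripped = ET.fromstring(f'<img src="{html_source_view}"/>').get("src")
assert round_tripped == parsed_src
```

Either view can be derived from the other without loss, so the choice affects what the user sees in the editor rather than what is stored.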

Daniel