Re: Resources for AtomPub parser validation
Thanks everybody for trying to explain this to me. Dave - that was a
cool idea to use schema validation on your XML. But it seems like
that applies more to generating the XML than to parsing it. Or do I
misunderstand something?
Tim, thanks for your comments and for linking the RFC. It's starting
to sink in, but I'm still cloudy-minded.
On Jun 30, 2008, at 1:16 PM, Tim Bray wrote:
> I'm a little concerned that this is bothering people who I know are
> smart, experienced implementors. Maybe you've stumbled on a corner
> case?
I think you might be overestimating both my smarts and experience ;)
Before taking over with MarsEdit I had little real world experience
with XML, so I find myself tackling many of "the finer points" without
the aid of any historical perspective. I've only recently started
getting more fine-toothed because I'm simultaneously changing parsers
and getting more serious about the generic AtomPub support in MarsEdit.
I find "HTML in an XHTML in an XML" to be inherently confusing. But
it's probably made worse by my lack of conviction about what format
the content "is in" when it's being edited by my users. While MarsEdit
is "mostly an HTML editor," it is also sort of an agnostic text
editor. So ... to boil down the points of confusion, here's a specific
example being cited by a customer whose AtomPub implementation I'm
testing MarsEdit against. This is what comes over the wire:
<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"
xml:space="preserve">
Some test HTML with an escaped ampersand in it:
<img src="foo?bar=1&amp;baz=2"/>
</div></content>
What's happening is that the editor ends up showing that &amp; as just
a "&". That sounds right, given what you're saying about the XML parser
unescaping the xhtml content, right?
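
To make the unescaping concrete, here is a minimal sketch using Python's
standard-library ElementTree. It is not how MarsEdit parses anything; the
sample string is just the payload above, with the ampersand written as
&amp; the way it would appear on the wire.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The payload from the example, as it would appear on the wire.
ATOM_CONTENT = (
    '<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml" '
    'xml:space="preserve">\n'
    'Some test HTML with an escaped ampersand in it:\n'
    '<img src="foo?bar=1&amp;baz=2"/>\n'
    '</div></content>'
)

root = ET.fromstring(ATOM_CONTENT)

# Find the XHTML <img> element anywhere below <content>.
img = root.find(f".//{{{XHTML_NS}}}img")

# The parser has already resolved the entity reference, so the attribute
# value holds a bare "&" rather than the five characters "&amp;".
src = img.get("src")
print(src)
```

The value the parser hands back is the data (a URL containing "&"), not
the markup that was used to spell it.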
But the customer's contention is that the &amp; should remain escaped
in the HTML source, because that's how he typed it, and that's how it
exists in the database on his server.
I guess it boils down to whether I should be presenting data "in
HTML" (re-escaped?) or "literally." Perhaps this is not a question
that the Atom specification needs to or can answer :)
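
For what it's worth, the two presentations can be sketched side by side.
This again assumes Python's ElementTree; html_source_view and literal_src
are hypothetical helper names made up for the illustration, not anything
from MarsEdit or the Atom spec.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize XHTML elements with a default namespace instead of "ns0:".
ET.register_namespace("", XHTML_NS)

ATOM_CONTENT = (
    '<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml" '
    'xml:space="preserve">\n'
    'Some test HTML with an escaped ampersand in it:\n'
    '<img src="foo?bar=1&amp;baz=2"/>\n'
    '</div></content>'
)


def html_source_view(content_xml: str) -> str:
    """Hypothetical helper: the markup inside the xhtml <div>, re-escaped."""
    div = ET.fromstring(content_xml).find(f"{{{XHTML_NS}}}div")
    parts = [div.text or ""]
    for child in div:
        # Serializing writes "&" back out as "&amp;" and also emits the
        # text that follows the child element (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


def literal_src(content_xml: str) -> str:
    """Hypothetical helper: the parsed attribute value, shown as-is."""
    img = ET.fromstring(content_xml).find(f".//{{{XHTML_NS}}}img")
    return img.get("src")


source = html_source_view(ATOM_CONTENT)
literal = literal_src(ATOM_CONTENT)
print(source)
print(literal)
```

The first view is what the customer expects to see in an HTML source
editor; the second is what a parser hands back as data. One caveat of this
sketch: each serialized child carries its own xmlns declaration, which a
real source view would probably want to strip.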
Daniel