Re: Additional syntactic restrictions
Hello Chris,
At 19:04 02/06/13 +0200, Chris Lilley wrote:
>On Saturday, June 8, 2002, 11:43:33 AM, Martin wrote:
>
>MD> Hello Chris,
>
>MD> As far as I understand, you are saying that:
>
>MD> Restricting a given XML protocol to UTF-8 only is a bad idea,
>MD> because (perhaps among other reasons) somebody then might construct/tweak
>MD> a parser so that it only accepts UTF-8 (which would no longer
>MD> be able to be called an XML parser), and then somebody else
>MD> mistakenly uses that parser to try and parse generic XML,
>MD> and this will lead to problems.
>
>MD> Now let's take a very similar situation: Somebody defines
>MD> an XML protocol with only one element, foo, and only one
>MD> attribute, bar.
>
>That is a different situation. All uses of XML restrict the element
>names, attribute names and attribute values that they take.
Most XML applications do restrict elements and attributes.
But some are extremely liberal (e.g. RDF).
>XML spec
>allows this.
Of course. The XML spec ALLOWS *parsers* to accept lots of other
encodings than just UTF-8 and UTF-16. The XML spec REQUIRES
XML *parsers* (well-formed) to accept any and all legal element
and attribute names.
>On the other hand, specs that say you can only use single quotes
>around attribute values, or only one encoding, or attribute names that
>are a max of six characters long, other things like that are bad in general.
For the quotes, obviously a parser that doesn't accept
both varieties is a fake parser, and a very bad idea.
But restriction to a single kind of quotes sometimes
makes sense, as Canonical XML shows.
For attribute name length, it's
quite important to distinguish parsers and applications.
A parser that can only deal with attributes of max 6 chars
is obviously crap. But an application that by design or
by pure chance happens to have only attribute names with
6 or fewer chars should be perfectly fine. Or would you
claim otherwise? Should each application/DTD have at least
one element or attribute name with more than 6 characters?
This shows very clearly the importance of being able
to distinguish between parsers and applications.
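To make the parser/application split concrete, here is a minimal sketch in Python. It assumes the standard-library xml.etree.ElementTree as the general-purpose parser; the six-character limit, the constant MAX_NAME_LEN and the helper name_violations are hypothetical, invented only for illustration. The parser accepts any well-formed document, and the application-specific restriction is checked afterwards, in a separate step.

```python
import xml.etree.ElementTree as ET

# Hypothetical application rule, not part of XML itself.
MAX_NAME_LEN = 6


def name_violations(document):
    """Return attribute names that break the application rule.

    The document is parsed by a general-purpose parser first, so any
    well-formed XML is accepted at that layer. Only this application
    check knows about the length limit.
    """
    root = ET.fromstring(document)  # raises ET.ParseError if not well-formed
    too_long = []
    for elem in root.iter():
        for name in elem.attrib:
            if len(name) > MAX_NAME_LEN:
                too_long.append(name)
    return too_long
```

A document with a longer attribute name still parses; it is merely rejected by this one application. (Namespaced attributes show up as "{uri}local" in ElementTree, which this sketch does not handle specially.)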
>MD> Somebody else then constructs/tweaks a
>MD> (non-validating) parser so that it only accepts elements
>MD> foo and attributes bar. Later that parser is misused
>MD> for some other piece of XML, and this leads to problems.
>
>Yes, though a trivial test would detect that and a fairly thorough
>test would be needed for the single-quotes-only or
>attribute-names-must-be ascii parsers.
Obviously only a single, trivial test is needed for detecting
a fake parser that accepts only single quotes. Here it is:
<root foo='value' bar="value" />
Same for 'attribute-names-must-be-ascii':
(sorry to all those whose mailer can't take it)
<?xml version='1.0' encoding='iso-2022-jp'?>
<root 属性="value"/>
Same for a fake parser that doesn't accept UTF-16
(left as an exercise to the reader).
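The three checks above can be sketched as a tiny test harness. This is only an illustration in Python against the standard-library xml.etree.ElementTree (expat) parser. The helper accepts and the constant names are hypothetical. Two assumptions differ from the samples above: the non-ASCII attribute test is encoded as UTF-8 rather than iso-2022-jp, because a conforming parser is only required to support UTF-8 and UTF-16, and the UTF-16 document is one possible answer to the exercise, not the only one.

```python
import xml.etree.ElementTree as ET


def accepts(document):
    """Return True if the parser under test accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# One document per kind of "fake parser" discussed above.
MIXED_QUOTES = b"""<root foo='value' bar="value" />"""

# Same attribute name as in the sample, but sent as UTF-8 (assumption).
NON_ASCII_ATTR = '<root 属性="value"/>'.encode("utf-8")

# Python's "utf-16" codec prepends a byte order mark.
UTF16_DOC = (
    '<?xml version="1.0" encoding="UTF-16"?><root foo="value"/>'
).encode("utf-16")

CHECKS = {
    "both quote styles": MIXED_QUOTES,
    "non-ASCII attribute name": NON_ASCII_ATTR,
    "UTF-16 input": UTF16_DOC,
}

if __name__ == "__main__":
    for label, doc in CHECKS.items():
        print(label, "accepted" if accepts(doc) else "REJECTED")
```

Each check is a single small document, which is the point: none of these restrictions needs a thorough test suite to expose a parser that enforces it.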
So your argument that I read as 'amount of work needed for testing'
doesn't lead to the conclusions you want.
>MD> If we follow your logic, we would have to disallow
>MD> all XML protocols that use a finite number of element/
>MD> attribute types.
>
>No, we would not, only if we followed the logic you thought I was proposing.
I interpreted your earlier mails as saying:
Don't put restrictions on the protocol that might seduce
implementers to create special-purpose parsers that others
might mistake as general-purpose parsers, later leading to
breakdowns.
Above, I find a new variant of your statement:
Don't put restrictions on the protocol that might seduce
implementers to create special-purpose parsers that others
might mistake as general-purpose parsers, later leading to
breakdowns, if these restrictions are difficult to test.
I showed that this statement doesn't help in the cases you
have brought up.
I guess that I'm still not understanding your logic.
Can you try again? Thanks!
Regards, Martin.