
Re: Additional syntactic restrictions




Hello Chris,

At 19:04 02/06/13 +0200, Chris Lilley wrote:
>On Saturday, June 8, 2002, 11:43:33 AM, Martin wrote:
>
>MD> Hello Chris,
>
>MD> As far as I understand, you are saying that:
>
>MD> Restricting a given XML protocol to UTF-8 only is a bad idea,
>MD> because (perhaps among other reasons) somebody then might
>MD> construct/tweak a parser so that it only accepts UTF-8 (which
>MD> could no longer be called an XML parser), and then somebody else
>MD> mistakenly uses that parser to try and parse generic XML,
>MD> and this will lead to problems.
>
>MD> Now let's take a very similar situation: Somebody defines
>MD> an XML protocol with only one element, foo, and only one
>MD> attribute, bar.
>
>That is a different situation. All uses of XML restrict the element
>names, attribute names and attribute values that they take.

Most XML applications do restrict elements and attributes.
But some are extremely liberal (e.g. RDF).

>XML spec
>allows this.

Of course. The XML spec ALLOWS *parsers* to accept lots of
encodings other than just UTF-8 and UTF-16. The XML spec REQUIRES
XML *parsers* (well-formed) to accept any and all legal element
and attribute names.

>On the other hand, specs that say you can only use single quotes
>around attribute values, or only one encoding, or attribute names that
>are a max of six characters long, other things like that are bad in general.

For the quotes, obviously a parser that doesn't accept both
varieties is a fake parser, and a very bad idea. But restriction
to a single kind of quotes sometimes makes sense, as Canonical XML
shows.

For attribute name length, it's quite important to distinguish
parsers and applications. A parser that can only deal with
attribute names of max 6 chars is obviously crap. But an application
that by design or by pure chance happens to have only attribute
names with 6 or fewer chars should be perfectly fine. Or would you
claim otherwise? Should each application/DTD have at least one
element or attribute name with more than 6 characters?

This shows very clearly how important it is to be able to
distinguish between parsers and applications.

>MD> Somebody else then constructs/tweaks a
>MD> (non-validating) parser so that it only accepts elements
>MD> foo and attributes bar. Later that parser is misused
>MD> for some other piece of XML, and this leads to problems.
>
>Yes, though a trivial test would detect that and a fairly thorough
>test would be needed for the single-quotes-only or
>attribute-names-must-be ascii parsers.

Obviously only a single, trivial test is needed for detecting
a fake parser that accepts only single quotes. Here it is:

    <root foo='value' bar="value" />

Same for 'attribute-names-must-be-ascii':
(sorry to all those whose mailer can't take it)

    <?xml version='1.0' encoding='iso-2022-jp'?>
    <root 属性="value"/>

Same for a fake parser that doesn't accept UTF-16
(left as an exercise to the reader).

So your argument, which I read as being about the 'amount of work
needed for testing', doesn't lead to the conclusions you want.

>MD> If we follow your logic, we would have to disallow
>MD> all XML protocols that use a finite number of element/
>MD> attribute types.
>
>No, we would not, only if we followed the logic you thought I was proposing.

I interpreted your earlier mails as saying:

    Don't put restrictions on the protocol that might seduce
    implementers to create special-purpose parsers that others
    might mistake as general-purpose parsers, later leading to
    breakdowns.

Above, I find a new variant of your statement:

    Don't put restrictions on the protocol that might seduce
    implementers to create special-purpose parsers that others
    might mistake as general-purpose parsers, later leading to
    breakdowns, if these restrictions are difficult to test.

I showed that this statement doesn't help in the cases you
have brought up. I guess that I'm still not understanding
your logic. Can you try again? Thanks!

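To make the "single, trivial test" point concrete, here is a minimal sketch in Python of how such probe documents could be run against any parser. Everything in it is an assumption made for illustration: the names PROBES, failed_probes and single_quotes_only are hypothetical, the standard library's xml.etree.ElementTree stands in for "a real XML parser", and the non-ASCII probe is sent as UTF-8 rather than iso-2022-jp, because support for encodings beyond UTF-8 and UTF-16 is optional for a parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical probe documents, one per restriction discussed in this thread.
# Only the first probe uses double quotes, so each probe tests one thing.
PROBES = {
    "both quote styles": b"<root foo='value' bar=\"value\" />",
    # \u5c5e\u6027 is the two-character attribute name from the mail (UTF-8 here).
    "non-ASCII attribute name": "<root \u5c5e\u6027='value'/>".encode("utf-8"),
    # Python's "utf-16" codec prepends a byte order mark.
    "UTF-16 input": (
        "<?xml version='1.0' encoding='UTF-16'?><root foo='value'/>"
    ).encode("utf-16"),
}


def failed_probes(parse):
    """Return the names of the probes that `parse` (bytes -> anything) rejects."""
    failures = []
    for name, document in PROBES.items():
        try:
            parse(document)
        except Exception:
            failures.append(name)
    return failures


def single_quotes_only(document):
    """Deliberately broken stand-in for a fake parser: rejects double quotes."""
    if b'"' in document:
        raise ValueError("double-quoted attribute value")
    return ET.fromstring(document)


if __name__ == "__main__":
    print("ElementTree fails:", failed_probes(ET.fromstring))
    print("single-quotes-only fails:", failed_probes(single_quotes_only))
```

A parser that passes all three probes is of course not thereby proven conformant; the sketch only shows that each of the restrictions named in this thread can be caught by one small document.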
Regards, Martin.