
Re: Announcement of a new I-D



Rick Jelliffe posted this reply to xml-dev.  Since this I-D 
should be discussed here, I am forwarding his message.


> Murata Makoto wrote: 
>   
> > We are looking forward to your feedback. 
> 
> 
> Very disappointing to see: 
> 
> 
> 1) Charset "should" be given for application/xml. HTTP has a character 
> set handling concept that comes from fantasyland. I would recommend a 
> very different policy: never use xml/data, always use application/xml; 
> never use charset, always use xml encoding declarations. 
> 
> 
> 2) "When non-validating processors handle XML documents, they do not 
>       always read external parsed entities. Thus, interoperability is 
>       not guaranteed." 
> This is just FUD: why isn't this handled by the "standalone" declaration? 
> If it is a comment about bugs in software, that is out-of-place here. 
> 
> 
> 
> 3) Support for xml:base. Xml:base is currently being railroaded through 
> W3C with requirements document--on behalf of my organization, which is a 
> W3C member, I have repeatedly asked what its justification is, and there 
> has never been any answer. Xml:base is dangerous because it creates an 
> unlabelled dialect of XML--a general XML editor cannot treat URIs as 
> text when cutting and pasting, it also may have to do something with 
> xml:base. It would be OK if managed as part of some more general 
> package, but not by itself. In any case, it is not clear whether 
> xml:base applies to all data marked by a schema as a URI or just to data 
> marked as an xlink:href. It does not apply to URIs in SYSTEM identifiers 
> in entities, AFAIK. 
> 
> 
> Without knowing which URIs are "known" by xml:base, and what its 
> interaction is with xml schemas, broad statements that embedded URIs 
> should be interpreted relative to xml:base are surely incorrect, or at 
> least too early. There is no need for F with xml:base, but U and D are 
> certainly warranted. 
> 
> 
> 4) The rocket scientists at IETF have managed a new thing with the spec 
> for utf16be (if you use utf16be you cannot have a BOM, apparently): it 
> means that not only can you do too little as far as labelling your data, 
> you can now do *too much*! If you want to use big-endian utf16 and your 
> software sticks in a BOM just to be safe you are ruined. I thought I had 
> seen everything. This makes the well-intentioned user pay: it looks like 
> an enabling provision, but its effect is surely to prevent the use of 
> big-endian UTF16. Users should not be penalized for providing "too 
> much" labelling. 
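
To make the complaint in point 4 concrete, here is a small Python sketch of
what happens when BOM-prefixed data is labelled as big-endian UTF-16. It uses
Python's built-in codecs as a stand-in for an XML processor, which is an
assumption made purely for illustration; it is not anything the draft specifies.

```python
import codecs

# Big-endian UTF-16 text with a BOM stuck on the front "just to be safe".
data = codecs.BOM_UTF16_BE + "<doc/>".encode("utf-16-be")

# Labelled plain "utf-16": the decoder consumes the BOM to detect byte order.
as_utf16 = data.decode("utf-16")

# Labelled "utf-16-be": the byte order is already fixed, so the BOM is not
# stripped and shows up as a U+FEFF character at the start of the text.
as_utf16be = data.decode("utf-16-be")

print(repr(as_utf16))     # '<doc/>'
print(repr(as_utf16be))   # '\ufeff<doc/>'
```

A consumer that finds U+FEFF in front of the first "<" may well reject the
document, which is the penalty for "too much" labelling objected to above.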
> 
> 
> 5) Along similar lines, but far worse and of major importance for 
> internationalization, the fragment identifier of a URI has to be in 
> US-ASCII with %HH escaping. Here I am in Taipei and I want to include 
> an Xpointer to refer to an ID or element name or attribute name or 
> value, and I have to first find the numeric values of my Big5, then 
> transcode it into Unicode, then find out what the Unicode values are in 
> HEX, then put them in. Is that the way it is supposed to work? This is 
> exactly the sort of thing that should be provided by the XML 
> infrastructure, not by the poor user: to tell the user "you can say 
> 'yes' in your native language in that attribute value, but you cannot 
> type it directly when you want to reference that element" is not 
> acceptable. And what about XSLT: does it mean that when I use an XPath 
> which includes a document() reference, that I have to suddenly stop when 
> I get to the fragment identifier of the URI and switch to %HH? 
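
The conversion chain described in point 5 (Big5 bytes, then Unicode, then hex,
then %HH) can be sketched in Python. This is only an illustration. It assumes
the escaping is applied to the UTF-8 bytes of each character, which the message
does not spell out, and the helper name escape_fragment is hypothetical.

```python
from urllib.parse import quote, unquote


def escape_fragment(big5_bytes: bytes) -> str:
    """Turn Big5-encoded text into a US-ASCII, %HH-escaped string."""
    text = big5_bytes.decode("big5")  # Big5 bytes -> Unicode characters
    return quote(text, safe="")       # UTF-8 encode, then %HH-escape each byte


# "是" (U+662F) is one way to say "yes"; these are the bytes a Big5 editor saves.
yes_big5 = "是".encode("big5")
escaped = escape_fragment(yes_big5)
print(escaped)                        # %E6%98%AF
assert unquote(escaped) == "是"        # the escaping is reversible
```

A library routine can do this in one call, which supports the argument that
the conversion belongs in the infrastructure rather than with the user.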
> 
> 
> This draft has lost the plot. XML is first and foremost a markup 
> language: that is its name, that is its purpose, that is what we want. 
> Someone should be able to open their local text editor and create a 
> legitimate document using all the characters available in that editor, 
> without ever having to perform any character-to-number conversions or 
> looking up any character tables. This is a basic operational simplicity 
> which gives XML 99% of its value. 
> 
> 
> If HTTP requires data in a different format, the XML infrastructure 
> should provide that transparently. If IETF or any RFC purports to make 
> any requirement on how I can mark up legitimate characters, then the 
> response we should give is terse and monosyllabic: it is not the 
> business of an RFC to mandate any particular encodings within an XML 
> document. It is ultra vires. (Note, I am *not* saying that an RFC cannot 
> proscribe certain characters. I am saying that an RFC oversteps itself 
> if it tries to tell me that I must use %HH rather than &#HHHH; or a 
> direct character inside my XML document.) 
> 
> 
> If a future technology like xml:base is included in discussion, why is 
> there no discussion of international domain names? XML should not 
> constrain which characters domain names may contain (the RFCs currently keep 
> the door open): whatever mechanism is eventually used to allow 
> internationalised domain names, that should be handled transparently by 
> the XML processor and the user should be able to see and type the direct 
> characters (or have NCRs). 
> So mandating %HH has the problem that we will have to revisit this RFC 
> as soon as international DNS comes online (which will probably be sooner 
> rather than later: there is a staggering demand here in Asia for it). 
> It would be best if XML kept out of the issue entirely: in particular, 
> if it is decided that CNRP should be used to convert IDNS names into 
> ASCII domain names, then that is definitely something that would make 
> the approach of %HH in the domain name part unneeded: why should we have 
> one rule in the domain name and another rule for other places? 
> 
> 
> 
> Good to see: 
> 
> 
> 1) |xml suffix is a great idea 
> 
> 
> 2) MIME types for DTDs and external parsed entities 
> 
> 
> 
> I regret to say, I think these flaws are so great that the draft should 
> be withdrawn and rethought at once. Especially point 5 is a disaster. In 
> particular, whenever there is some conversion between IETF syntax 
> requirements and simple plain text editing, this should be hidden from 
> the user and taken care of by the XML processor. 
> 
> 
> The current draft is a step backwards for internationalization of the 
> WWW in practice. Or, at least, it makes life simpler for ASCII users 
> but much more difficult for us non-ASCII users. And I think it makes 
> life much more complicated for implementers: it means that user 
> interfaces will have to have data conversion routines built in, rather 
> than just leaving it to the URI referencing library routines. 
> 
> 
> 
> Rick Jelliffe