Re: Announcement of a new I-D
Rick Jelliffe posted this reply to xml-dev. Since this I-D
should be discussed here, I am forwarding his message.
> Murata Makoto wrote:
>
> > We are looking forward to your feedback.
>
>
> Very disappointing to see:
>
>
> 1) Charset "should" be given for application/xml. HTTP has a character
> set handling concept that comes from fantasyland. I would recommend a
> very different policy: never use xml/data, always use application/xml;
> never use charset, always use xml encoding declarations.
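
To illustrate the policy suggested in point 1, here is a minimal sketch of a receiver that ignores any charset parameter and reads the encoding from the XML declaration itself. The function name declared_encoding and the UTF-8 default are assumptions made for this illustration, and the sketch only handles ASCII-compatible encodings; it does not implement the full autodetection procedure described in the XML Recommendation.

```python
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
# Only works when the declaration itself is readable as ASCII bytes.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else default


doc = b'<?xml version="1.0" encoding="Big5"?><doc/>'
print(declared_encoding(doc))
print(declared_encoding(b"<doc/>"))
```

With this approach the document carries its own label, so it stays correctly labelled when it is saved to disk or sent over a transport other than HTTP.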
>
>
> 2) "When non-validating processors handle XML documents, they do not
> always read external parsed entities. Thus, interoperability is
> not guaranteed."
> This is just FUD: why isn't this handled by the "standalone" declaration?
> If it is a comment about bugs in software, that is out-of-place here.
>
>
>
> 3) Support for xml:base. Xml:base is currently being railroaded through
> W3C with requirements document--on behalf of my organization which is a
> W3C member I have repeatedly asked what its justification is, and there
> has never been any answer. Xml:base is dangerous because it creates an
> unlabelled dialect of XML--a general XML editor cannot treat URIs as
> text when cutting and pasting, it also may have to do something with
> xml:base. It would be OK if managed as part of some more general
> package, but not by itself. In any case, it is not clear whether
> xml:base applies to all data marked by a schema as a URI or just to data
> marked as an xlink:href. It does not apply to URIs in SYSTEM identifiers
> in entities, AFAIK.
>
>
> Without knowing which URIs are "known" by xml:base, and what its
> interaction is with xml schemas, broad statements that embedded URIs
> should be interpreted relative to xml:base are surely incorrect, or at
> least too early. There is no need for F with xml:base, but U and D are
> certainly warranted.
>
>
> 4) The rocket scientists at IETF have managed a new thing with the spec
> for utf16be (if you use utf16be you cannot have a BOM, apparently): it
> means that not only can you do too little as far as labelling your data,
> you can now do *too much*! If you want to use big-endian utf16 and your
> software sticks in a BOM just to be safe, you are ruined. I thought I had
> seen everything. This makes the well-intentioned user pay: it looks like
> an enabling provision, but its effect is surely to prevent the use of
> big-endian UTF16. Users should not be penalized for providing "too
> much" labelling.
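
The "too much labelling" case in point 4 can be sketched as follows. This is only an illustration using Python's standard codecs, assuming a receiver that takes the utf16be label literally; the sample document and the helper name has_bom are made up for the example.

```python
import codecs

text = '<?xml version="1.0"?><doc/>'

# The "utf-16-be" codec writes big-endian code units and never adds a BOM.
plain_be = text.encode("utf-16-be")

# What cautious software might produce: the same bytes with a BOM in front.
with_bom = codecs.BOM_UTF16_BE + plain_be


def has_bom(data: bytes) -> bool:
    """Return True if the data starts with a UTF-16 byte order mark."""
    return data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))


# A receiver that trusts the big-endian label keeps the BOM as a character,
# so the decoded document no longer starts with "<?xml".
print(repr(with_bom.decode("utf-16-be")[:6]))

# A receiver that treats the data as generic UTF-16 consumes the BOM.
print(repr(with_bom.decode("utf-16")[:5]))
```

The extra two bytes are harmless under one label and become a stray U+FEFF character under the other, which is the penalty the message objects to.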
>
>
> 5) Along similar lines, but far worse and of major importance for
> internationalization, the fragment identifier of a URI has to be in
> US-ASCII with %HH escaping. Here I am in Taipei and I want to include
> an XPointer to refer to an ID or element name or attribute name or
> value, and I have to first find the numeric values of my Big5, then
> transcode it into Unicode, then find out what the Unicode values are in
> hex, then put them in. Is that the way it is supposed to work? This is
> exactly the sort of thing that should be provided by the XML
> infrastructure, not by the poor user: to tell the user "you can say
> 'yes' in your native language in that attribute value, but you cannot
> type it directly when you want to reference that element" is not
> acceptable. And what about XSLT: does it mean that when I use an XPath
> which includes a document() reference, that I have to suddenly stop when
> I get to the fragment identifier of the URI and switch to %HH?
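
The conversion chain described in point 5 can be sketched in a few lines, which is roughly what a URI library could do on the user's behalf. This is an illustration only: it assumes the bytes being escaped are UTF-8, and the URI http://example.org/doc.xml and the one-character ID are hypothetical.

```python
from urllib.parse import quote, unquote

# The Chinese character for "yes", written as an escape so the source stays ASCII.
name = "\u662f"

big5_bytes = name.encode("big5")      # what a Big5 text editor stores
code_point = f"U+{ord(name):04X}"     # the Unicode value the user must look up
utf8_bytes = name.encode("utf-8")     # the bytes that get %HH-escaped
escaped = quote(name, safe="")        # quote() uses UTF-8 by default

# Hypothetical document URI with the escaped ID as its fragment identifier.
uri = "http://example.org/doc.xml#" + escaped
print(big5_bytes.hex(), code_point, utf8_bytes.hex(), uri)

# The escaping is reversible, so software can show the direct character.
print(unquote(escaped) == name)
```

Done by hand, each of these steps needs a code chart; done in a library, the user only ever types and sees the character itself.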
>
>
> This draft has lost the plot. XML is first and foremost a markup
> language: that is its name, that is its purpose, that is what we want.
> Someone should be able to open their local text editor and create a
> legitimate document using all the characters available in that editor,
> without ever having to perform any character-to-number conversions or
> look up any character tables. This is a basic operational simplicity
> which gives XML 99% of its value.
>
>
> If HTTP requires data in a different format, the XML infrastructure
> should provide that transparently. If IETF or any RFC purports to make
> any requirement on how I can mark up legitimate characters, then the
> response we should give is terse and monosyllabic: it is not the
> business of an RFC to mandate any particular encodings within an XML
> document. It is ultra vires. (Note, I am *not* saying that an RFC cannot
> proscribe certain characters. I am saying that an RFC oversteps itself
> if it tries to tell me that I must use %HH rather than &#HHHH; or a
> direct character inside my XML document.)
>
>
> If a future technology like xml:base is included in discussion, why is
> there no discussion of international domain names? XML should not
> constrain which characters domain names may use (the RFCs currently keep
> the door open): whatever mechanism is eventually used to allow
> internationalised domain names, that should be handled transparently by
> the XML processor and the user should be able to see and type the direct
> characters (or have NCRs).
> So mandating %HH has the problem that we will have to revisit this RFC
> as soon as international DNS comes online (which will probably be sooner
> rather than later: there is a staggering demand here in Asia for it).
> It would be best if XML kept out of the issue entirely: in particular,
> if it is decided that CNRP should be used to convert IDNS names into
> ASCII domain names, then that is definitely something that would make
> the approach of %HH in the domain name part unneeded: why should we have
> one rule in the domain name and another rule for other places?
>
>
>
> Good to see:
>
>
> 1) |xml suffix is a great idea
>
>
> 2) MIME types for DTDs and external parsed entities
>
>
>
> I regret to say, I think these flaws are so great that the draft should
> be withdrawn and rethought at once. Point 5 especially is a disaster. In
> particular, whenever there is some conversion between IETF syntax
> requirements and simple plain text editing, this should be hidden from
> the user and taken care of by the XML processor.
>
>
> The current draft is a step backwards for internationalization of the
> WWW in practice. Or, at least, it makes life simpler for ASCII users
> but much more difficult for us non-ASCII users. And I think it makes
> life much more complicated for implementers: it means that user
> interfaces will have to have data conversion routines built in, rather
> than just leaving it to the URI referencing library routines.
>
>
>
> Rick Jelliffe