RE: Sticks, carrots, real XML, ultra-liberal parsing



Isofarro wrote:
> Now today I reckon it would be impossible for any 
> single person to author a browser that can be used 
> as a replacement for Internet Explorer. Not because 
> IE is good, but because it has a huge number of 
> error-correcting features to deal with invalid and 
> tag-souped markup. 
	RSS, Atom, etc. will, it seems, fall into the same status
before too long. The issue is only partly related to whether or not
people are producing well-formed XML. As is often the case, those who
get into the game early have a tremendous advantage -- because they
can learn all the odd cases and edge cases incrementally instead of
having to track them down all at once.
	For instance, while people are constantly debating the
relatively simple issue of XML well-formedness, look at what
namespaces and the support of extensibility bring us... From an
implementer's point of view, every namespace declaration potentially
means the equivalent of a new format to be understood. And, each
namespace will be abused by some and not by others. Everyone in the
game today is building up an experience base that will present a
massive barrier to entry for anyone wishing to start working with RSS
or Atom in the future.
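
	To make the namespace point concrete, here is a rough sketch of
how an aggregator might notice namespaces it has never dealt with. The
KNOWN_NAMESPACES set, the unknown_namespaces function, and the sample
feed (including the example.org namespace) are made up for
illustration; only the Dublin Core namespace URI is real. Note that
this only works on well-formed input, which is the easy part.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical set of namespaces this aggregator already has rules for.
KNOWN_NAMESPACES = {"http://purl.org/dc/elements/1.1/"}

# Invented sample feed; the "ex" namespace is a placeholder.
SAMPLE_FEED = (
    b'<rss version="2.0"'
    b' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    b' xmlns:ex="http://example.org/ns/ext">'
    b"<channel><title>t</title></channel></rss>"
)


def unknown_namespaces(xml_bytes, known=KNOWN_NAMESPACES):
    """Return the namespace URIs declared in the feed that are not in `known`."""
    declared = set()
    # The "start-ns" event yields a (prefix, uri) pair per declaration.
    for _event, (_prefix, uri) in ET.iterparse(
        io.BytesIO(xml_bytes), events=("start-ns",)
    ):
        declared.add(uri)
    return declared - set(known)


if __name__ == "__main__":
    for uri in sorted(unknown_namespaces(SAMPLE_FEED)):
        print("new format to learn:", uri)
```

Each URI this reports is, in effect, another format whose real-world
usage has to be learned feed by feed.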
	Take, for example, the Dublin Core tags that appear in RSS
feeds fairly frequently. There is a tremendous variety in how these
tags are being used. But, if you are in the game today, you'll work
out on Monday how site X is misusing those tags and on Tuesday you'll
work out how site Y is misusing them. Then, on Thursday, you'll
discover that site X has changed their interpretation of dc:source and
you can build in a rule that says: "for entries made before Thursday,
use interpretation X; for later entries, use interpretation Z."
Incrementally, you build up a massive knowledge base that can't be
replicated by any means other than theft, sale, or gift.
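
	As a sketch of what one entry in such a knowledge base might
look like, assuming a simple table of per-site, date-bounded rules:
the Rule class, the interpret function, the site name, and the cutover
date below are all hypothetical, and a real aggregator would need far
more than this.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rule:
    site: str
    element: str
    interpretation: str
    valid_from: Optional[date] = None   # inclusive; None means "since forever"
    valid_until: Optional[date] = None  # exclusive; None means "still in force"


# Hypothetical cutover: the Thursday on which site X changed its mind.
CUTOVER = date(2004, 1, 15)

RULES = [
    Rule("site-x.example", "dc:source", "X", valid_until=CUTOVER),
    Rule("site-x.example", "dc:source", "Z", valid_from=CUTOVER),
]


def interpret(site, element, entry_date, rules=RULES, default="as-specified"):
    """Pick the interpretation that applies to an entry published on entry_date."""
    for rule in rules:
        if rule.site != site or rule.element != element:
            continue
        if rule.valid_from is not None and entry_date < rule.valid_from:
            continue
        if rule.valid_until is not None and entry_date >= rule.valid_until:
            continue
        return rule.interpretation
    # No learned quirk for this site: fall back to the spec's reading.
    return default
```

The code is trivial; the contents of the rule table are the part that
only accumulates through experience.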
	For another example of the problem, look at the 14 different
types of link tags that are listed in the Wiki
(http://www.intertwingly.net/wiki/pie/LinkTagMeaning). Each of these
has massive opportunities to be misused and confused... Some will know
who misuses them and others won't. Those who know will have an
advantage over those who don't. (Note: This is one of my problems with
RDF... A wonderful idea in principle, but it rapidly leads to "link
soup" -- which is no better than "tag soup.")
	This complexity and the growing difficulty of implementing
these systems have much more fundamental roots than the simple question
of whether something is "well-formed" XML...

		bob wyman