
Whitespace handling



I wonder whether it might be useful to add a section with some guidance on
whitespace handling.  In my experience, whitespace handling is easily
overlooked, is much harder than it seems, and is a source of
interoperability problems (e.g. MSXML's default policy of stripping
whitespace-only text nodes).

In particular, I think it should be pointed out that protocol designers need
to explicitly consider the issue and decide what to do about it.  If they
are using a schema language to specify the syntax, then that takes care of
it to the extent that it specifies where whitespace is allowed.  However, it
is still important to understand that the parser won't strip whitespace for
you, and in my view it's not a good idea to rely on the parser identifying
whitespace as insignificant.  A policy of stripping whitespace-only text
nodes (possibly overridable by xml:space="preserve") works well in
data-oriented cases, but can fail for vocabularies involving mixed content,
such as XHTML.
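
To make the stripping policy concrete, here is a minimal sketch in Python
using the standard xml.dom.minidom module.  The function name strip_ws and
the sample documents are invented for illustration, and this is only one
possible reading of the policy, not a recommended implementation.

```python
from xml.dom import minidom, Node

XML_NS = "http://www.w3.org/XML/1998/namespace"

def strip_ws(node, preserve=False):
    """Remove whitespace-only text nodes below `node` (hypothetical helper).

    An xml:space attribute changes the policy for that element and its
    descendants: "preserve" keeps whitespace, "default" strips it again.
    """
    if node.nodeType == Node.ELEMENT_NODE:
        space = node.getAttributeNS(XML_NS, "space")
        if space == "preserve":
            preserve = True
        elif space == "default":
            preserve = False
    # Copy the child list, because it is modified while iterating.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            # XML whitespace is space, tab, carriage return and line feed.
            if not preserve and child.data.strip(" \t\r\n") == "":
                node.removeChild(child)
        else:
            strip_ws(child, preserve)

# Data-oriented case: indentation is dropped, but the xml:space="preserve"
# subtree keeps its whitespace-only text nodes.
data = minidom.parseString(
    '<a>\n  <b>x</b>\n  <c xml:space="preserve"> <d/> </c>\n</a>'
)
strip_ws(data.documentElement)

# Mixed-content case: the single space between <b> and <i> is significant
# to a reader, yet the same policy removes it.
mixed = minidom.parseString('<p><b>bold</b> <i>italic</i></p>')
strip_ws(mixed.documentElement)
```

After the calls above, the first document has lost its indentation but
kept the whitespace inside the xml:space="preserve" element, while the
second has lost the space separating the two words, which is the kind of
failure that mixed-content vocabularies run into.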

James