
RE: -1 on the features draft



Tim Bray wrote:
> 4. The mechanism for specifying features needs to be 
> decoupled from the initial list of features.  A substantial 
> number of the features here are wide-open 
> roll-out-the-red-carpet invitations to violations of Postel's 
> law.  I believe that APP server implementations bloody well 
> SHOULD accept content type="text | html | xhtml", otherwise 
> what was the point of building all that machinery into 4287.  

Here is an attempt to reduce the feature set by replacing the following
features:
     XHTML content, XHTML rights, XHTML summary, XHTML title
     HTML content, HTML rights, HTML summary, HTML title
     Text content, Text rights, Text summary, Text title
with one feature:
     HTML is preferred/supported/unsupported in text constructs.

Any server that accepts XHTML or HTML should accept plain text, because
it is trivial to convert from plain text to XHTML or HTML:
    <xsl:template match='atom:content[@type="text"]'>
      <atom:content type='xhtml'>
        <html:div>
          <xsl:value-of select='normalize-space(.)'/>
        </html:div>
      </atom:content>
    </xsl:template>
Collapsing the whitespace loses nothing here: RFC4287 says that we can't
depend on blank lines between paragraphs being preserved, and it is not
clear on whether <SPACE><NEWLINE> can be collapsed to just <SPACE>, which
means that plain text is not appropriate for anything except a single
paragraph.
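
For anyone not using XSLT, the same conversion can be sketched in Python
with only the standard library. This is an illustration, not a reference
implementation: the function name text_to_xhtml is made up, and it
assumes the text construct has already been parsed into an ElementTree
element.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def text_to_xhtml(construct):
    """Return a type="xhtml" copy of a type="text" Atom text construct.

    Hypothetical helper: mirrors the XSLT template by wrapping the
    whitespace-normalized text in a single XHTML div.
    """
    raw = "".join(construct.itertext())
    # Same effect as XPath normalize-space(): trim, then collapse runs
    # of space, tab, CR and LF to a single space.
    text = re.sub(r"[ \t\r\n]+", " ", raw).strip(" \t\r\n")

    result = ET.Element(construct.tag, {"type": "xhtml"})
    div = ET.SubElement(result, "{%s}div" % XHTML_NS)
    div.text = text
    return result
```

The element name is copied from the input, so the same function would
work for title, summary, rights, and content alike.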

A server can trivially strip out any markup it wants from XHTML text
constructs. Servers should be expected to strip out @id, @class, @style,
<style>, <script>, <font> and a host of other problematic markup. At the
extreme, any server can accept XHTML anywhere it accepts plain text,
simply by stripping out all the markup. This might result in significant
information loss (e.g. stripping the markup from a table would result in
a mess). However, clients can mitigate this by avoiding complex
constructs like tables and lists in places they are likely to be
stripped (titles and summaries, in particular).
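
To make "trivially strip" concrete, here is a rough Python sketch of
both ends of the range: removing a blacklist of markup, and removing all
of it. The names sanitize and strip_all_markup and the exact blacklist
are assumptions for illustration; a real server would choose its own
policy (and would more likely use a whitelist).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DROP_ATTRS = {"id", "class", "style"}
# Removed together with their content.
DROP_ELEMENTS = {"{%s}%s" % (XHTML_NS, n) for n in ("style", "script")}
# Tag removed, content kept.
UNWRAP_ELEMENTS = {"{%s}font" % XHTML_NS}


def _append_text(parent, text):
    """Append text at the current end of parent's content."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy_into(src, dst):
    """Copy src's text and sanitized children into dst."""
    _append_text(dst, src.text)
    for child in src:
        if child.tag in DROP_ELEMENTS:
            pass  # discard the element and everything inside it
        elif child.tag in UNWRAP_ELEMENTS:
            _copy_into(child, dst)  # keep the content, drop the tag
        else:
            attrs = {k: v for k, v in child.attrib.items()
                     if k not in DROP_ATTRS}
            _copy_into(child, ET.SubElement(dst, child.tag, attrs))
        _append_text(dst, child.tail)


def sanitize(div):
    """Return a cleaned copy of an XHTML div; the input is not modified."""
    attrs = {k: v for k, v in div.attrib.items() if k not in DROP_ATTRS}
    clean = ET.Element(div.tag, attrs)
    _copy_into(div, clean)
    return clean


def strip_all_markup(div):
    """The extreme case: keep only the character data."""
    return "".join(div.itertext())
```

Calling strip_all_markup on the output of sanitize, rather than on the
raw input, keeps script and style bodies out of the resulting text.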

As a result, I think there is no need to have features that distinguish
between plain text and XHTML text; every implementation should support
both.

However, I think it is unreasonable to require implementations to parse
HTML. I think content producers (clients and servers) should observe
Postel's law and send XHTML; the receiver can always easily convert it
to HTML if needed. If the server accepts HTML at all, it should support
it in all text constructs (likely stripping out different types of
markup for titles, summaries, rights, and content).
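
As a sketch of why the XHTML-to-HTML direction is the easy one: with the
XHTML already parsed, producing HTML is mostly a matter of dropping the
namespace and serializing with HTML rules. The function name
xhtml_to_html is hypothetical, and the example relies on the "html"
output method of Python's xml.etree.ElementTree.

```python
import copy
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
_PREFIX = "{%s}" % XHTML_NS


def xhtml_to_html(div):
    """Serialize an XHTML element tree as an HTML string."""
    html = copy.deepcopy(div)  # leave the caller's tree untouched
    for el in html.iter():
        # Comments and processing instructions have non-string tags.
        if isinstance(el.tag, str) and el.tag.startswith(_PREFIX):
            el.tag = el.tag[len(_PREFIX):]
    # method="html" writes empty elements such as <br> without an end tag.
    return ET.tostring(html, encoding="unicode", method="html")
```

Going the other way requires a forgiving HTML parser, which is the cost
the paragraph above argues receivers should not be required to bear.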

- Brian