
Re: -1 on the features draft



Scroll down for comments.

Brian Smith wrote:
> Tim Bray wrote:
>> 4. The mechanism for specifying features needs to be 
>> decoupled from the initial list of features.  A substantial 
>> number of the features here are wide-open 
>> roll-out-the-red-carpet invitations to violations of Postel's 
>> law.  I believe that APP server implementations bloody well 
>> SHOULD accept content type="text | html | xhtml", otherwise 
>> what was the point of building all that machinery into 4287.  
> 
> Here is an attempt to reduce the feature set by replacing the following
> features:
>      XHTML content, XHTML rights, XHTML summary, XHTML title
>      HTML content, HTML rights, HTML summary, HTML title
>      Text content, Text rights, Text summary, Text title
> with one feature:
>      HTML is preferred/supported/unsupported in text constructs.
> 
> Any server that accepts XHTML or HTML should accept plain text, because
> it is trivial to convert from plain text to XHTML or HTML:
>     <xsl:template match='atom:content[@type="text"]'>
>        <atom:content type='xhtml'>
>           <html:div>
>             <xsl:value-of select='normalize-space(.)'/>
>           </html:div>
>        </atom:content>
>     </xsl:template>
> In particular, RFC4287 says that we can't depend on blank lines between
> paragraphs being preserved, and it is not clear on whether
> <SPACE><NEWLINE> can be collapsed to just <SPACE>, which means that
> plain text is not appropriate for anything except a single paragraph.
> 
> A server can trivially strip out any markup it wants from XHTML text
> constructs. Servers should be expected to strip out @id, @class, @style,
> <style>, <script>, <font> and a host of other problematic markup. At the
> extreme, any server can accept XHTML anywhere it accepts plain text,
> simply by stripping out all the markup. This might result in significant
> information loss (e.g. stripping the markup from a table would result in
> a mess). However, clients can mitigate this by avoiding complex
> constructs like tables and lists in places they are likely to be
> stripped (titles and summaries, in particular).
> 

These kinds of issues are precisely why each of these is called out
specifically.  Take a look, for instance, at the mechanism Microsoft
came up with (independently) for Windows Live Writer,

  http://msdn2.microsoft.com/en-us/library/bb463260.aspx

Look at the list of options they provide: supportsEmptyTitles,
requiresHtmlTitles, supportsScripts, supportsEmbeds,
maxCategoryNameLength (!!), supportsCustomDate, the list goes on.

These options are specifically designed to allow the client to enable or
disable specific features so that clients do not get the wrong idea --
e.g. that they can use a table in their post when they really can't.  As
a blog user, I'd much rather the tooling figure all this out for me
automatically, and based on the feedback I've received from my own user
community within IBM, I'm not the only one.
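
To make that concrete, here is a rough sketch (in Python) of a client
checking a post against advertised options before sending it. Only the
option names come from the list above; the values, the check_post helper,
and the default applied when an option is missing are assumptions made
for illustration, not how Windows Live Writer actually works.

```python
# Hypothetical option values; only the option names come from the list above.
server_options = {
    "supportsEmptyTitles": False,
    "supportsScripts": False,
    "supportsEmbeds": True,
    "maxCategoryNameLength": 16,
}


def check_post(options, title, body, categories):
    """Return a list of problems the client can show before posting.

    Assumption: a missing supportsEmptyTitles means empty titles are fine,
    while a missing supportsScripts/supportsEmbeds means "not supported".
    """
    problems = []
    if not title.strip() and not options.get("supportsEmptyTitles", True):
        problems.append("title must not be empty")
    # Naive substring checks; a real client would parse the markup.
    if "<script" in body.lower() and not options.get("supportsScripts", False):
        problems.append("scripts are not supported")
    if "<embed" in body.lower() and not options.get("supportsEmbeds", False):
        problems.append("embeds are not supported")
    limit = options.get("maxCategoryNameLength")
    if limit is not None:
        for name in categories:
            if len(name) > limit:
                problems.append(f"category {name!r} exceeds {limit} characters")
    return problems
```

With something along these lines, the tooling can warn the user up front
instead of letting the server silently drop or reject parts of the post.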

> As a result, I think there is no need to have features that distinguish
> between plain text and XHTML text; every implementation should support
> both.
> 

I disagree and I know my users would as well, especially if it means
that table they spent so much time putting together with formatting and
image includes and a YouTube embed, etc. suddenly came out with all
markup removed.
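
As an illustration of that loss, here is a minimal Python sketch of a
server that "accepts" XHTML by stripping every tag. The TextOnly class,
the strip_markup helper, and the sample table are all made up for this
example; no particular server is known to work this way.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_markup(xhtml):
    """Drop all markup and collapse runs of whitespace."""
    parser = TextOnly()
    parser.feed(xhtml)
    parser.close()
    return " ".join("".join(parser.chunks).split())


table = (
    "<table><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>Apples</td><td>3</td></tr></table>"
)
print(strip_markup(table))  # prints: NameQtyApples3
```

The cell boundaries are gone, and any images or embeds inside the table
would vanish without a trace.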

I've got no problem with the notion that any impl that supports HTML
should also support XHTML (and vice versa) but it's silly to think that
text and xhtml support are somehow equivalent.

This particular set of features likely could be simplified, but not like
this.

> However, I think it is unreasonable to require implementations to parse
> HTML. I think content producers (clients and servers) should observe
> Postel's law and send XHTML; the receiver can always easily convert it
> to HTML if needed. If the server accepts HTML at all, it should support
> it in all text constructs (likely stripping out different types of
> markup for titles, summaries, rights, and content).
> 

You, Tim and Rob have all alluded to this idea that impls "should" be
following some minimum set of behaviors.  Unfortunately, neither of the
Atom specs backs any of that up.

- James