
RE: BIDI (was Proposal: Atomext WG)

James M Snell wrote:
> Brian Smith wrote:
> > [snip]
> > I think it makes more sense to get people to get some working 
> > implementations that have informally agreed on some extensions,
> I implemented extended bidi support in Apache Abdera months 
> ago; and our internal blogging environment supports bidi entries.

That is definitely a positive first step. Which other implementations does Abdera interoperate with when it comes to BIDI text?

> > How much of a problem is BIDI in Atom today? (This isn't a 
> rhetorical question). 
> It's a problem. Quite a few readers I have tested do not even 
> properly support bidi markup used in entry titles that use 
> (x)html.

That is exactly my point. People are not even implementing the current standards for BIDI text. Adding another standard to implement is not going to make the situation better.

> Since the values for those attributes 
> are typically provided by humans, relying on the software to 
> properly insert the unicode formatting codes (which aren't 
> recommended for markup in the first place) is problematic at best.

If the software cannot correctly insert the formatting codes into the document, how would it be able to insert the correct directionality markup?

> > Atom documents are almost never hand-entered, and there is 
> > already a specification in place for marking up BIDI and even
> > ruby text in general XML. The odds that clients and servers
> > are going to correctly implement this extension--except
> > those targeted directly towards BIDI users--seem pretty
> > low to me. Personally, it seems much easier to 
> > implement an existing BIDI markup mechanism (Unicode, 
> > XML, and/or XHTML) than a new standard.
> What are you basing that on?

The Unicode/W3C guidelines (http://www.w3.org/TR/unicode-xml/#Bidi and http://www.w3.org/International/questions/qa-bidi-controls) say this:

* Use *XHTML* BIDI markup whenever possible.
* Otherwise *CSS* whenever possible.
* Otherwise, consider building BIDI markup into your markup schema.
* We have to support BIDI formatting codes anyway, since the above mechanisms don't solve all BIDI problems.

> There's really not much guess work 
> involved in the implementation and apps that choose not to implement 
> support will be no worse or better off than they are currently.

That is not true. Consider a feed aggregator. If it doesn't support Atom BIDI, then it will not correctly rewrite entries to handle an inherited "dir" attribute from the atom:feed element. Since the directionality is also inherited by text constructs, whenever the implementation passes the text construct to a rendering engine, it needs to rewrite that content to handle the inherited directionality.
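
To make the rewriting concrete, here is a minimal Python sketch of what an aggregator would have to do before republishing entries outside their original feed. It assumes the draft's directionality attribute is an unqualified "dir" attribute with values such as "ltr" and "rtl", and that it is inherited from atom:feed by atom:entry; the function name standalone_entries is hypothetical, and a real implementation would also have to cover the text constructs inside each entry.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def standalone_entries(feed_xml):
    """Return the feed's entries with the inherited direction made explicit.

    Assumption: directionality is carried by an unqualified "dir"
    attribute that atom:entry inherits from atom:feed.
    """
    feed = ET.fromstring(feed_xml)
    feed_dir = feed.get("dir")
    entries = []
    for entry in feed.findall(ATOM + "entry"):
        # Copy the feed-level direction only when the entry does not
        # declare its own; an explicit value on the entry always wins.
        if feed_dir is not None and entry.get("dir") is None:
            entry.set("dir", feed_dir)
        entries.append(entry)
    return entries

SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom" dir="rtl">
  <entry><title>first</title></entry>
  <entry dir="ltr"><title>second</title></entry>
</feed>"""

entries = standalone_entries(SAMPLE_FEED)
```

An aggregator that skips this step and copies the first entry into another feed silently drops its right-to-left base direction, which is the failure described above.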

> Don't forget atom:link/@title and atom:category/@label. 
> atom:category/@term can also cause problems when 
> implementations use that 
> value for display purposes.

I know that. But, the Atom BIDI draft does not eliminate all uses of BIDI formatting characters in these attributes, either. And, it doesn't specify BIDI support for language-sensitive content in Atom service documents, category documents, RSS feeds, or RSD documents.

> Arbitrary extension elements can also have problems.

The BIDI draft says it only applies to constructs that RFC4287 labeled "language sensitive." Accordingly, the BIDI draft does not apply to extension elements.

> The bidi draft doesn't attempt to solve all the i18n issues 
> with Atom. Ruby text is a problem for pretty much everything, 
> especially given the fact that most browsers don't have a
> clue how to properly render ruby text yet.  The bidi draft 
> rightfully focuses on one small part of the problem.

I agree that a narrow scope is good. But, a solution for Ruby text will also be applicable to BIDI, especially if that solution involves the reuse of XHTML markup and/or CSS. 

> > It also doesn't solve the problem with atom:link/@title or 
> other attributes that 
> > are language-sensitive.
> Yes, it does.

The Atom BIDI draft does not provide a way of specifying different base directionalities for attributes on the same element, it does not eliminate all need for BIDI formatting characters in language-sensitive attribute values, and it does not provide a mechanism for discovering which (nested) extension elements and attributes are affected by the proposed Atom BIDI markup.

If I implement RFC4287, the Unicode BIDI algorithm, XHTML BIDI, HTML BIDI, and the "Unicode in XML" guidelines, I will have pretty good BIDI support. It will require me to adhere to four different BIDI standards in addition to RFC4287. That is a lot of work already. Now, your Atom BIDI and URI template BIDI proposals add two more specifications that I would have to support--for a total of SIX standards to adhere to and resolve conflicts between, JUST to support BIDI text. And, even if I create well-formed documents adhering to all six standards, whenever I open them up in any of my text editors, or any feed reader, they will look wrong since nobody else is implementing all of those standards. I think that is totally unreasonable.

Abdera might have amazing support for BIDI, for which you should be commended, but unless all Atom software is going to be implemented on top of Abdera, Abdera will not be able to reliably interoperate with anything. If we want to provide interoperable support for BIDI, we need to make it as simple to implement as possible.

My counter-proposal is simple:

* Use XHTML/HTML BIDI/Ruby markup whenever possible.
* Otherwise, use Unicode BIDI/Ruby formatting codes, such that matching pairs of formatting codes are fully contained within a single text or attribute node.
* Editors of new documents must be meticulous about inserting the proper markup and formatting codes.
* Processors of existing documents must be meticulous about preserving BIDI/Ruby markup and/or formatting codes whenever any part of the contained text is preserved.
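
As an illustration of the second point, the following Python sketch checks that the embedding and override codes in one text or attribute value are properly paired within that value. It is only a sketch under stated assumptions: it covers the five codes LRE, RLE, LRO, RLO, and PDF and nothing else, and the helper names is_self_contained and wrap_rtl are hypothetical.

```python
# Unicode BIDI embedding and override formatting codes.
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
LRO, RLO = "\u202d", "\u202e"
OPENERS = {LRE, RLE, LRO, RLO}

def is_self_contained(node_text):
    """True if every opening code is closed by a PDF in the same node."""
    depth = 0
    for ch in node_text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # PDF with no opener in this node
            depth -= 1
    return depth == 0  # no opener left unclosed

def wrap_rtl(text):
    """Wrap text in a right-to-left embedding that closes in the same node."""
    return RLE + text + PDF
```

A processor could run such a check on each atom:link/@title or atom:category/@label value before and after rewriting a document, to confirm that it has not split a pair of formatting codes across nodes.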

I recognize that this goes against the Unicode in XML guidelines. However, Atom already goes against the guidelines by having language-sensitive text in attribute values and other contexts where XHTML markup cannot be used.

- Brian