
Re: BIDI (was Proposal: Atomext WG)

Brian Smith wrote:
Atom documents are almost never hand-entered, and there is already a specification in place for marking up BIDI and even ruby text in general XML. The odds that clients and servers are going to correctly implement this extension--except those targeted directly towards BIDI users--seem pretty low to me. Personally, it seems much easier to implement an existing BIDI markup mechanism (Unicode, XML, and/or XHTML) than a new standard.
What are you basing that on?

The Unicode/W3C guidelines (http://www.w3.org/TR/unicode-xml/#Bidi and http://www.w3.org/International/questions/qa-bidi-controls) say this:

* Use *XHTML* BIDI markup whenever possible.
* Otherwise *CSS* whenever possible.
* Otherwise, consider building BIDI markup into your markup schema.
* We have to support BIDI formatting codes anyway, since the above mechanisms don't solve all BIDI problems.

Sorry if I wasn't clear; I was referring specifically to your assertion that, "it seems much easier to implement an existing BIDI markup mechanism"... Having already implemented it, I can see no additional complexity or difficulty.

There's really not much guess work involved in the implementation and apps that choose not to implement support will be no worse or better off than they are currently.

That is not true. Consider a feed aggregator. If it doesn't support Atom BIDI, then it will not correctly rewrite entries to handle an inherited "dir" attribute from the atom:feed element. Since the directionality is also inherited by text constructs, whenever the implementation passes the text construct to a rendering engine, it needs to rewrite that content to handle the inherited directionality.
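To make the inheritance point concrete, here is a minimal Python sketch of what a BIDI-aware aggregator might do before handing text constructs to a renderer. It assumes the draft's attribute is an unqualified dir with values ltr/rtl that descendants inherit unless they set their own; effective_directions, to_html_fragment and SAMPLE_FEED are hypothetical names, not part of any Atom library, and only plain-text content is handled.

```python
import html
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# The RFC 4287 Text constructs (all Language-Sensitive).
TEXT_CONSTRUCTS = {"title", "subtitle", "summary", "content", "rights"}

# Hypothetical sample; assumes an unqualified, inherited "dir" attribute.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom" dir="rtl">
  <title>Feed title</title>
  <entry>
    <title>Entry title</title>
    <summary dir="ltr">English summary</summary>
  </entry>
</feed>"""


def effective_directions(feed_xml):
    """Return (local name, text, effective dir) for each Text construct."""
    results = []

    def walk(elem, inherited):
        # A local dir overrides whatever was inherited from ancestors.
        direction = elem.get("dir", inherited)
        if elem.tag.startswith(ATOM):
            local = elem.tag[len(ATOM):]
            if local in TEXT_CONSTRUCTS:
                results.append((local, (elem.text or "").strip(), direction))
        for child in elem:
            walk(child, direction)

    walk(ET.fromstring(feed_xml), None)
    return results


def to_html_fragment(text, direction):
    """Rewrite plain text so a renderer that knows nothing about Atom sees the direction."""
    escaped = html.escape(text)
    return f'<div dir="{direction}">{escaped}</div>' if direction else escaped
```

The entry title carries no dir of its own, so an aggregator that copies it out of the feed without this step loses the rtl it inherited from atom:feed.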

If an aggregator doesn't support the dir attribute, it will ignore it or drop it as if it wasn't there in the first place and will continue to operate as it always had, which is exactly what we want. So again, those apps will be no worse or better off than they are currently.

Don't forget atom:link/@title and atom:category/@label. atom:category/@term can also cause problems when implementations use that value for display purposes.

I know that. But, the Atom BIDI draft does not eliminate all uses of BIDI formatting characters in these attributes, either. And, it doesn't specify BIDI support for language-sensitive content in Atom service documents, category documents, RSS feeds, or RSD documents.

I have no interest in eliminating all uses of bidi formatting characters.

And yes, the spec does address bidi support for language-sensitive content in Atompub service and category documents. The spec language can be improved in this regard, but the spec alters the definition of the atomCommonAttributes production to add the dir attribute. In both the Atompub service document and the Atom categories document, the language-sensitive text is provided by elements from the Atom namespace (e.g. atom:category and atom:title). Also, the bidi spec predates the publication of rfc5023; I intend to have the next rev of the spec specifically discuss atompub service docs.

As for RSS and RSD docs, I couldn't care less about solving the i18n issues of either.

Arbitrary extension elements can also have problems.

The BIDI draft says it only applies to constructs that RFC4287 labeled "language sensitive." Accordingly, the BIDI draft does not apply to extension elements.

Section 6.4.2: "Structured Extension elements are Language-Sensitive."
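RFC 4287 section 6.4 draws the line structurally: a Simple Extension element has no attributes and no child elements and is not Language-Sensitive, while a Structured Extension element has at least one attribute or child element and is. A minimal Python sketch of that classification follows; is_structured_extension and the urn:example:ext namespace used in the usage lines are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def is_structured_extension(elem):
    """Classify a foreign-markup element per RFC 4287 section 6.4.

    Simple Extension (6.4.1): no attributes, no child elements;
    not Language-Sensitive.
    Structured Extension (6.4.2): at least one attribute or child
    element; Language-Sensitive.
    """
    if elem.tag.startswith(ATOM_NS):
        raise ValueError("Atom-namespace elements are not extensions")
    # ElementTree does not report namespace declarations in elem.attrib,
    # but any real attribute (including xml:lang) does appear there.
    return bool(elem.attrib) or len(elem) > 0


simple = ET.fromstring('<ex:rating xmlns:ex="urn:example:ext">5</ex:rating>')
structured = ET.fromstring('<ex:review xmlns:ex="urn:example:ext" kind="short">Great</ex:review>')
```

So a processor can tell from the markup alone which extension elements the Language-Sensitive rules reach, even without knowing the extension.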

The bidi draft doesn't attempt to solve all the i18n issues with Atom. Ruby text is a problem for pretty much everything, especially given the fact that most browsers don't have a clue how to properly render ruby text yet. The bidi draft rightfully focuses on one small part of the problem.

I agree that a narrow scope is good. But, a solution for Ruby text will also be applicable to BIDI, especially if that solution involves the reuse of XHTML markup and/or CSS.

If/when the need emerges to improve ruby text support in Atom, I will gladly help work out a solution.

It also doesn't solve the problem with atom:link/@title or other attributes that are language-sensitive.
Yes, it does.

The Atom BIDI draft does not provide a way of specifying different base directionalities for attributes on the same element; it doesn't eliminate all need for BIDI formatting characters in language-sensitive attribute values; and it doesn't provide a mechanism for discovering which (nested) extension elements and attributes are affected by the proposed Atom BIDI markup.

atom:category and atom:link each have exactly one language-sensitive attribute.
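Because each of those elements has exactly one Language-Sensitive attribute (atom:category/@label and atom:link/@title in RFC 4287), a single dir on the element can only apply to that one value. A minimal Python sketch, again assuming an unqualified, inherited dir attribute; attribute_directions and SAMPLE_ENTRY are hypothetical names.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Per RFC 4287, the only Language-Sensitive attribute on each element.
LANGUAGE_SENSITIVE_ATTR = {ATOM + "category": "label", ATOM + "link": "title"}

SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom" dir="rtl">
  <category term="news" label="News label"/>
  <link href="http://example.org/" title="Link title" dir="ltr"/>
</entry>"""


def attribute_directions(entry_xml, inherited=None):
    """Pair each Language-Sensitive attribute with its element's direction."""
    entry = ET.fromstring(entry_xml)
    base = entry.get("dir", inherited)  # assumed inheritance rule
    found = []
    for child in entry:
        name = LANGUAGE_SENSITIVE_ATTR.get(child.tag)
        if name is not None and name in child.attrib:
            # One such attribute per element, so the element's dir is unambiguous.
            found.append((name, child.get(name), child.get("dir", base)))
    return found
```

Note that atom:category/@term is deliberately not in the table, since RFC 4287 does not mark it Language-Sensitive.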

There's no reason to try to eliminate all need for bidi formatting characters in language-sensitive attribute values.

The bidi attribute applies to language-sensitive elements and attributes. Defining which extension elements are language sensitive is up to the extension definition. There's no reason to provide a mechanism for discovering which extensions are affected.

If I implement RFC4287, the Unicode BIDI algorithm, XHTML BIDI, HTML BIDI, and the "Unicode in XML" guidelines, I will have pretty good BIDI support. It will require me to adhere to four different BIDI standards in addition to RFC4287. That is a lot of work already. Now, your Atom BIDI and URI template BIDI proposals add two more specifications that I would have to support--for a total of SIX standards to adhere to and resolve conflicts between, JUST to support BIDI text. And, even if I create well-formed documents adhering to all six standards, whenever I open them up in any of my text editors, or any feed reader, they will look wrong since nobody else is implementing all of those standards. I think that is totally unreasonable. Abdera might have amazing support for BIDI, for which you should be commended, but unless all Atom software is going to be implemented on top of Abdera, Abdera will not be able to reliably interoperate with anything. If we want to provide interoperable support for BIDI, we need to make it as simple to implement as possible.

My counter-proposal is simple:

None of this is simple. Please do not claim that it is.

* Use XHTML/HTML BIDI/Ruby markup whenever possible.

Keep in mind the fact that the atom bidi spec very clearly indicates that the (x)html bidi mechanisms should be used in addition to the atom dir attribute.

* Otherwise, use Unicode BIDI/Ruby formatting codes, such that matching pairs of formatting codes are fully contained within a single text or attribute node.

Whose responsibility is it to apply the formatting codes? The person typing the text or the software? How does the software know when to apply the codes? Also, what about when an Atompub client edits an entry? Is the Atompub client responsible for preserving the unicode formatting characters? What if it doesn't? There are existing Atompub clients out there that, more than likely, will not, and since the formatting codes are non-visual, it's not likely the user will notice them either, causing unexpected rendering issues later on. How are non-bidi-enabled clients supposed to know what to do? With the bidi attribute approach, per rfc5023, non-supporting clients are expected to at least preserve the bidi attribute but will otherwise continue working as they currently do, without risk of corrupting the text by inadvertently dropping or improperly nesting the bidi controls.
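The counter-proposal's rule that matching pairs of formatting codes stay inside a single text or attribute node can at least be checked mechanically, which also shows how easily a client that drops one invisible character breaks it. A minimal Python sketch using the Unicode explicit embedding controls (LRE, RLE, LRO, RLO and PDF); embed_rtl and controls_balanced are hypothetical helpers.

```python
# Unicode explicit embedding/override controls.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENERS = {LRE, RLE, LRO, RLO}


def embed_rtl(text):
    """Wrap text in a right-to-left embedding: RLE ... PDF."""
    return RLE + text + PDF


def controls_balanced(text):
    """True if every opener has a matching PDF within this one string."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            depth -= 1
            if depth < 0:  # PDF with no opener before it
                return False
    return depth == 0


label = embed_rtl("Category label")
dropped_pdf = label[:-1]            # a client silently lost the trailing PDF
dropped_rle = label.replace(RLE, "")  # or the leading RLE
```

Both damaged strings render without any visible difference in most editors, which is the preservation risk raised above.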

Also, imagine a case where we have a feed with 100 entries, each with about 5 atom:category elements. Let's say that the feed is generally all RTL. Using your approach, that's at least 1000 extra characters in the feed (an embedding character and a PDF for each of the 500 labels), and 500 opportunities for the embedding to be screwed up.
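The arithmetic is easy to check: 100 entries times 5 categories gives 500 labels, and wrapping each in RLE ... PDF adds 2 characters, so 1000 characters. In UTF-8 each of those controls takes 3 bytes. A small Python sketch with made-up label text:

```python
RLE, PDF = "\u202b", "\u202c"  # right-to-left embedding, pop directional formatting

ENTRIES = 100
CATEGORIES_PER_ENTRY = 5

# Hypothetical labels standing in for the atom:category/@label values.
labels = [f"label {e}-{c}" for e in range(ENTRIES) for c in range(CATEGORIES_PER_ENTRY)]
wrapped = [RLE + plain + PDF for plain in labels]

extra_chars = sum(len(w) - len(plain) for w, plain in zip(wrapped, labels))
extra_utf8_bytes = sum(
    len(w.encode("utf-8")) - len(plain.encode("utf-8"))
    for w, plain in zip(wrapped, labels)
)

print(len(wrapped), extra_chars, extra_utf8_bytes)  # 500 1000 3000
```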

Re: The ruby formatting codes: Even the Unicode spec warns against using the ruby formatting codes for anything other than internal storage. We gain absolutely nothing by bringing ruby into this discussion.

* Editors of new documents must be meticulous about inserting the proper markup and formatting codes.
* Processors of existing documents must be meticulous about preserving BIDI/Ruby markup and/or formatting codes whenever any part of the contained text is preserved.
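As one illustration of what "meticulous about preserving formatting codes whenever any part of the contained text is preserved" could mean in practice, a processor that truncates a title for display would have to close any embedding its cut left open. A minimal Python sketch of that technique; truncate_preserving_embeddings is a hypothetical helper, and it assumes the input was balanced to begin with.

```python
# LRE, RLE, LRO, RLO open an embedding or override; PDF closes one.
OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}
PDF = "\u202c"


def truncate_preserving_embeddings(text, limit):
    """Keep the first `limit` characters, then close any embeddings left open.

    The result can exceed `limit` by one PDF per unclosed embedding.
    """
    prefix = text[:limit]
    depth = 0
    for ch in prefix:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF and depth > 0:
            depth -= 1
    return prefix + PDF * depth
```

A naive slice such as title[:6] of "\u202bHello world\u202c" leaves the RLE unterminated; the helper appends the missing PDF instead.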

Again, what about older editors that know nothing about the proper markup or formatting codes?

I recognize that this goes against the Unicode in XML guidelines. However, Atom already goes against the guidelines by having language-sensitive text in attribute values and other contexts where XHTML markup cannot be used.

What is the benefit of going against the Unicode in XML guidelines?

- James

- Brian