
Re: xml:lang attribute




On Thu, 7 Aug 2003 14:28:27 -0400, Maciej Ceglowski <mceglows@xxxxxxxxxxxxxx> wrote:


> Right now, there are 'title' elements at the feed and entry level, for which the language attribute is undefined. If the feed itself had an xml:lang attribute, it could cascade down to child elements.

As far as I have understood, the xml:lang attribute on the feed element does indeed cascade down.
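To make the cascading concrete, here is a minimal sketch of how a consumer might resolve the language in scope for any element. The feed document and the helper name effective_lang are made up for illustration; they are not part of any spec or tool. The only assumption about the library is that Python's ElementTree exposes xml:lang under the reserved XML namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical feed, for illustration only.
FEED = """<feed xml:lang="en">
  <title>Example feed</title>
  <entry>
    <title>Inherits the feed language</title>
  </entry>
  <entry xml:lang="no">
    <title>Overrides it</title>
  </entry>
</feed>"""


def effective_lang(root, target):
    """Return the xml:lang in scope for target, or None if none is declared."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang  # the nearest declaration wins
        node = parents.get(node)
    return None


root = ET.fromstring(FEED)
for title in root.iter("title"):
    print(title.text, "->", effective_lang(root, title))
```

The first two titles resolve to "en" through the feed element, and the third resolves to "no" because its entry carries its own xml:lang. A feed with no xml:lang anywhere yields None, which a consumer can treat as "language not stated".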


> I believe the consensus on the Wiki was to allow for xml:lang values of 'unknown' to grandfather in tools that don't export language metadata.

This is what the XML specification has to say about xml:lang [1]:


A special attribute named xml:lang may be inserted in documents to
specify the language used in the contents and attribute values of
any element in an XML document. In valid documents, this attribute,
like any other, must be declared if it is used. The values of the
attribute are language identifiers as defined by [IETF RFC 1766],
Tags for the Identification of Languages, or its successor on the
IETF Standards Track.


RFC 1766 [2] permits one of the following:

1. Two-letter language codes, as defined in ISO 639, optionally followed by a hyphen and dialect/variant information. Examples:
no-nynorsk, en-cockney


2. For languages that aren't defined in ISO 639, but have an IANA-assigned language code, one can use the prefix i-. Example:
i-sami-no


3. For languages that do not fit 1) or 2), one can use the private prefix x-. Examples:
x-klingon, x-quenya, x-minbari


After explaining these three valid uses, RFC 1766 explicitly says:

Other values cannot be assigned except by updating this standard.
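As a rough sketch of what those three rules mean in practice, the function below classifies a tag by its primary subtag. The name classify_tag and its return labels are my own, chosen for illustration. It checks only the shape of the tag as I read RFC 1766 (subtags of one to eight letters, compared case-insensitively); it does not verify that a two-letter code really exists in ISO 639 or that an i- tag is actually registered with IANA.

```python
import re

# RFC 1766 syntax: a primary tag and optional subtags, each 1 to 8 ASCII letters.
TAG_SYNTAX = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def classify_tag(tag):
    """Classify a tag by its primary subtag; None means RFC 1766 does not allow it."""
    if not TAG_SYNTAX.fullmatch(tag):
        return None
    primary = tag.split("-", 1)[0].lower()  # tags are case-insensitive
    if len(primary) == 2:
        return "iso-639"   # rule 1: two-letter ISO 639 language code
    if primary == "i":
        return "iana"      # rule 2: IANA-registered tag
    if primary == "x":
        return "private"   # rule 3: private-use tag
    return None            # anything else needs an update to the RFC


for tag in ("no-nynorsk", "i-sami-no", "x-klingon", "unknown", "x-unknown"):
    print(tag, "->", classify_tag(tag))
```

Under this reading, a bare "unknown" is rejected outright, while "x-unknown" passes only because it is classified as a private-use language tag.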

Before someone suggests using "x-unknown" as the attribute value for undefined languages: that overloads the meaning of xml:lang, since it suggests that we are using a private language whose name is "unknown". If the language is unknown, this should be addressed by omitting the xml:lang attribute from that particular feed.

If we then read the current informal specification [3], it says:

optional attributes of feed:
- xml:lang. SHOULD be included. MAY be overwritten on individual entries, if the feed contains entries in more than one language.


Which is, IMHO, exactly as it should be. RFC 2119 [4] defines the use of the word "SHOULD" as:
3. SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.



References:
-----------
[1] <URL:http://www.w3.org/TR/REC-xml#sec-lang-tag>
[2] <URL:http://www.ietf.org/rfc/rfc1766.txt>
[3] <URL:http://diveintomark.org/public/2003/08/atom02spec.txt>
[4] <URL:http://www.ietf.org/rfc/rfc2119.txt>

--
Arve Bersvendsen

http://www.virtuelvis.com
http://www.bersvendsen.com