Re: When will News Article Format be approved?


From: Bruce Lilly (blilly@erols.com)
Date: Mon Mar 10 2003 - 10:45:56 CST


Martin Duerst wrote:

> So neither RFC 2234 nor RFC 3066 say that the language tags have
> to be encoded in ASCII.

3066 refers to IANA registrations, which are in the file language-tags,
which is in ASCII, and the ISO 3166 and 639-2 codes are also in a
subset of ASCII.

> Indeed, it would be a bad idea if they did.
> As an example, in an HTML document with charset=UTF-16, language
> codes obviously are encoded in UTF-16. Similar for EBCDIC-encoded
> documents.

We're discussing text message header fields, not HTML. And
EBCDIC is not suitable for any MIME use, and is in any event
not used in Internet text messages.

> Up to here, we still are working with the repertoire of characters
> in ASCII. To get to the Unicode plane 14 tag codes, we just map the
> ASCII codes to some other codes, for our specific protocol.

That mapping and subsequent encoding is the problem, because
it makes the language tag inaccessible.
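
To make the mapping concrete, here is a minimal sketch in Python. It assumes
the usual plane 14 convention: a U+E0001 LANGUAGE TAG introducer followed by
tag characters that mirror printable ASCII at an offset of 0xE0000. The
function names are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only; to_plane14/from_plane14 are hypothetical names.
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG introducer
TAG_BASE = 0xE0000            # tag characters mirror ASCII at this offset


def to_plane14(lang: str) -> str:
    """Map an ASCII language tag such as 'fr' to plane 14 tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def from_plane14(tagged: str) -> str:
    """Recover the ASCII language tag from a plane 14 tag sequence."""
    if not tagged.startswith(LANGUAGE_TAG):
        raise ValueError("missing U+E0001 LANGUAGE TAG")
    return "".join(chr(ord(c) - TAG_BASE) for c in tagged[1:])


# UTF-8 wire form of the tag for "fr": the ASCII bytes for "fr" are gone.
wire = to_plane14("fr").encode("utf-8")
print(wire.hex(" "))  # f3 a0 80 81 f3 a0 81 a6 f3 a0 81 b2
```

Anything that reads the field as ASCII sees twelve opaque octets where the
two letters "fr" used to be, and that is before any further transfer
encoding is applied.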

> This is
> done all too often. RFC 2047, with base64 and qp, is a typical example.

RFC 2047 does not encode the language-tag (or charset-tag, or
encoding-tag, or delimiters), only the text string under
consideration.

> Not that I like the Unicode plane 14 tag codes, but then I don't
> like RFC 2047 either.

The issue is that the plane 14 encoding obscures the language
tag, whereas RFC 2047 does not. One should rarely see the wire
format in either case; it should be handled by the implementation.
And that raises another point, viz. that RFC 2047 is widely
implemented for text messages, whereas the plane 14 tags are not.
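
As a rough illustration of the difference, the Python sketch below builds an
encoded-word by hand and reads the language back without decoding the
payload. It assumes the charset*language form that RFC 2231 adds to RFC 2047
encoded-words; the helper names are hypothetical, and the 75-character limit
on encoded-words is ignored for brevity.

```python
import base64
import re


def encoded_word(text: str, charset: str = "utf-8", lang: str = "fr") -> str:
    """Build a B-encoded word; only the text itself is base64-encoded."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}*{lang}?B?{payload}?="


# charset, optional *language, encoding letter, encoded text
ENCODED_WORD = re.compile(r"=\?([^?*]+)(?:\*([^?]+))?\?([BbQq])\?([^?]*)\?=")


def language_of(word: str):
    """Return the language tag of an encoded-word, or None if absent."""
    match = ENCODED_WORD.fullmatch(word)
    return match.group(2) if match else None


wire = encoded_word("chat", lang="fr")
print(wire)               # =?utf-8*fr?B?Y2hhdA==?=
print(language_of(wire))  # fr
```

The charset, language, and encoding markers stay in plain ASCII on the wire,
so the language can be picked out with a simple pattern match and no
decoding step.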

> And common sense makes clear that it would be a start for disaster.
> Consider the newsgroup chat.rec. Is that about chatting, or is it
> about cats, in French?

On ne sait pas. All of the reasons given in 2277 for language
tagging apply to internationalized items; they cannot be properly
handled without language information. Now, it's certainly possible
to take the stance that the name is merely a protocol element, in
which case there is need for neither language nor any other subset
of i18n considerations.

And in any event, there ought to be some discussion and work done
on handling the relevant information that describes newsgroups, viz.
the descriptive text that is sent with a newgroup control message
and which is currently in an unspecified and untagged charset and
unspecified and untagged language when it is transferred to storage
by a server and subsequently retrieved to a client.

> Do we want to somehow tag it on the side
> to make the difference? Do you think you'll get the users to
> understand what's going on?

It can't hurt, and can only help. I think it's best to deal with
it in the descriptive text and leave the protocol element alone.



