From: Martin Duerst (duerst@w3.org)
Date: Thu Mar 13 2003 - 15:52:51 CST
At 11:45 03/03/10 -0500, Bruce Lilly wrote:
>Martin Duerst wrote:
>>Up to here, we still are working with the repertoire of characters
>>in ASCII. To get to the Unicode plane 14 tag codes, we just map the
>>ASCII codes to some other codes, for our specific protocol.
>
>That mapping and subsequent encoding is the problem, because
>it makes the language tag inaccessible.
Inaccessible in what sense? Can you be more specific?
How important is it, for what kinds of processing, that
the language codes are in ASCII? And why?
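To make the mapping concrete, here is a minimal sketch. It assumes the layout of the Unicode Tags block: U+E0001 LANGUAGE TAG, followed by tag characters that sit at the ASCII code point plus 0xE0000. The function names are hypothetical and only for illustration.

```python
# Sketch only: helper names are hypothetical, not from any library.
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG introduces the tag
TAG_OFFSET = 0xE0000     # tag character = ASCII code point + 0xE0000


def to_plane14(lang: str) -> str:
    """Map an ASCII language tag such as 'en' to plane 14 tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(ord(c) + TAG_OFFSET) for c in lang)


def from_plane14(tagged: str) -> str:
    """Recover the ASCII language tag from its plane 14 form."""
    if not tagged or ord(tagged[0]) != LANGUAGE_TAG:
        raise ValueError("missing U+E0001 LANGUAGE TAG")
    return "".join(chr(ord(c) - TAG_OFFSET) for c in tagged[1:])


tagged = to_plane14("fr")
print([f"U+{ord(c):05X}" for c in tagged])  # code points of the tagged form
print(from_plane14(tagged))                 # back to the ASCII tag
```

The mapping is a fixed offset, so it is trivially reversible; whether the result counts as "accessible" depends on whether the software handling the text knows to undo it.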
>>This is
>>done all too often. RFC 2047, with base64 and qp, is a typical example.
>
>RFC 2047 does not encode the language-tag (or charset-tag, or
>encoding-tag, or delimiters), only the text string under
>consideration.
Well, RFC 2047 doesn't encode a language tag; language tags come from RFC 2231.
The charset-tag is indeed not encoded. The encoding tag is
encoded, 'base64' becomes 'B' and 'quoted-printable' becomes 'Q'.
The delimiters stand by themselves, the question of 'encoding'
is pretty much irrelevant for them.
Now I would argue that what really counts for most kinds of
processing is the actual text. Obscuring the actual text
while keeping language, charset,... visible seems to be
very much backwards.
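As a rough illustration of that point, the sketch below builds an encoded-word by hand, assuming the RFC 2047 form "=?charset?encoding?encoded-text?=" with the optional "*language" suffix on the charset that RFC 2231 adds. The helper names are hypothetical, only the 'B' encoding is handled, and the 75-character limit on encoded-words is ignored.

```python
import base64


def encoded_word(text, charset="utf-8", language=None):
    """Build a 'B' encoded-word; language is the optional RFC 2231 suffix."""
    tag = charset if language is None else f"{charset}*{language}"
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{tag}?B?{payload}?="


def decode_word(word):
    """Split an encoded-word back into (text, charset, language)."""
    _, tag, encoding, payload, _ = word.split("?")
    if encoding.upper() != "B":
        raise ValueError("this sketch only handles the 'B' encoding")
    charset, _, language = tag.partition("*")
    text = base64.b64decode(payload).decode(charset)
    return text, charset, language or None


word = encoded_word("Gr\u00fc\u00dfe", "utf-8", "de")
print(word)               # charset and language readable, text is not
print(decode_word(word))
```

In the printed encoded-word the charset, the language and the one-letter encoding tag can be read directly, while the text itself is the only part that needs decoding.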
>>Not that I like the Unicode plane 14 tag codes, but then I don't
>>like RFC 2047 either.
>
>The issue is that the plane 14 encoding obscures the language
>tag, whereas RFC 2047 does not. One should rarely see the wire
>format in either case, it should be handled by the implementation.
>And that raises another point, viz. that RFC 2047 is widely
>implemented for text messages, whereas the plane 14 tags are not.
The language tags that were added to the RFC 2047 syntax by
an additional RFC (RFC 2231) aren't really widely implemented. I'd be
glad to learn about implementations; I don't know of a single one.
>>Do we want to somehow tag it on the side
>>to make the difference? Do you think you'll get the users to
>>understand what's going on?
>
>It can't hurt, and can only help.
Do you think that <en>rec.chat</en> and <fr>rec.chat</fr>
should be two different newsgroups, or that only one of
them should be allowed?
>I think it's best to deal with
>it in the descriptive text and leave the protocol element alone.
That sounds like a reasonable idea.
Regards, Martin.