From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Tue Mar 11 2003 - 05:16:36 CST
In <3E6CC144.60904@Sonietta.blilly.com> Bruce Lilly <blilly@erols.com> writes:
>Martin Duerst wrote:
>> So neither RFC 2234 nor RFC 3066 say that the language tags have
>> to be encoded in ASCII.
>3066 refers to IANA registrations, which are in the file language-tags,
>which is in ASCII, and the ISO 3166 and 639-2 codes are also in a
>subset of ASCII.
>> Indeed, it would be a bad idea if they did.
>> As an example, in an HTML document with charset=UTF-16, language
>> codes obviously are encoded in UTF-16. Similar for EBCDIC-encoded
>> documents.
>We're discussing text message header fields, not HTML. ...
If it is legitimate for the standards which define HTML to permit the RFC
3066 language tags to be expressed in UTF-16, then it must be legitimate
for the Netnews standard to express those same tags in the Unicode
plane 14 characters expressly designed for that very purpose.
>That mapping and subsequent encoding is the problem, because
>it makes the language tag inaccessible.
And in fact undoing that mapping consists simply of masking the characters
with 0x7F.
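
A minimal sketch of that unmasking in Python, assuming the tag characters are the plane 14 block U+E0000 to U+E007F, each one being the corresponding ASCII value plus 0xE0000. The function names are purely illustrative, and the mask is applied to decoded code points, not to the raw bytes of any particular encoding.

```python
TAG_BASE = 0xE0000  # first code point of the plane 14 tag block
TAG_LAST = 0xE007F  # last code point of the plane 14 tag block


def to_tag_chars(tag: str) -> str:
    """Map an ASCII language tag onto plane 14 tag characters."""
    if not tag.isascii():
        raise ValueError("language tag must be ASCII")
    return "".join(chr(TAG_BASE + ord(c)) for c in tag)


def from_tag_chars(tagged: str) -> str:
    """Undo the mapping by masking each code point with 0x7F."""
    for c in tagged:
        if not TAG_BASE <= ord(c) <= TAG_LAST:
            raise ValueError("not a plane 14 tag character: %r" % c)
    return "".join(chr(ord(c) & 0x7F) for c in tagged)


if __name__ == "__main__":
    hidden = to_tag_chars("en-GB")
    print([hex(ord(c)) for c in hidden])
    print(from_tag_chars(hidden))
```

The range check is there only so that the mask is never applied to ordinary text by mistake; for input already known to be tag characters the mask alone suffices.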
-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5