Re: dk.test.utf8-æøå passed


From: Per Abrahamsen (abraham@dina.kvl.dk)
Date: Fri Sep 21 2001 - 07:28:10 CDT


Jean-Marc Desperrier <jean-marc.desperrier@certplus.com> writes:

> This message is in UTF-8, and if your mail client does not decode UTF-8
> it will be hard to understand, because most of the content discusses how
> the same string will be shown in UTF-8 and in ISO-8859-1.

This client has some limited UTF-8 support I don't fully trust.

> What is in the control message?

I believe there is a semi-official FTP server that stores all
newgroup control messages.

> I saw a PGP signed message in dk.admin that says the name is
> dk.test.utf8-æøå.
> This message is encoded in ISO-8859-1.

That was either the approval message from the steering committee, or
the one from the newsbastard with the authority to send out the newgroup
message.

> But it doesn't say that in order to create this group you must create it
> with the UTF-8 encoding of dk.test.utf8-æøå, ie if you use a tool to
> create that group that assumes the character set is ISO-8859-1, you would
> need to trick it by creating the group dk.test.utf8-Ã¦Ã¸Ã¥ in order
> to create dk.test.utf8-æøå, or else you will in fact create a group
> dk.test.utf8-渥, instead of dk.test.utf8-æøå.

The SC message mentioned the byte sequence. The message from the
newsbastard did not, and didn't even mention its own character set.
It was generated by a script that also generated the control message.
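The byte sequence at issue can be reproduced with a short Python sketch. It assumes the intended group name is the UTF-8 encoding of dk.test.utf8-æøå, and shows the bytes a tool assuming ISO-8859-1 would emit instead, plus the Latin-1 text you would have to type into such a tool to get the UTF-8 bytes out of it.

```python
# Sketch: compare the byte sequences behind the group name in this thread.
# Assumption: the intended newsgroup name is the UTF-8 encoding of this string.
name = "dk.test.utf8-æøå"

utf8_bytes = name.encode("utf-8")      # what a UTF-8 newgroup carries
latin1_bytes = name.encode("latin-1")  # what an ISO-8859-1 tool would emit

# To make a tool that assumes ISO-8859-1 emit the UTF-8 bytes, you would have
# to type the UTF-8 bytes as if they were Latin-1 characters.
typed_in_latin1_tool = utf8_bytes.decode("latin-1")

print(utf8_bytes[-6:].hex(" "))    # c3 a6 c3 b8 c3 a5
print(latin1_bytes[-3:].hex(" "))  # e6 f8 e5
print(typed_in_latin1_tool)        # dk.test.utf8-Ã¦Ã¸Ã¥

# Round trip: the "tricked" Latin-1 input really produces the UTF-8 bytes.
assert typed_in_latin1_tool.encode("latin-1") == utf8_bytes
```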

> I'm afraid most news admins will create dk.test.utf8-渥 instead of
> dk.test.utf8-æøå if they just cut/paste the content of this creation
> message.

Those who read dk.admin will probably have seen the SC message.

> I don't know what will happen with automatic creation, but if the control
> message is in ISO-8859-1, I expect the servers that do not reject it
> for invalid characters to create dk.test.utf8-渥 instead of
> dk.test.utf8-æøå.

I doubt the control message specified any character set. It just used
the utf-8 byte sequence.

At www.supernews.com you can see which groups it carries, and when
searching for groups containing the string dk.test.* it comes up with
the expected 6 bytes (the UTF-8 encoding displayed as Latin-1).
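One way to picture why a search shows "6 bytes displayed as Latin-1": if a server keeps group names as raw bytes with no character set attached, a display that assumes ISO-8859-1 shows one character per byte. The sketch below models that; the `active` list and the `search_groups` helper are hypothetical and say nothing about how supernews actually stores or searches its groups.

```python
# Hypothetical model of a server's group list: raw bytes, no charset label.
# The entries other than the UTF-8 test group are made up for illustration.
active = [
    b"dk.admin",
    b"dk.test",
    "dk.test.utf8-æøå".encode("utf-8"),
]

def search_groups(groups: list[bytes], prefix: bytes) -> list[str]:
    """Return names starting with prefix, rendered as Latin-1 (one character
    per byte), the way a display that assumes ISO-8859-1 would show them."""
    return [g.decode("latin-1") for g in groups if g.startswith(prefix)]

hits = search_groups(active, b"dk.test.")  # roughly "dk.test.*"
print(hits)  # ['dk.test.utf8-Ã¦Ã¸Ã¥']

# The non-ASCII tail is 3 characters but 6 bytes in UTF-8.
tail = active[2][len(b"dk.test.utf8-"):]
print(len(tail), len("æøå"))  # 6 3
```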

> Why did you choose a group name whose representation in ISO-8859-1 is also
> a valid UTF-8 string?

I did not know 渥 was a valid utf-8 string. In this client,
they are displayed as "unknown byte codes", but that means little.
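Strictly speaking, the ISO-8859-1 bytes of æøå (E6 F8 E5) are not well-formed UTF-8: E6 starts a three-byte sequence, but F8 is not a continuation byte of the form 10xxxxxx. A decoder that skips that check and just masks the low six bits of each following byte does, however, arrive at U+6E25, which is 渥. The sketch below contrasts Python's strict decoder with such a lenient one; `lenient_decode_3byte` is a hypothetical helper, and whether any particular mail client decodes this way is an assumption.

```python
latin1_bytes = "æøå".encode("latin-1")  # b'\xe6\xf8\xe5'

def strict_is_utf8(b: bytes) -> bool:
    """True if Python's strict UTF-8 codec accepts the bytes."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def lenient_decode_3byte(b: bytes) -> str:
    """Hypothetical sloppy decoder for one 3-byte sequence: it checks the lead
    byte (1110xxxx) but does NOT check that the next two bytes are 10xxxxxx."""
    if len(b) != 3 or (b[0] & 0xF0) != 0xE0:
        raise ValueError("expected a 3-byte UTF-8 lead byte")
    cp = ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)
    return chr(cp)

print(strict_is_utf8(latin1_bytes))  # False
ch = lenient_decode_3byte(latin1_bytes)
print(f"U+{ord(ch):04X}", ch)        # U+6E25 渥
```

For comparison, the genuine UTF-8 encoding of U+6E25 is E6 B8 A5, which differs from the Latin-1 bytes only in the middle byte's top two bits.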




This archive was generated by hypermail 2b29.