From: Per Abrahamsen (abraham@dina.kvl.dk)
Date: Fri Sep 21 2001 - 07:28:10 CDT
Jean-Marc Desperrier <jean-marc.desperrier@certplus.com> writes:
> This message is in UTF-8, and if your mail client does not decode UTF-8
> it will be hard to understand, because most of the content discusses how
> the same string will be shown in UTF-8 and in ISO-8859-1.
This client has some limited UTF-8 support I don't fully trust.
> What is there in the control message ?
I believe there is some semi-official ftp server that stores all
newgroup control messages.
> I saw a PGP signed message in dk.admin that says the name is
> dk.test.utf8-æøå.
> This message is encoded in ISO-8859-1.
That was either the approval message from the steering committee (SC),
or the one from the newsbastard with the authority to send out the
newgroup message.
> But it doesn't say that in order to create this group you must create it
> with the UTF-8 encoding of dk.test.utf8-æøå, ie if you use a tool to
> create that group that assumes the character set is ISO-8859-1, you would
> need to trick it by creating the group dk.test.utf8-Ã¦Ã¸Ã¥ in order
> to create dk.test.utf8-æøå, or else you will in fact create a group
> dk.test.utf8-渥, instead of dk.test.utf8-æøå.
The SC message mentioned the byte sequence. The message from the
newsbastard did not, and didn't even mention its own character set.
It was generated by a script that also generated the control message.
> I'm afraid most newsadmin will create dk.test.utf8-渥 instead of
> dk.test.utf8-æøå if they just cut/paste the content of this creation
> message.
Those who read dk.admin will probably have seen the SC message.
> I don't know what will happen with automatic creation, but if the control
> message is in ISO-8859-1, I expect the servers which will not reject it
> for invalid character to create dk.test.utf8-渥 instead of
> dk.test.utf8-æøå.
I doubt the control message specified any character set. It just used
the utf-8 byte sequence.
At www.supernews.com you can see which groups it carries, and a
search for groups matching dk.test.* shows the name with the
expected 6 bytes (the UTF-8 encoding displayed as Latin-1).
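To illustrate where the 6 bytes come from, here is a minimal Python sketch. It assumes nothing about the news server itself; the variable names are only illustrative. Each of the three letters æøå takes two bytes in UTF-8, and a Latin-1 display maps every byte to its own character.

```python
# The suffix of the group name as three Unicode characters (æøå).
suffix = "\u00e6\u00f8\u00e5"

utf8_bytes = suffix.encode("utf-8")
latin1_bytes = suffix.encode("iso-8859-1")

# What a Latin-1 display makes of the UTF-8 bytes: one character per byte.
as_seen_in_latin1 = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes.hex(" "))    # c3 a6 c3 b8 c3 a5
print(latin1_bytes.hex(" "))  # e6 f8 e5
# Code points of the six displayed characters, i.e. Ã ¦ Ã ¸ Ã ¥
print([f"U+{ord(c):04X}" for c in as_seen_in_latin1])
```

So a listing that shows dk.test.utf8- followed by six Latin-1 characters is consistent with the group having been created from the UTF-8 byte sequence.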
> Why did you choose a group name whose representation in ISO-8859-1 is
> also a valid utf-8 string ?
I did not know 渥 was a valid utf-8 string. In this client,
they are displayed as "unknown byte codes", but that means little.
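One possible explanation for the difference between the two clients, offered as an assumption rather than a description of either one: the Latin-1 bytes E6 F8 E5 are not well-formed UTF-8, because F8 does not have the 10xxxxxx form of a continuation byte, but a decoder that skips that check and only masks the low bits produces the single character U+6E25 (渥). The function lenient_decode_3byte below is hypothetical and exists only to show the arithmetic.

```python
# The Latin-1 encoding of æøå: bytes e6 f8 e5.
latin1_bytes = "\u00e6\u00f8\u00e5".encode("iso-8859-1")

# A strict UTF-8 decoder rejects the sequence.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False


def lenient_decode_3byte(data: bytes) -> str:
    """Hypothetical sloppy decoder: treats data as one 3-byte sequence
    without checking that bytes 2 and 3 look like 10xxxxxx."""
    b0, b1, b2 = data
    code_point = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
    return chr(code_point)


sloppy = lenient_decode_3byte(latin1_bytes)
print(strict_ok)               # False
print(f"U+{ord(sloppy):04X}")  # U+6E25
```

A client that validates continuation bytes would report the same three bytes as undecodable, which matches the "unknown byte codes" display.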