Re: Backwards compatible


From: Erland Sommarskog (sommar-usefor@algonet.se)
Date: Sun Aug 25 2002 - 05:00:37 CDT


Claus Färber (list-ietf-wg-apps-usefor@faerber.muc.de) writes:
>So far, it has been a valid assumption that the Newsgroups header does
>not contain anything that can be damaged by RFC 2047 encoding.

No, this is wrong, as *any* encoding, RFC2047 or whatever, breaks Newsgroups.

What you possibly have been able to assume from the specification is that
Newsgroups would never be subject to encoding anyway, so that you would
not need an IF statement to exclude Newsgroups from encoding.

But it was also clear from the specifications that you must not change
Newsgroups when posting to News.
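
As an illustration of the "IF statement" in question, here is a minimal
Python sketch. The function name encode_header_value and the set
NEVER_ENCODE are hypothetical, and the standard library's
email.header.Header merely stands in for whatever RFC 2047 encoder a
given newsreader uses.

```python
from email.header import Header

# Hypothetical set of header names that must be passed through untouched.
NEVER_ENCODE = {"newsgroups"}


def encode_header_value(name: str, value: str, charset: str = "iso-8859-1") -> str:
    """Return the value to put on the wire for the header called `name`."""
    # This is the IF statement: Newsgroups is never encoded, whatever it contains.
    if name.lower() in NEVER_ENCODE:
        return value
    # Pure ASCII values need no RFC 2047 encoding either.
    if value.isascii():
        return value
    # Everything else becomes an RFC 2047 encoded-word.
    return Header(value, charset).encode()


print(encode_header_value("Subject", "r\u00e4ksm\u00f6rg\u00e5s"))
print(encode_header_value("Newsgroups", "se.test.r\u00e4ksm\u00f6rg\u00e5s"))
```

The Subject value comes out as an encoded-word, while the Newsgroups
value is returned exactly as it was given.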

>If you deliberately choose to break that assumption, you can't blame the
>software.

Poorly written software that ignores essential assumptions of News is
of course broken by any account.

>What counts is that software is able to handle newsgroup names
>which are within the bounds of the (current) specification. The
>robustness principle dictates accepting "wrong" newsgroup names, but it
>does not provide any guidelines on how to handle them. Just because one
>possibility turns out to be better than another one does not mean it is
>more correct.

Anyone with half a brain cell can tell that when posting a *Usenet*
message, it is never correct to encode Newsgroups, no matter whether
it contains the expected characters or not.

>> To wit, all newsreaders that write to a TTY. They only need a TTY, for
>> instance a Telnet client, that is able to present UTF-8.
>
>That's not the whole truth. With UTF-8 you have a non-trivial relation
>between octets, characters and columns. Most "dumb" legacy newsreaders
>make the assumption that one octet equals one column.

This is probably manageable for the user.
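
To make the octets/characters/columns distinction concrete, here is a
small Python sketch. The helper display_columns is hypothetical and only
a rough estimate (wide East Asian characters count as two columns,
combining marks as zero); real terminals may differ.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Rough estimate of how many terminal columns `text` occupies."""
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on the previous character
        # Wide and fullwidth characters usually take two columns.
        columns += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return columns


name = "se.test.r\u00e4ksm\u00f6rg\u00e5s"   # se.test.räksmörgås
octets = len(name.encode("utf-8"))            # what an octet-counting reader sees
characters = len(name)                        # Unicode code points
print(octets, characters, display_columns(name))  # 21 18 18
```

A newsreader that assumes one octet per column would reserve 21 columns
for a name that occupies 18.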

>Further, most newsreaders support only one display charset at once. With
>the dumb newsreader you describe, you won't be able to read any messages
>which use the 8bit charsets currently in use if you set your TTY charset
>to UTF-8.

Yes, I can handle this. It was the same thing when we went from 7-bit to
8-bit in the Swedish hierarchies. That included frequent changes of
the CRT settings.

Note, by the way, that the same issue applies to RFC2047. It doesn't
help if the mail reader knows the character set, because it has no way
to affect the display. Again, this is something the user must handle
himself.

>The main problem, however, is that many newsreaders are not able to pass
>through UTF-8 newsgroup names. The worst thing that can (and will)
>happen is that the names are recoded or encoded in RFC 2047.

As we've noted, these newsreaders are seriously broken even with
regard to the current specification.

>>> This shows that you have never had a look at Punycode. Punycode encodes
>>> ASCII characters as-is, so for most Western languages the words *are*
>>> quite readable. For most non-Western languages, which traditionally
>>> don't use UTF-8, there's not much difference between UTF-8 and Punycode.
>
>> So what does se.test.räksmörgås become in Punycode?
>
>se.test.zq--rksmrgs-5wao1o

The encoding in UTF-8 for se.test.räksmörgås, as shown on an ISO-8859-1
display, is se.test.rÃ¤ksmÃ¶rgÃ¥s.

There are three distinct differences between UTF-8 and Punycode:

1. Punycode adds line noise at the beginning and the end of the string.
2. With Punycode you cannot tell where the excluded letters should
   be inserted, nor how many there are.
3. With UTF-8 each scrambled character maps to a unique character
   sequence.

One can note that even RFC2047 fulfils the latter two points.
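
The comparison can be reproduced with Python's standard codecs. This is
only a sketch: the variable names are illustrative, and Python's
"punycode" codec produces the bare encoding without any prefix such as
the "zq--" quoted above.

```python
name = "r\u00e4ksm\u00f6rg\u00e5s"  # räksmörgås

# UTF-8 octets as a terminal set to ISO-8859-1 would display them.
utf8_seen_as_latin1 = name.encode("utf-8").decode("latin-1")

# Bare Punycode: the ASCII letters first, then a delimiter and the
# encoded positions of the non-ASCII letters.
puny = name.encode("punycode").decode("ascii")

# Each non-ASCII letter has its own scrambled form, in place in the string.
scrambled = {ch: ch.encode("utf-8").decode("latin-1") for ch in "\u00e4\u00f6\u00e5"}

print(utf8_seen_as_latin1)  # rÃ¤ksmÃ¶rgÃ¥s
print(puny)                 # rksmrgs-5wao1o
print(scrambled)
```

In the UTF-8 rendering the position and number of the non-ASCII letters
remain visible, whereas the Punycode form moves that information into
the trailing "5wao1o".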

The Punycode encoding is completely unacceptable.

--
Erland Sommarskog, Stockholm, sommar@algonet.se

