Re: 8-bit in newsgroups names


From: Leonid Yegoshin (egoshin@genesyslab.com)
Date: Thu Nov 20 1997 - 19:45:41 CST


    Hi,

>From: sommar@algonet.se (Erland Sommarskog)
>
>Leonid Yegoshin <egoshin@genesyslab.com> writes:
>> I don't agree with it. With this requerement news process software
>>MUST have UTF-8 understanding, and additional headache. SMTP/MIME allow _any_
>>coding in any headers. And news-processing software should convert
>>MIME-encoding Newsgroup: to UTF-8 ... Why ?
>
>Notice that a relayer does not need to know much at all about UTF-8. It
>should just passed the octets around. The Newsgroup header is by
>definition never encoded, which means that if the server sees a
>Newsgroups line with se.test.?iso-8859-1?Q?r=E5ksm=F6rg=E6s it should
>look for a group with this funny name in its active file, and nothing else.
>
   If "a relayer does not need to know much" and "It
should just passed the octets around", then sooner or later we can expect
to see in the active file a newsgroup named

       relcom.=?koi8-r?Q?=C0=CD=CF=D2?= (prev relcom.humor)
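
For illustration, the encoded word in that name can be unpacked with a few
lines of Python. This is only a sketch using the standard email.header
module; the variable names are mine, and it says nothing about how any
particular relayer behaves.

```python
from email.header import decode_header

# The MIME encoded-word from the group name above (RFC 2047 "Q" encoding).
encoded_word = "=?koi8-r?Q?=C0=CD=CF=D2?="

# decode_header returns a list of (bytes, charset) pairs.
raw, charset = decode_header(encoded_word)[0]

# Interpret the four octets in the charset that the encoded-word declares.
name = raw.decode(charset)

# The same word re-encoded as UTF-8 takes two octets per Cyrillic letter.
utf8_octets = name.encode("utf-8")

print(raw.hex(), charset, name)
print(utf8_octets.hex())
```

The four octets decode to the Russian word for "humor", which matches the
old group name relcom.humor.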

  If you don't want to see that in the active file, then as I understand it
there is a choice between two options:

       1) change MIME and all MIME-compatible mailers to use only UTF-8
               in the Newsgroups header line, or
       2) hack the relayer to support full UTF-8, as I already said.

Both choices are bad enough to ask the question again: why do we need
UTF-8 and only UTF-8?

>The clients will need probably need to know about UTF-8, but of course
>they should rely on routines provided by the OS.
>
   That is also a bad decision. There are countries with character codes
that are already supported and a lot of software written for them. That
software does not include UTF-8 yet.

>> But I don't see any arguments for UTF-8-only encoding, yet...
>
>I guess the argument is that at some point in future all other character
>sets are hopefully eradicated.
>
   ... Happiness in belief ... (I know it from the Bible, but it is a
well-known phrase in Russia and I can't find the English translation).

   You would not have a problem with Latin-1 countries such as those of
Western Europe. The Oriental countries with ideographic scripts have many
other problems, and a change of coding probably (I am not a specialist in
Chinese and the like) has minor significance. But there are Russian and a
lot of other alphabetic languages which can (CAN!) fit into the upper half
(codes 128-255) of an 8-bit extension of ASCII.
The transition to a variable-length code will cause hard problems
in word-processing software. In software which is not intended for
word processing itself it will be especially painful: programs like "grep"
and other, less common, pattern-matching or position-based text programs.
The price is too high for these language communities. They
also want a simple way to process the bytes of their native language, as
can be done with ASCII/Latin-1. MIME gives this: it is possible to extend
ASCII and give the resulting 8-bit code a name for network transmission.
UTF-8 does not give it.
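
The point about byte-oriented tools can be made concrete with a small
Python sketch. The sample word is an illustrative choice, and the only
assumption is that the tool in question indexes octets and knows nothing
about the encoding.

```python
word = "юмор"  # sample Cyrillic word (illustrative choice)

koi8 = word.encode("koi8-r")   # fixed width: one octet per letter
utf8 = word.encode("utf-8")    # variable width: two octets per Cyrillic letter

# In KOI8-R, octet N is letter N, so byte slicing is character slicing.
first_two_koi8 = koi8[:2].decode("koi8-r")

# In UTF-8 the same two-octet slice covers only one letter...
first_two_utf8 = utf8[:2].decode("utf-8")

# ...and a slice of odd length cuts a letter in half.
try:
    utf8[:3].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(len(koi8), len(utf8), first_two_koi8, first_two_utf8, split_ok)
```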

   In Russia, at the time the networks were being established, there were
two groups of people who were fans of two different codings: one of KOI8
and the other of Unicode. The first won because the second did not write
any working software.

>I guess a good software implementor will have some fallback for the
>case when there is no MIME headers, but the text is obviously not
>UTF-8. For instance, he could opt to present the data as-is, and
>hope that sender and receiver is using the same character set. This
>should probably not be in the RFC, but only leave this case undefined.

  It can't work, for example, for Oriental languages in EUC: at least
2.7% of legal words look like UTF-8 and can suffer from implicit
conversion.
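
A minimal Python sketch of how such a collision can arise. It assumes
EUC-JP as the EUC variant, the octet values are picked by hand for the
demonstration, and it does not attempt to reproduce the 2.7% figure.

```python
def looks_like_utf8(octets: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 sequence."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Two EUC-JP double-byte characters chosen so that each pair is also a
# valid two-octet UTF-8 sequence: lead octet in C2..DF, trail in A1..BF.
ambiguous = bytes([0xC2, 0xA1, 0xC3, 0xA2])

as_euc = ambiguous.decode("euc_jp")   # two double-byte characters
as_utf8 = ambiguous.decode("utf-8")   # two characters from the Latin-1 range

# A pair whose second octet lies outside 80..BF cannot pass as UTF-8.
unambiguous = bytes([0xC2, 0xC1])

print(len(as_euc), as_utf8,
      looks_like_utf8(ambiguous), looks_like_utf8(unambiguous))
```

A reader that guesses "valid UTF-8, so treat it as UTF-8" would silently
turn the first string into two accented Latin characters.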

                                       - Leonid Yegoshin, LY22

P.S. An off-topic story, but yet another argument.
      In the early years after the communist revolution in Russia
      there were some people who wanted to switch Russian to the Latin
      alphabet. The attempt failed, but not for lack of effort
      and energy among its supporters: the head of this movement was named
      Vladimir Ilyich Ulyanov (LENIN). Converting the whole country
      to communism proved to be a bit simpler...




This archive was generated by hypermail 2b29.