From: Leonid Yegoshin (egoshin@genesyslab.com)
Date: Fri Nov 21 1997 - 19:19:56 CST
Hi,
>From: Chris Newman <Chris.Newman@innosoft.com>
>
>(1) Use UTF-8 only (at least for Newsgroup names in NNTP -- names in the
>Newsgroups header probably need to be RFC 2047 for news->mail gateways)
>
>This is simpler on modern operating systems like Windows NT, MacOS 8,
>Solaris 2.6, etc. On these systems it can be displayed with a trivial
>algorithmic conversion that's in the system library. And this works not
>just for the local characters, but for any characters with glyphs in your
>current font.
As for Solaris, I found only the iconv* family of converters.
Unfortunately, when I tried to display a UTF-8 string, I ran into a problem:
which destination encoding should I ask for? The notions of "local characters"
and "current font" are rather strange for X11: this network protocol
is designed to display any character set; you only need to name it.
I can't just convert arbitrary UTF-8 to my current KOI8 character set:
I would get only the KOI8 characters out of it and lose, for example, Korean
instead of displaying it. I _have_ Korean fonts on my host.
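The loss described above can be sketched in a few lines. This uses Python's built-in codecs rather than the Solaris iconv converters mentioned in the message; the sample string and the helper name to_koi8_lossy are made up for illustration.

```python
def to_koi8_lossy(text):
    """Round-trip text through KOI8-R, replacing anything KOI8-R cannot hold."""
    # errors="replace" substitutes "?" for every character with no KOI8-R code
    raw = text.encode("koi8_r", errors="replace")
    return raw.decode("koi8_r")


# Hypothetical sample: a Russian greeting followed by Korean text
sample = "Привет 한국어"
print(to_koi8_lossy(sample))
```

The Cyrillic survives, while each Korean character comes back as a question mark, regardless of whether the host has Korean fonts installed.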
This is true for any X11-based system, and X11 is ideal for MIME
representation. You can simply select the character set (the font!) from
the name of the MIME charset, and this works today: most X11 fonts have names
that are compatible with the MIME names!
As for Windows NT... I am no Windows specialist, but my computer
runs Windows NT 4.0 and I have never seen any UTF-8-encoded strings on it,
and a friend of mine tells me that I can't (I don't understand his arguments;
he says something like "if you want to use Japanese in UTF-8 you need the
special Japanese version of Windows in addition to Japanese fonts").
>(2) Use MIME header encoding (RFC 2047)
>
>This has the advantage that any proper news client has to support this
>anyway (unless it is believed possible to force all mail->news gateways
>to do the conversion to UTF-8 in headers). For older operating systems,
>no conversion is necessary to view the localized character set. It also
>allows a graceful transition to UTF-8.
>
>Unfortunately, this is very ugly for non-upgraded clients. In addition,
>an international version of a news client will have to support conversion
>tables between all international character sets. The best way to keep
>these tables small is to use Unicode as an intermediate format -- at which
>point there is no savings over (1). If you don't use Unicode as an
>intermediate format, then these tables are significantly larger than the
>tables in (1). The conversion process is also more complex as some
>localized character sets use complex switching algorithms.
You are not right here. You only need to _set_ the correct character set
from the MIME name. (This is for X11.)
>From: chl@clw.cs.man.ac.uk (Charles Lindsey)
>
>News reading software will need modification if it needs to read
>newsgroups using non-ascii character sets. At present, there are no such
>newsgroups (not legitimately at least) because RFC1036 does not permit it.
>But even without modification, newsgroups in the iso8859-1 charset still
>turn up as printable characters even when coded in UTF-8. Don't know
>whether that applies to iso8859-cyrillic (whichever that is).
(iso8859-5)
That is true only for Latin-1 characters in UTF-8. Other languages get
variable-size characters and can't be processed in a simple way.
This is very offensive to languages that fit in the upper half of an 8-bit code.
And of course MIME, with its ability to represent any encoding, is fine
for us. We can use our KOI8 (it is used in Russia anyway) without
causing trouble for the international community. But UTF-8...
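A small sketch of the size difference behind this complaint, using Python's built-in codecs. The sample words and the helper name encoded_sizes are illustrative assumptions, not something from the thread.

```python
def encoded_sizes(text, charsets):
    """Return the number of octets the text occupies in each named charset."""
    return {name: len(text.encode(name)) for name in charsets}


# ASCII text is unchanged by UTF-8; Cyrillic text is not
print(encoded_sizes("humor", ("ascii", "utf_8")))
print(encoded_sizes("юмор", ("koi8_r", "iso8859_5", "utf_8")))
```

ASCII text costs the same in UTF-8, while each Cyrillic letter goes from one octet in KOI8-R or ISO 8859-5 to two octets in UTF-8.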
>Yes, a lot of 8bit stuff (usually interpreted as iso8859-1) does get
>through at present, but more by accident than by design. The correct way
>to legitimise such traffic (as our draft will make clear at some point)
>will be to include
>Content-Transfer-Encoding: 8bit
>Content-Type: text/plain charset=iso8859-1 (or whatever)
>
>in every message. But that only works for bodies.
(subjects)
It can also work for other non-ASCII parts of headers!
>From Claus Andre Faerber
>Leonid Yegoshin <egoshin@genesyslab.com> schrieb:
>> >For other headings (which may have been gatewayed from email) the MIME
>> >form (RFC2047) is permitted, but deprecated. UTF-8 is the default
>> >otherwise. Even then, news software MUST NOT translate it (until it
>> >arrives at the newsreader for display, that is).
>> >
>> I don't agree with it. With this requirement news process software
>> MUST have UTF-8 understanding, and additional headache. SMTP/MIME allow _any_
>> coding in any headers.
>
>Wrong. Only the text fields, i.e. the character sequences that are to be
>displayed to and entered by the user, may be encoded this way.
>Everything else, e.g. the email address, the date, the received lines,
>etc. MUST NOT be encoded this way.
I did not find Newsgroup: in the E/SMTP specification as part of mail processing.
The intent is that Newsgroup: can be inserted by the user, and all mailers known
to me do so only at the user's explicit request and do not check its format,
just as with any arbitrary header line.
>The default without a specification should be ``any interpretation the
>UA thinks is most useful'' based on the users locale, config settings,
>heuristics, etc.
Yes. And an MTA can make that possible only if it passes the body along
unchanged. Assuming UTF-8 for the body by default can break this.
>From: sommar@algonet.se (Erland Sommarskog)
>Leonid Yegoshin <egoshin@genesyslab.com> writes:
>> If "a relayer does not need to know much" and "It
>>should just passed the octets around", then we can expect sometime
>>to see in active file the newsgroup named
>>
>> relcom.=?koi8-r?Q?=C0=CD=CF=D2?= (prev relcom.humor)
>
>Why should we? Charles's draft does not propose this, and no news-
>reader today that I know of mangles the Newsgroups line in this
>way. Subject and From may get this treat, but for some reason even
>the most zealous RFC2047 fans have saved Newsgroups.
Erland, if .... and ... we CAN (not should), because the relayer has no
MIME-to-UTF-8 conversion.
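For reference, the encoded-word in the quoted example can be unpacked as follows. This is a sketch using Python's standard email.header module; splitting the name on dots and the helper name decode_group_name are assumptions made for illustration, not anything a relayer is specified to do.

```python
from email.header import decode_header


def decode_group_name(name):
    """Decode RFC 2047 encoded-words found in dot-separated name components."""
    parts = []
    for component in name.split("."):
        if component.startswith("=?") and component.endswith("?="):
            # decode_header returns (bytes, charset) pairs for encoded-words
            raw, charset = decode_header(component)[0]
            component = raw.decode(charset)
        parts.append(component)
    return ".".join(parts)


print(decode_group_name("relcom.=?koi8-r?Q?=C0=CD=CF=D2?="))
```

For the name in the quoted example this gives relcom.юмор, the Russian word for "humor". The sketch does not handle an encoded-word that itself contains a dot.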
>It is simply not an issue.
The first versions of MIME did not support 8-bit bodies. "It was not an issue".
I am trying to explain that if there is demand for MIME, then MIME will be used.
And I am trying to explain that THERE IS such demand, because it is convenient
for many non-Latin-1 countries.
>I don't know if you are trying to make a general argument against UTF-8
>here. That seems to be too late. As I understand, IETF seems quite
>committed to UTF-8.
Yes, I don't like UTF-8. And I see many RFCs that were never, or still are not,
implemented in programs, or that are implemented but not used. For me, the
credibility of any IETF meeting about languages/encodings without representation
of Russian-speaking network programmers is near zero.
>Yes, you can make it decently well with KOI-8, iso-latin-1 and all
>that, but only as long you are only in one of these communities.
>But in the long run, it is a limited solution.
MIME is not a limited solution. It is a universal solution for any encoding.
>Chris Newman <Chris.Newman@innosoft.com> writes:
>>(1) Use UTF-8 only (at least for Newsgroup names in NNTP -- names in the
>>Newsgroups header probably need to be RFC 2047 for news->mail gateways)
>
>Which is a good argument for encapsulating the article in the body
>of the mail, and not mixing up news and mail headers. Any attempt
>to mangle the Newsgroup header is likely to cause problems, although
>the impacts are likely to be smaller in this case.
That is far from real life. There are a lot of mail-news gateways, which
are used by something like 20000 to 40000 people. These mail-news gateways
(or, more significantly, the mail agents on the other end) use plain mail,
not encapsulated articles.
- Leonid Yegoshin, LY22