Re: 8-bit in newsgroups names

From: Claus André Färber (usenet-format-list@faerber.muc.de)
Date: Fri Nov 21 1997 - 06:49:00 CST


Leonid Yegoshin <egoshin@genesyslab.com> wrote:
> >From: chl@clw.cs.man.ac.uk (Charles Lindsey)
> >
> >Not so, because MIME-encoded Newsgroup: headers (RFC2047-style) are not
> >permitted by the proposal, so UTF-8 is all you have got. If your
> >
> !??? Where is this proposal? I read http://www.landfield.com/usefor/
> and found 2 drafts and read both. I don't see this in them.
> I haven't seen it on this mailing list since the IETF announce date.
> Is that what we discuss as an IETF WG?
>
> >For other headings (which may have been gatewayed from email) the MIME
> >form (RFC2047) is permitted, but deprecated. UTF-8 is the default
> >otherwise. Even then, news software MUST NOT translate it (until it
> >arrives at the newsreader for display, that is).
> >
> I don't agree with it. With this requirement news processing software
> MUST understand UTF-8, which is an additional headache. SMTP/MIME allows _any_
> coding in any headers.

Wrong. Only the text fields, i.e. the character sequences that are to be
displayed to and entered by the user, may be encoded this way.
Everything else, e.g. the email address, the date, the received lines,
etc. MUST NOT be encoded this way.

The following is ILLEGAL:
| From: =?iso-8859-1?q?Claus_Andr=E9_F=E4rber_<claus@faerber.muc.de>?=
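
For comparison, a legal form encodes only the display name and leaves the
address itself as plain ASCII. A minimal sketch, assuming Python's standard
email package (formataddr, parseaddr, decode_header); the variable names are
only for illustration:

```python
from email.header import decode_header
from email.utils import formataddr, parseaddr

display_name = "Claus André Färber"
address = "claus@faerber.muc.de"

# Only the display name (a text field) becomes an RFC 2047 encoded-word;
# the address in angle brackets is left as plain ASCII.
from_value = formataddr((display_name, address), charset="iso-8859-1")
print("From:", from_value)

# Round trip: split off the address, then decode the encoded-word.
encoded_name, parsed_address = parseaddr(from_value)
raw_name, name_charset = decode_header(encoded_name)[0]
decoded_name = raw_name.decode(name_charset)
print(decoded_name, parsed_address)
```

The point is that the encoded-word ends before the `<`, so software that
does not know RFC 2047 can still extract the address.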

> And news-processing software should convert
> MIME-encoding Newsgroup: to UTF-8 ... Why ?

Today, implementations MUST NOT try to encode Newsgroups at all.

> But I don't see any arguments for UTF-8-only encoding, yet...
>
> >If there is no Content-Type header at all, then the body SHOULD be in
> >US-ASCII. But actually, the proposed default is UTF-8, since it does no
> >harm to be that way.
>
> You are in error here. Long before MIME there was and there is software
> which can send/receive 8-bit mails/news without MIME headers,
> for example in Russia.

> Defaulting an 8-bit body to UTF-8 is an error, look at RFC1428.
> The reason for creating this RFC is not accidental.

True, as is defaulting 7bit to US-ASCII. Well, specifying the charset is
a good thing in any case. Requiring a ``Content-Type: text/plain;
charset=utf-8'' header is acceptable.

The default without a specification should be ``any interpretation the
UA thinks is most useful'' based on the user's locale, config settings,
heuristics, etc.
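
What such a UA policy might look like, as a rough sketch only: honour the
charset label if there is one, otherwise fall back to a user setting. The
function names and the `user_charset` parameter (standing in for locale or
configuration) are assumptions for illustration, not taken from any draft:

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body: bytes, content_type: Optional[str] = None,
                user_charset: str = "koi8-r") -> str:
    """Decode an article body; user_charset stands in for locale/config."""
    charset = declared_charset(content_type)
    if charset is None:
        # No label: pure 7-bit data is read as US-ASCII, anything else
        # is interpreted according to the user's configured charset.
        charset = "us-ascii" if body.isascii() else user_charset
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: fall back to the user's setting.
        return body.decode(user_charset, errors="replace")


# Unlabelled 8-bit body, read with the user's configured charset.
print(decode_body("Привет".encode("koi8-r")))
# Labelled body, decoded according to its Content-Type header.
print(decode_body("Grüße".encode("utf-8"), "text/plain; charset=utf-8"))
```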

> Defaulting to UTF-8 would break "8-bit clean" (see RFC2130,
> Appendix A-1, "NetNews Messages").

-=-=-=-=-=-=-=-=-=-=- begin quote -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
| NetNews
|
| NNTP
| See 8.2. No strong tradition for negotiation of encoding in NNTP
| exists.
|
| NetNews Messages
| These should be able to leverage off the mechanisms defined for
| Email. One difference is that nearly all NNTP channels are 8-
| bit clean; some NNTP newsgroups have a tradition of using 8-bit
| charsets in both headers and bodies. Defining character set
| default on a per newsgroup basis might be a suitable approach.
-=-=-=-=-=-=-=-=-=-=- end quote -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Using 8bit in bodies is fine, as the actual charset can be labelled.

Using 8bit in headers is dangerous, as there's no possibility to specify
the charset. Using a per-group or even a per-hierarchy default will fail
with crosspostings.

Using UTF-8 allows the representation of any character set without
having to specify a charset, while software MAY of course use heuristics
to determine whether the interpretation as UTF-8 is useful and if not
which charset it might be instead.
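
Such a heuristic can be quite simple, because text in the legacy 8-bit
charsets is rarely valid UTF-8 by accident. A minimal sketch, where the
function name and the `fallback` charset are assumptions for illustration
and the check is not infallible:

```python
def decode_header_text(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Interpret raw header bytes as UTF-8 if valid, else as `fallback`."""
    try:
        # Strict decoding: any malformed sequence raises an error.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the legacy charset instead.
        return raw.decode(fallback, errors="replace")


# The same name arrives intact whether it was sent as UTF-8 or as Latin-1.
print(decode_header_text("Färber".encode("utf-8")))
print(decode_header_text("Färber".encode("iso-8859-1")))
```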

-- 
Claus Andre Faerber <http://www.muc.de/~cfaerber/> Fax: +49-8061-3361
PGP: ID=1024/527CADCD FP=12 20 49 F3 E1 04 9E 9E  25 56 69 A5 C6 A0 C9 DC


