Encoded newsgroup-names


From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Thu Jul 04 2002 - 10:24:34 CDT


I think we are agreed on the encoding to be used for newsgroup names
when emailing to moderators and other gatewaying applications. We may
not yet be agreed on exactly when this encoding is to be used, or how it
stands alongside encapsulation, but I think the important first step is
to get the algorithm properly described, so here is some text.

First, the last paragraph of 5.5 gets simplified:

   The Newsgroups-header is intended for use in Netnews articles rather
   than in email messages. It MAY be used in an email message to
   indicate that it is a copy also posted to the listed newsgroups, in
   which case the inclusion of a Posted-And-Mailed header (6.9) would
   also be appropriate. However, it SHOULD NOT be used in an email-only
   reply to a Netnews article (thus the "inheritable" property of this
   header applies only to followups to a newsgroup, and not to followups
   to the poster). Moreover, if a newsgroup-name contains any non-ASCII
   character, it may need to be encoded using the mechanism defined in
   section 5.5.2. See also the further discussion in section 8.8.1.
   
Then there is a new section 5.5.2.

5.5.2. Encoded newsgroup-names

   Where it is required to transport an article across some medium that
   cannot reliably convey the full 8 bits of each octet, such as when
   gatewaying it into Email (8.8.1), or when emailing it to a moderator
   or constructing the submission address of the moderator (8.2.2), it
   may be necessary to encode any newsgroup-name within its Newsgroups-
   header that contains any non-ASCII character. For that purpose, the
   following algorithm is provided:

   1. Initially, the newsgroup-name is in the form of a sequence of
      octets representing that name in the UTF-8 character set.

   2. Each octet in the name in the range 0x80-0xff is replaced by a "%"
      character, followed by two characters representing that octet in
      hexadecimal, in which the hexadecimal digits "a" through "f" MUST
      be in lowercase.

   3. Each octet in the name in the range 0x00-0x7f remains unaltered (and
      thus MUST NOT be replaced by its hexadecimal equivalent).

        NOTE: Observe that this algorithm provides a unique encoding for
        each newsgroup-name. It will also be observed that it is
        compatible with (but more restrictive than) that provided in [RFC
        2396] for use within Uniform Resource Identifiers.
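
[Illustration only, not part of the proposed text: the three steps above
might be sketched in Python roughly as follows. The function name
encode_newsgroup_name is hypothetical, and the sketch assumes that an
unencoded newsgroup-name never contains a literal "%", since step 3
would pass one through unchanged.]

```python
def encode_newsgroup_name(name: str) -> str:
    """Sketch of the proposed encoding for a single newsgroup-name."""
    # Step 1: start from the UTF-8 octets of the name.
    octets = name.encode("utf-8")
    pieces = []
    for octet in octets:
        if octet >= 0x80:
            # Step 2: "%" plus two lowercase hexadecimal digits.
            pieces.append("%{:02x}".format(octet))
        else:
            # Step 3: octets 0x00-0x7f are never hex-encoded.
            pieces.append(chr(octet))
    return "".join(pieces)


if __name__ == "__main__":
    # "é" is the UTF-8 octet pair 0xc3 0xa9.
    print(encode_newsgroup_name("fr.café"))
```

[Because each input octet has exactly one permitted output form, two
conforming encoders should always produce identical strings for the
same name.]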

   This standard provides no authority for the use of this algorithm
   other than in the context of Newsgroups-headers being conveyed by
   email. In particular, it MUST NOT be used within any article conveyed
   by the Netnews protocols and thus, if an email using it is
   subsequently returned to the Netnews environment, it MUST be decoded
   back into UTF-8.
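
[Illustration only, not part of the proposed text: a gateway returning
such an email to Netnews might reverse the encoding along these lines.
The function name decode_newsgroup_name is hypothetical. The sketch
assumes that only "%" followed by two lowercase hexadecimal digits with
a value of 0x80 or more is an encoded octet, and that anything else is
left untouched; the proposed text does not say how a decoder should
treat other "%" sequences.]

```python
import re

# "%" followed by two lowercase hex digits whose value is in 0x80-0xff.
_ENCODED_OCTET = re.compile(rb"%([89a-f][0-9a-f])")


def decode_newsgroup_name(encoded: str) -> str:
    """Sketch of reversing the proposed encoding for one newsgroup-name."""
    # An encoded name should be pure ASCII; this raises otherwise.
    raw = encoded.encode("ascii")
    # Replace each "%xx" group by the single octet it stands for.
    octets = _ENCODED_OCTET.sub(
        lambda m: bytes([int(m.group(1).decode("ascii"), 16)]), raw
    )
    # Interpret the result as UTF-8; malformed input raises
    # UnicodeDecodeError rather than being silently accepted.
    return octets.decode("utf-8")


if __name__ == "__main__":
    print(decode_newsgroup_name("fr.caf%c3%a9"))
```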

        NOTE: Although the encoding defined by [RFC 2047] is available
        for use with other headers containing non-ASCII characters, the
        Newsgroups-header, being a structured header, is not one of the
        contexts permitted for its use (and moreover it would not
        produce a unique encoding nor cope well with newsgroup-names of
        excessive length). Therefore it SHOULD NOT be used within the
        Newsgroups-header.

[Clearly there are consequential changes in the Duties of injecting
agents and moderators, in gatewaying, and in the usage of
application/news-transmission.]

Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5



