From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Thu Aug 12 1999 - 13:06:37 CDT
I am now back in normal operation, and have made some progress on
section_6. But in the meantime, to keep you happy, here is section_5.
Most of the changes arise directly from discussions on this list. I
still need to know, however, whether we will permit path-identities to
include '[' and ']'. I no longer encourage their use around IP addresses
(Russ asked for that) but does that mean we should forbid the practice?
There follows the new section, and then the diffs. It will appear on the
landfield site presently.
5. Mandatory Headers
An article MUST have one, and only one, of each of the following
headers: Date, From, Message-ID, Subject, Newsgroups, Path.
Note also that there are situations, discussed in the relevant
parts of section 6, where References, Sender, or Approved headers
are mandatory. In control articles, specific values are required
for certain headers.
For the overall syntax of headers, see section 4.1. In the
discussions of the individual headers, the content of each is
specified using the syntax notation. The convention used is that
the content of, for example, the Subject header is defined as
<Subject-content>.
A proto-article (see 7.1.1) may lack some of these mandatory
headers, but they MUST then be supplied by the injecting agent.
5.1. Date
The Date header contains the date and time that the article was
prepared by the poster ready for transmission and SHOULD express
the poster's local time. The content syntax makes use of syntax
defined in [MESSFOR].
Date-content = date-time
NOTE: It is a useful convention to follow the date-time with a
comment containing the time zone in human-readable form. The
use of folding in a date-time is deprecated, even though
permitted by [MESSFOR].
5.1.1. Examples
Date: Fri, 2 Apr 1999 20:20:51 -0500 (EST)
Date: 26 May 1999 16:13 +0000
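As an illustration of the Date syntax, the following Python sketch builds a Date-content for a timezone-aware local time and appends the human-readable zone comment suggested in the NOTE above. It relies on the standard library's email.utils.format_datetime; the function name date_header is hypothetical. format_datetime zero-pads the day ("02"), which the [MESSFOR] date-time syntax also permits. It always emits the day-of-week and the seconds, both of which are optional (compare the second example).

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def date_header(dt: datetime) -> str:
    """Return a Date-content for a timezone-aware datetime (sketch)."""
    if dt.tzinfo is None:
        # Without an offset we cannot express the poster's local time.
        raise ValueError("datetime must be timezone-aware")
    value = format_datetime(dt)  # e.g. "Fri, 02 Apr 1999 20:20:51 -0500"
    zone_name = dt.tzname()
    if zone_name:
        # Conventional, purely informative comment; the numeric offset is authoritative.
        value += f" ({zone_name})"
    return value


if __name__ == "__main__":
    est = timezone(timedelta(hours=-5), "EST")
    print("Date: " + date_header(datetime(1999, 4, 2, 20, 20, 51, tzinfo=est)))
    # prints: Date: Fri, 02 Apr 1999 20:20:51 -0500 (EST)
```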
5.2. From
The From header contains the electronic address(es), and possibly
the full name, of the article's author(s). The content syntax makes
use of syntax defined in [MESSFOR], subject to the following
revised definition of local-part.
From-content = mailbox-list
addr-spec = local-part "@" domain
local-part = dot-atom / strict-quoted-string
NOTE: This syntax ensures that the local-part of an addr-spec
is restricted to pure US-ASCII (and is thus in strict
compliance with [MESSFOR]), whilst allowing any UTF-8
character to be used in a preceding quoted-string containing
the author's full name. If some future extension to the Mail
protocols should relax this restriction, one would expect the
Netnews protocols to follow.
Any mailbox in the From-content field that does not belong to the
poster(s) of the article MUST end in the top level domain of
".invalid" [RFC 2606] unless the owner(s) of those mailboxes have
authorized the poster(s) of the article to use those mailboxes.
5.2.1. Examples
From: John Smith <jsmith@site.example>
From: "John Smith" <jsmith@site.example>, dave@isp.example
From: "John D. Smith" <jsmith@site.example>, andrew@isp.example,
fred@site2.example
From: Jan Jones <jan@please_setup_your_system_correctly.invalid>
From: Jan Jones <joe@anonymous.invalid>
From: dave@isp.example (Dave Smith)
NOTE: the last example shows a now deprecated convention of
putting an author's full name in a comment following the
<mailbox>, rather than in a <phrase> at the start of that
mailbox. Observe that the quotes around the "John D. Smith"
example were required, on account of the '.' character, and
they would also have been required had any UTF8-xtra-char been
present.
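Once an addr-spec has been extracted, the two constraints on From mailboxes (a pure US-ASCII local-part, and ".invalid" for mailboxes the poster may not use) can be checked mechanically. The Python sketch below assumes it receives a bare addr-spec that has already been separated from any display name or comment. It also takes a hypothetical boolean saying whether the poster owns, or is authorized to use, the mailbox, because that fact cannot be derived from the header itself.

```python
def split_addr_spec(addr_spec: str) -> tuple[str, str]:
    # The domain follows the last "@"; a quoted local-part may itself contain "@".
    local, sep, domain = addr_spec.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an addr-spec: {addr_spec!r}")
    return local, domain


def local_part_is_ascii(addr_spec: str) -> bool:
    # The revised local-part syntax admits pure US-ASCII only.
    local, _ = split_addr_spec(addr_spec)
    return local.isascii()


def uses_invalid_tld(addr_spec: str) -> bool:
    # DNS names are case-insensitive, so compare the last label in lower case.
    _, domain = split_addr_spec(addr_spec)
    return domain.rstrip(".").rsplit(".", 1)[-1].lower() == "invalid"


def from_mailbox_ok(addr_spec: str, poster_may_use: bool) -> bool:
    # poster_may_use is a hypothetical input: ownership or authorization
    # has to be known out of band.
    return local_part_is_ascii(addr_spec) and (poster_may_use or uses_invalid_tld(addr_spec))
```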
5.3. Message-ID
The Message-ID header contains the article's message identifier, a
unique identifier distinguishing the article from every other
article. The content syntax makes use of syntax defined in
[MESSFOR], subject to the following revised definition of no-fold-
quote.
Message-ID-content = msg-id
id-left = dot-atom-text / no-fold-quote
no-fold-quote = DQUOTE *( strict-qtext / strict-quoted-pair )
NOTE: This syntax ensures that a msg-id is restricted to
pure US-ASCII (and is thus in strict compliance with
[MESSFOR]).
Following the provisions of [MESSFOR], an agent generating an
article's message identifier MUST ensure that it is unique and that
it is NEVER reused. Moreover, even though commonly derived from the
domain name of the originating site (and domain names are case-
insensitive), a message identifier MUST NOT be altered in any way
during transport, or when copied (as into a References header), and
thus a simple (case-sensitive) comparison of octets will always
suffice to recognise that same message identifier wherever it
subsequently reappears.
NOTE: some old software may treat message identifiers that
differ only in case within their id-right part as equivalent,
and implementors of agents that generate message identifiers
should be aware of this.
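The following Python sketch contrasts the comparison this section requires with the looser behaviour the NOTE warns about. Both function names are hypothetical. The legacy variant assumes that the id-right contains no "@" and exists only to show why two identifiers that differ only in the case of their id-right may be treated as duplicates by old software.

```python
def same_message_id(a: str, b: str) -> bool:
    # Required comparison: exact and case-sensitive. msg-id is pure US-ASCII,
    # so comparing the strings compares the octets.
    return a == b


def legacy_equivalent(a: str, b: str) -> bool:
    # Looser test some old software may apply: id-right compared without case.
    left_a, _, right_a = a[1:-1].rpartition("@")  # drop "<" and ">"
    left_b, _, right_b = b[1:-1].rpartition("@")
    return left_a == left_b and right_a.lower() == right_b.lower()
```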
5.4. Subject
The Subject header contains a short string identifying the topic of
the message. This is an inheritable header (see ...) to be copied
into the Subject header of any followup, in which case the new
header-content SHOULD then start with the string "Re: " (a "back
reference") followed by the contents of the pure-subject of the
precursor. Any leading "Re: " in the pure-subject MUST be stripped.
Subject-content = [ back-reference ] pure-subject
nbtext = qtext / "\" / DQUOTE
; all of <text> except SP and HTAB
pure-subject = 1*( [FWS] nbtext )
back-reference = %x52.65.3A.20
; which is a case-sensitive "Re: "
The pure-subject MUST NOT begin with "Re: ".
NOTE: The given syntax differs from that prescribed in
[MESSFOR] insofar as it does not permit a header content to be
completely empty, or to consist of WSP only (see remarks in
... concerning undesirable headers).
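The following Python sketch shows one way a followup agent might derive the new Subject-content. It reads "Any leading "Re: " ... MUST be stripped" as also covering repeated back-references. Only the exact, case-sensitive "Re: " is recognised; other forms are left untouched. The function names are hypothetical. The sketch does not handle a precursor whose Subject-content is nothing but "Re: ", which would leave an empty pure-subject.

```python
BACK_REFERENCE = "Re: "  # %x52.65.3A.20, case-sensitive


def pure_subject(subject_content: str) -> str:
    # Remove every leading back-reference so the result never begins with "Re: ".
    while subject_content.startswith(BACK_REFERENCE):
        subject_content = subject_content[len(BACK_REFERENCE):]
    return subject_content


def followup_subject(precursor_subject: str) -> str:
    return BACK_REFERENCE + pure_subject(precursor_subject)
```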
Followup agents MAY remove instances of non-standard back-reference
(such as "Re(2): ", "Re:", "RE: ", or "Sv: ") from the Subject-
content when composing the subject of a followup and add a correct
back-reference in front of the result.
NOTE: that would be "SHOULD remove instances" except that we
cannot find a sufficiently robust and simple algorithm to do
the necessary natural language processing.
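The NOTE above explains why no normative algorithm is given. Purely as an illustration of the MAY, the following Python sketch strips a small, hypothetical set of non-standard back-references (the "Re", "Re(n)" and "Sv" forms quoted above, in any case, with any spacing around the colon) and then prepends a correct "Re: ". Any real pattern list would be a local choice and would risk false positives.

```python
import re

# Hypothetical pattern set based only on the examples quoted in the text.
_NON_STANDARD = re.compile(r"^(?:re(?:\(\d+\))?|sv)\s*:\s*", re.IGNORECASE)


def normalize_followup_subject(precursor_subject: str) -> str:
    s = precursor_subject
    while True:
        stripped = _NON_STANDARD.sub("", s, count=1)
        if stripped == s:  # nothing more to remove
            break
        s = stripped
    return "Re: " + s
```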
Followup agents MUST NOT use any other string except "Re: " as a
back reference. Specifically, a translation of "Re: " into a local
language or usage MUST NOT be used.
NOTE: "Re" is an abbreviation for the Latin "In re", meaning
"in the matter of", and not an abbreviation of "Reference" as
is sometimes erroneously supposed.
Agents SHOULD NOT depend on nor enforce the use of back references
by followup agents. For compatibility with legacy news software the
Subject-content of a control message MAY start with the string
"cmsg "; the Subject-content of a non-control message MUST NOT
start with the string "cmsg ".
5.4.1. Examples
In the following examples, please note that only "Re: " is mandated
by this standard. "was: " is a convention used by many English-
speaking posters to signal a change in subject matter. Software
should be able to deduce this information from References.
Subject: Film at 11
Subject: Re: Film at 11
Subject: Godwin's law considered harmful (was: Film at 11)
Subject: Godwin's law (was: Film at 11)
Subject: Re: Godwin's law (was: Film at 11)
5.5. Newsgroups
The Newsgroups header's content specifies which newsgroup(s) the
article is posted to. It is an inheritable header (see ...) which
SHOULD be copied into the Newsgroups header of any followup, unless
a Followup-To header is present to prescribe otherwise.
Newsgroups-content = newsgroup-name
*( *FWS ng-delim *FWS newsgroup-name ) *FWS
newsgroup-name = component *( "." component )
component = component-start
*( component-start / component-other )
component-start = Un-lowercase / Un-digit
Un-lowercase = <Unicode Letter, Lowercase> /
<Unicode Letter, Other>
Un-digit = <Unicode Number, Decimal Digit> /
<Unicode Number, Other>
component-other = "+" / "-" / "_"
ng-delim = ","
where the <Unicode ...> items are as described in [UNICODE].
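The following Python sketch applies the Newsgroups-content grammar above. It unfolds the header, splits it at each ng-delim, and checks every component against the Unicode general categories named in the syntax, using the standard unicodedata module. The function names are hypothetical. Python's Unicode tables are newer than the [UNICODE] 2.0 reference, so a few characters may be classified differently.

```python
import re
import unicodedata

_START = {"Ll", "Lo", "Nd", "No"}  # Un-lowercase / Un-digit
_OTHER = set("+-_")                # component-other


def valid_component(component: str) -> bool:
    if not component or unicodedata.category(component[0]) not in _START:
        return False
    return all(unicodedata.category(c) in _START or c in _OTHER for c in component[1:])


def valid_newsgroup_name(name: str) -> bool:
    return all(valid_component(c) for c in name.split("."))


def parse_newsgroups(content: str) -> list[str]:
    # FWS may surround each ",": drop the CRLF of a fold, then trim WSP.
    unfolded = re.sub(r"\r\n(?=[ \t])", "", content)
    names = [n.strip(" \t") for n in unfolded.split(",")]
    bad = [n for n in names if not valid_newsgroup_name(n)]
    if bad:
        raise ValueError(f"invalid newsgroup-name(s): {bad}")
    return names
```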
The inclusion of folding white space between the newsgroup-names
(that is, around the ',' delimiters) is a newly introduced feature
in this standard. It MUST be accepted by
all conforming implementations (relaying agents, serving agents and
reading agents). Posting agents should be aware that such postings
may be rejected by overly-critical old-style relaying agents. When
a sufficient number of relaying agents are in conformance, posting
agents SHOULD generate such whitespace in the form of <CRLF WS> so
as to keep the length of lines in the relevant headers (notably
Newsgroups and Followup-To) to no more than 79 characters (or
other agreed policy limit - see 4.6). Before such critical mass
occurs, injecting agents MAY reformat such headers by removing
whitespace inserted by the posting agent, but relaying agents MUST
NOT do so.
[That is Dan's revision of the Year 2000 text. I have proposed a
revised text in section 3 to match it.]
A newsgroup-name consists of one or more components. Components MAY
contain non-ASCII letters, but these MUST be encoded in UTF-8 and
not according to [RFC 2047]. A component MUST contain at least one
letter (and MUST, according to the syntax, begin with a letter or
digit). Components SHOULD begin with a letter. Composite
characters (made by overlaying one character with another) and
format characters, as allowed in certain parts of Unicode and
needed by certain languages, must use whatever canonical
conventions apply to those parts of Unicode (such conventions are
not defined in this Standard). The use of "_" in a component is
deprecated. Serving agents MAY refuse to accept newsgroup-names
containing that character.
NOTE: Components composed entirely of digits would cause
problems for the commonly used implementation technique of
using the component as the name of a directory, whilst also
using sequential numbers to distinguish the articles within a
group. Components containing other non-permitted characters
could cause problems when newsgroup-names appear in URLs [RFC
1738] (for example an '@' character would prevent
distinguishing between newsgroup-names and message
identifiers).
NOTE: According to the syntax, uppercase letters cannot occur
in newsgroup-names, but this standard imposes no requirement
on software to check this condition, since it would be
unreasonable to expect it to do so in parts of Unicode for
which it was not configured (in general, a table lookup is
required). Rather, it is the responsibility of those creating
new newsgroups (...) not to violate it. It is, moreover, to be
expected that a newsgroup created in violation of this
condition will not be propagated particularly well.
[And insert some more on this subject when we come to the newgroup
control message.]
Whilst there is no longer any technical reason to limit the length
of a component (formerly, it was limited to 14 characters) nor to
limit the total length of a newsgroup-name, it should be noted that
these names are also used in the newsgroups line (6.6.1.2) where an
overall limit applies, and moreover excessively long names can be
exceedingly inconvenient in practical use. Agencies responsible
for individual hierarchies SHOULD therefore, as a matter of policy,
set reasonable limits for the length of a component and of a
newsgroup-name. In the absence of such explicit policies, the
default figures are 30 characters and 71 characters respectively.
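The component rules and the default length limits from the last few paragraphs can be combined into a single policy check. The following Python sketch does this for a name that is already syntactically valid. The function name is hypothetical, and it assumes that "characters" means Unicode code points rather than UTF-8 octets, a point the text leaves open. A hierarchy with its own policy would pass its own limits.

```python
import unicodedata

DEFAULT_MAX_COMPONENT = 30  # defaults in the absence of a hierarchy policy
DEFAULT_MAX_NAME = 71


def policy_warnings(name: str, max_component: int = DEFAULT_MAX_COMPONENT,
                    max_name: int = DEFAULT_MAX_NAME) -> list[str]:
    problems = []
    if len(name) > max_name:
        problems.append(f"name longer than {max_name} characters")
    for comp in name.split("."):
        is_letter = [unicodedata.category(c).startswith("L") for c in comp]
        if not any(is_letter):
            problems.append(f"component {comp!r} contains no letter (MUST)")
        elif not is_letter[0]:
            problems.append(f"component {comp!r} does not begin with a letter (SHOULD)")
        if "_" in comp:
            problems.append(f"component {comp!r} uses deprecated '_'")
        if len(comp) > max_component:
            problems.append(f"component {comp!r} longer than {max_component} characters")
    return problems
```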
[Observe that I have restored "20" to "30", since that seems to
express the intersection of our agreements on this matter.]
[If the checkpolicies proposal is included in the Standard, there
should be a reference to it here.]
NOTE: The newsgroup-name as encoded in UTF-8 should be
regarded as the canonical form. Reading agents may convert it
to whatever character set they are able to display (see 4.5.2)
and serving agents may possibly need to convert it to some
form more suitable as a filename. Simple algorithms for both
kinds of conversion are readily available. Observe that the
syntax does not allow comments within the Newsgroups header;
this is to simplify processing by relaying and serving agents
which have a requirement to process this header extremely
rapidly.
Posters SHOULD use only the names of existing newsgroups in the
Newsgroups header. However, it is legitimate to cross-post to
newsgroup(s) which do not exist on the posting agent's host,
provided that at least one of the newsgroups DOES exist there, and
followup agents MUST accept this (posting agents MAY accept it, but
SHOULD at least alert the poster to the situation and request
confirmation). Relaying agents MUST NOT rewrite Newsgroups headers
in any way, even if some or all of the newsgroups do not exist on
the relaying agent's host.
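The cross-posting rule reduces to a simple set test. The following Python sketch is written from the point of view of a posting or followup agent. The local_groups argument is a hypothetical stand-in for whatever list of newsgroups the host actually carries. A relaying agent would never use the result to rewrite the header.

```python
def check_crosspost(newsgroups: list[str], local_groups: set[str]) -> tuple[bool, list[str]]:
    """Return (acceptable, missing): acceptable if at least one group exists locally."""
    missing = [g for g in newsgroups if g not in local_groups]
    # A posting agent would show `missing` to the poster and ask for confirmation.
    return len(missing) < len(newsgroups), missing
```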
The Newsgroups header is intended for use in Netnews articles
rather than in mail messages. It MAY be used in a mail message to
indicate that it is a copy also posted to the listed newsgroups,
but it SHOULD NOT be used in a mail-only reply to a Netnews article
(thus the "inheritable" property of this header applies only to
followups to a newsgroup, and not to followups to the poster).
Moreover, if a newsgroup-name contains any non-ASCII character, it
MAY be encoded using the mechanism defined in [RFC 2047] when sent
by mail but, if it is subsequently returned to the Netnews
environment, it MUST then be re-encoded into UTF-8.
[We discussed the conflicting interpretations of the Newsgroups header
in mail. What I have proposed will make 50% of users happy and annoy
the other 50%, but that is better than confusing 100%. I do not expect
my "SHOULD NOT" to be universally observed for some considerable
time.]
5.5.1. Forbidden newsgroup names
The following forms of newsgroup-name MUST NOT be used except for
the specific purposes indicated:
o Newsgroup-names having only one component. These are reserved
for newsgroups whose propagation is restricted to a single host
or local network, and for pseudo-newsgroups such as "poster"
(which has special meaning in the Followup-To header - see
section 6.1), "junk" (often used by serving agents) and
"control" (likewise)
o Any newsgroup-name beginning with "control." (used as pseudo-
newsgroups by many serving agents)
o Any newsgroup-name containing the component "ctl" (likewise)
o "to" or any newsgroup-name beginning with "to." (reserved for
the ihave/sendme protocol described in section 7.2, and for
test messages sent on an essentially point-to-point basis)
o Any newsgroup-name containing the component "all" (because this
is used as a wildcard in some implementations)
A newsgroup-name SHOULD NOT appear more than once in the Newsgroups
header. The order of newsgroup names in the Newsgroups header is
not significant, except for determining which moderator to send the
article to if one of the groups is moderated (see 7.1.2).
5.6. Path
The Path header shows the route taken by a message since its entry
into the USENET system. It is a variant header (see 4. ...), each
agent that processes an article being required to add one (or more)
entries to it. This is primarily to enable relaying agents to avoid
sending articles to sites already known to have them, in particular
the site they came from, and additionally to permit tracing the
route articles take in moving over the network, and for gathering
USENET statistics. Finally, the presence of a "%" delimiter in the
Path header can be used to identify an article injected in
conformance with this standard.
5.6.1. Format
Path-content = *( path-identity [FWS] delimiter [FWS] )
tail-entry *FWS
path-identity = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
delimiter = "/" / "?" / "%" / "," / "!"
tail-entry = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
NOTE: A Path-content will inevitably contain at least one
path-identity, except possibly in the case of a proto-article
that has not yet been injected onto the network.
NOTE: Observe that the syntax does not allow comments within
the Path header; this is to simplify processing by relaying
and injecting agents which have a requirement to process this
header extremely rapidly.
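The Path-content grammar is simple enough to parse with two regular expressions. The following Python sketch unfolds the header and returns the (path-identity, delimiter) pairs, most recent first, followed by the tail-entry. The function name is hypothetical. Because FWS is allowed only next to a delimiter, whitespace used on its own as a separator is rejected, and so are '[' and ']', which the current syntax does not allow.

```python
import re

_ENTRY = re.compile(r"([A-Za-z0-9.:_-]+)[ \t]*([/?%,!])[ \t]*")
_TAIL = re.compile(r"[A-Za-z0-9.:_-]+")


def parse_path(content: str) -> tuple[list[tuple[str, str]], str]:
    """Split a Path-content into (path-identity, delimiter) pairs and the tail-entry."""
    # Unfold: a CRLF followed by WSP is part of FWS, so drop the CRLF.
    text = re.sub(r"\r\n(?=[ \t])", "", content).strip(" \t")
    entries, pos = [], 0
    while (m := _ENTRY.match(text, pos)) is not None:
        entries.append((m.group(1), m.group(2)))
        pos = m.end()
    tail = _TAIL.fullmatch(text, pos)
    if tail is None:
        raise ValueError(f"malformed Path-content near {text[pos:]!r}")
    return entries, tail.group(0)
```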
A relaying agent SHOULD NOT pass an article to another relaying
agent whose path-identity (or some known alias thereof) already
appears in the Path-content. The comparison MAY be either case
sensitive or case insensitive, and therefore relaying agents MUST
NOT generate a name which differs from that of another site only in
terms of case.
A relaying agent MAY decline to accept an article if its own path-
identity is already present in the Path-content or if the Path-
content contains some path-identity whose articles the relaying
agent does not want, as a matter of local policy.
NOTE: This last facility is sometimes used to detect and
decline control messages (notably cancel messages) which have
been deliberately seeded with a path-identity to be "aliased
out" by sites not wishing to act upon them.
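The following Python sketch covers both the loop-avoidance test and the optional refusal on receipt. It uses the case-insensitive comparison, which is one of the two options the text allows. path_ids is assumed to hold the path-identities only. The tail-entry is excluded, since 5.6.3 says it is not a site. All names are hypothetical.

```python
from collections.abc import Iterable


def _folded(ids: Iterable[str]) -> set[str]:
    return {i.lower() for i in ids}


def should_offer(peer_names: Iterable[str], path_ids: Iterable[str]) -> bool:
    """False if the peer's path-identity, or a known alias, is already in the Path."""
    seen = _folded(path_ids)
    return not any(n.lower() in seen for n in peer_names)


def accept_incoming(my_id: str, path_ids: Iterable[str], unwanted: Iterable[str] = ()) -> bool:
    """Local policy: decline if we are already listed or an unwanted identity appears."""
    seen = _folded(path_ids)
    if my_id.lower() in seen:
        return False
    return not any(u.lower() in seen for u in unwanted)
```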
5.6.2. Adding a path-identity to the Path header
When an injecting, relaying or serving agent receives an article,
it MUST prepend its own path-identity followed by a delimiter to
the beginning of the Path-content. In addition, it SHOULD then
insert CRLF and WSP (i.e. fold the header) immediately after that
delimiter if the line would otherwise be longer than 79 characters.
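The following Python sketch shows the prepend-and-fold step. It measures the first physical line as it would appear after "Path: ". If that line would exceed 79 characters, it folds immediately after the new entry, as in the example in 5.6.6. The function name is hypothetical. The choice of delimiter is left to the caller.

```python
MAX_LINE = 79


def prepend_identity(path_content: str, identity: str, delimiter: str) -> str:
    """Return the new Path-content with identity+delimiter at the front (sketch)."""
    if len(delimiter) != 1 or delimiter not in "/?%,!":
        raise ValueError(f"unknown delimiter {delimiter!r}")
    entry = identity + delimiter
    first_line = "Path: " + entry + path_content.split("\r\n", 1)[0]
    if len(first_line) > MAX_LINE:
        # Fold after our own entry: CRLF followed by a single space (WSP).
        return entry + "\r\n " + path_content
    return entry + path_content
```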
[It now seems established that none of the major servers and relayers
has any problem with folding the Path line, and that none of them
barfs on the new delimiters (the worst that can happen being that an
article is offered to a site that already has it).]
The path-identity added MUST be unique to that agent. To this end
it SHOULD be one of:
1. A fully qualified domain name (FQDN) associated (by the Internet
DNS service [RFC 1034]) with an A record, which SHOULD identify
the actual machine prepending this path-identity. Ideally, this
FQDN should also be "mailable" in the sense that it enables the
construction of a valid E-mail address of the form
"usenet@<FQDN>" or "news@<FQDN>" [RFC 2142] whereby the
administrators of that agent may be reached.
2. A fully qualified domain name (FQDN) associated (by the Internet
DNS service) with an MX record which MUST enable the
construction of a valid E-mail address of the form
"usenet@<FQDN>" or "news@<FQDN>" whereby the administrators of
that agent may be reached.
3. A name registered previously in the UUCP maps database (found in
the newsgroup comp.mail.maps), containing no '.' character.
4. An encoding of an IP address - <dotted-quad> [RFC 820] or
<ipv6-numeric> [RFC 2373] (the requirement to be able to use an
<ipv6-numeric> is the reason for including ':' as an allowed
character within a path-identity).
[Possibility of [...] around the IP address removed at Russ's request.
Actually the syntax did not permit '[' and ']' in a path-identity, but
I could easily make it do so, and I doubt any harm would ensue.]
5. A '.' followed by an arbitrary name not in the UUCP maps
database, but believed to be unique and registered at least with
all sites immediately downstream from the given site.
Of the above options, nos. 1 to 3 are much to be preferred, unless
there are strong technical reasons dictating otherwise. In
particular, the injecting agent's path-identity MUST, as a special
case, be an FQDN mailable in the sense defined under option 1, or
with an associated MX record as in option 2, and it MUST be
followed by the special delimiter '%' which serves to separate the
pre-injection and post-injection regions of the Path-content. See
the Duties of an Injection Agent (section 7.1). In the case of a
relaying or serving agent, the delimiter is chosen as follows.
When an agent (other than an injecting agent) receives an article,
it MUST establish the identity of the source and compare it with
the leftmost path-identity of the Path-content. If it matches, a
'/' should be used as the delimiter when prepending the agent's own
path-identity. If it does not match then the agent should prepend
two entries to the Path-content; firstly the true established
path-identity of the source followed by a "?" delimiter, and then,
to the left of that, the agent's own path-identity followed by a
'/' delimiter as usual. This prepending of two entries SHOULD NOT
be done if the provided and established identities match.
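The delimiter rule above can be sketched in a few lines of Python. established_id stands for the source identity determined by whatever verification the agent performs (see 5.6.5), and aliases for any registered names known to denote it. Both are hypothetical inputs. The case-insensitive match is an assumption carried over from the loop-detection rule in 5.6.1.

```python
def entries_to_prepend(my_id: str, leftmost_id: str, established_id: str,
                       aliases: frozenset = frozenset()) -> str:
    """Text a relaying or serving agent prepends to the Path-content (sketch)."""
    known = {established_id.lower(), *(a.lower() for a in aliases)}
    if leftmost_id.lower() in known:
        # Claimed and established source agree: one entry with '/'.
        return f"{my_id}/"
    # Mismatch: record the established source with '?', then ourselves with '/'.
    return f"{my_id}/{established_id}?"
```

Applied to the example in 5.6.6, .foo-server received the article from a source that claimed to be 10.123.12.2 but that it could only establish as bar.isp.example. The sketch returns ".foo-server/bar.isp.example?".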
[I have upgraded that "MUST" from the previous "SHOULD (and eventually
MUST)". I can see no benefit in not being firm from the start. Of
course, it will not be implemented on day 1. Contrariwise, I have
demoted that "SHOULD NOT" from a "MUST NOT" since nothing actually
breaks if you do it, though it is clearly a lazy behaviour and
potentially doubles the length of the Path line.]
Any method of establishing the identity of the source may be used
(but see ... below), with the consideration that, in the event of
problems, the agent concerned may be called upon to justify it.
NOTE: The use of the '%' delimiter marks the position of the
injecting agent in the chain. In normal circumstances there
should therefore be only one '%' delimiter present, and
injecting agents MAY choose to reject proto-articles with a
'%' already in them. If, for whatever reason, more than one
'%' is found, then the path-identity in front of the leftmost
'%' is to be regarded as the true injecting agent.
5.6.3. The tail-entry
For historical reasons, the tail-entry (i.e. the rightmost entry in
the Path-content) is regarded as a "user name", and therefore MUST
NOT be interpreted as a site through which the article has already
passed. Moreover, the Path-content is not an E-mail address and
MUST NOT be used to contact the poster. Posting and/or injecting
agents MAY place any string here. When it is not an actual user
name, the string "not-for-mail" is often used, but in fact a simple
"x" would be sufficient.
Often this field will be the only entry in the region (known as the
pre-injection region) after the '%', although there may be entries
corresponding to machines traversed between the posting agent and
the injecting agent proper. In particular, injecting agents that
receive articles from many sources SHOULD include the identity of
the source machine connecting to do the injection, and possibly
other information enabling them to establish the circumstances of
the injection (provided it does not conflict with any genuine site
identifier). The '!' delimiter may be used freely within the pre-
injection region, although '/' and '?' are also appropriate if used
correctly.
[If/when we invent some form of Trace or NNTP-Posting-Host header, we
may want to revisit that paragraph.]
5.6.4. Delimiter Summary
A summary of the various delimiters. The name immediately to the
left of the delimiter is always that of the machine which added the
delimiter.
'/' The name immediately to the right is known to be the identity
of the machine from which the article was received (either
because the entry was made by that machine and we have verified
it, or because we have added it ourselves).
'?' The name immediately to the right is the claimed identity of
the machine from which the article was received, but we were
unable to verify it (and have prepended our own view of where
it came from, and then a '/').
'%' Everything to the right is the pre-injection region followed by
the tail-entry. The name on the left is the FQDN of the
injecting agent. The presence of two '%'s in a path indicates a
double-injection (see ...).
'!' The name immediately to the right is unverified. The presence
of a '!' to the left of the '%' indicates that the identity to
the left is that of an old-style system not conformant with
this standard.
',' Reserved for future use, treat as '/'.
Other
Old software may possibly use other delimiters, which should be
treated as '!'. But note in particular that ':', '-' and '_'
are components of names, not delimiters, and FWS on its own
MUST NOT be used as the sole delimiter.
[I just removed '[' and ']' from that list, but could be persuaded to
put them back so long as the syntax gets fixed at the same time.]
NOTE: Old Netnews relaying and injecting programs almost all
delimit Path entries with the "!" delimiter, and these entries
are not verified. As such, the presence of "%" as a delimiter
will indicate that the article was injected by software
conforming to this standard, and the presence of "!" as a
delimiter to the left of a '%' will indicate that the message
passed through systems developed prior to this standard. It is
anticipated that relaying agents will reject articles in the
old style once this new standard has been widely adopted.
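The following Python sketch encodes the delimiter summary. ',' is treated as '/' and any unrecognised delimiter as '!'. It also reports whether an article passed through pre-standard systems, that is, whether a '!' occurs to the left of the leftmost '%' (or no '%' is present at all). The input is assumed to be the list of delimiters in Path order, leftmost first, as a Path parser would produce. The function names are hypothetical.

```python
_MEANING = {
    "/": "name to the right verified as the sending machine",
    "?": "name to the right claimed but not verified",
    "%": "name to the left is the injecting agent",
    "!": "name to the right unverified (old-style)",
}


def normalize_delimiter(d: str) -> str:
    if d == ",":
        return "/"  # reserved for future use, treat as '/'
    return d if d in _MEANING else "!"  # other old-software delimiters


def passed_old_style_systems(delimiters: list[str]) -> bool:
    norm = [normalize_delimiter(d) for d in delimiters]
    if "%" not in norm:
        return True  # not injected by software conforming to this standard
    return "!" in norm[: norm.index("%")]
```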
5.6.5. Suggested Verification Methods
The following approaches for common transports are suggested in
order to meet a site's verification obligations. They are not
required, but following them should avoid the necessity for
wasteful double-entry Path additions.
If the incoming article arrives through some TCP/IP protocol such
as NNTP, the IP address of the source will be known, and will
likely already have been checked against a list of known FQDNs or
IP addresses that the receiving site has agreed to peer with (this
will have involved a DNS lookup of a known FQDN, following CNAME
chains as required, to find an A record containing that source IP).
1. Where the path-identity is an FQDN (or even an arbitrary name
starting with a '.') it is now a simple matter to check that it
is the proper FQDN for the source, or some known registered
alias thereof. Alternatively, where the FQDN in the path-
identity has an associated A record, an immediate DNS lookup as
above can be used to verify it.
2. Where the path-identity is an encoding of an IP address which
does not immediately match the known IP address of the source, a
reverse-DNS (in-addr.arpa PTR record) lookup may be done on the
provided address, followed by a regular DNS "A" record lookup on
the returned name. There may be A records for several IP
addresses, of which one should match the path-identity and
another should match the source.
3. If the path-identity fails to match any known alias for the
source (requiring the insertion of an extra path-identity for
the true source followed by a '?'), simply doing a reverse DNS
(PTR) lookup on the source IP address is not sufficient to
generate the true FQDN. The returned name must be mapped back to
A records to assure it matches the source's IP address.
If the incoming article arrives through some other protocol, such
as UUCP, that protocol MUST include a means of verifying the source
site. In UUCP implementations, commonly each incoming connection
has a unique login name and password, and that login name (or some
alias registered for it) would be expected as the path-identity.
[The above description may still contain more detail than we would
wish. My aim so far was to retain everything in Brad's original, but
expressed in a more palatable manner. We can now decide how much of it
we want to keep.]
5.6.6. Example
Path: foo.isp.example/
.foo-server/bar.isp.example?10.123.12.2/old.site.example!
barbaz/baz.isp.example%dialup123.baz.isp.example!x
NOTE: That article was injected into the news stream by
baz.isp.example (complaints may be addressed to
usenet@baz.isp.example). The injector has taken care to record
that it got it from dialup123.baz.isp.example. "x" is the
default tail entry, though sometimes a real userid is put
there.
The article was relayed, perhaps by UUCP, to the machine known
in the UUCP maps database as "barbaz".
Barbaz relayed it to old.site.example, which does not yet
conform to this standard (hence the '!' delimiter). So one
cannot be sure that it really came from barbaz.
Old.site.example relayed it to a site claiming to have the IP
address [10.123.12.2], and claiming (by using the '/'
delimiter) to have verified that it came from
old.site.example.
[10.123.12.2] relayed it to ".foo-server" which, not being
convinced that it truly came from [10.123.12.2], did a reverse
lookup on the actual source and concluded it was known as
bar.isp.example (that is not to say that [10.123.12.2] was not
a correct IP address for bar.isp.example, but simply that that
connection could not be substantiated by .foo-server).
Observe that .foo-server has now added two entries to the
Path.
".foo-server" is a locally significant name (observe the
presence of the '.') within the complex site of many machines
run by foo.isp.example, so the latter should have no problem
recognizing .foo-server and using a '/' delimiter. Presumably
foo.isp.example then delivered the article to its direct
clients.
It appears that foo.isp.example and old.site.example decided
to fold the line, on the grounds that it seemed to be getting
a little too long.
[MESSFOR] P. Resnick, "Internet Message Format Standard", draft-
ietf-drums-msg-fmt-07.txt, March 1998.
[RFC 1034] P. Mockapetris, "Domain Names - Concepts and
Facilities", RFC 1034, November 1987.
[RFC 1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
Resource Locators (URL)", RFC 1738, December 1994.
[RFC 2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions)
Part Three: Message Header Extensions for Non-ASCII Text", RFC
2047, November 1996.
[RFC 2142] D. Crocker, "Mailbox Names for Common Services, Roles
and Functions", RFC 2142, May 1997.
[RFC 2373] R. Hinden and S. Deering, "IP Version 6 Addressing
Architecture", RFC 2373, July 1998.
[RFC 2606] D. Eastlake and A. Panitz, "Reserved Top Level DNS
Names", RFC 2606, June 1999.
[RFC 820] J. Postel and J. Vernon, "Assigned Numbers", RFC 820,
January 1983.
[UNICODE] The Unicode Consortium, "The Unicode Standard - Version
2.0", Addison-Wesley, 1996.
chl% diff -C 2 section_5.02.02 section_5.02.03
*** section_5.02.02 Fri Jul 2 19:50:56 1999
--- section_5.02.03 Thu Aug 12 18:57:05 1999
***************
*** 16,22 ****
<Subject-content>.
! A proto-article (see 7.1.1) may lack some of these mandatory
! headers, but they MUST then be supplied by the injecting
! agent.
5.1. Date
--- 16,21 ----
<Subject-content>.
! A proto-article (see 7.1.1) may lack some of these mandatory
! headers, but they MUST then be supplied by the injecting agent.
5.1. Date
***************
*** 24,29 ****
The Date header contains the date and time that the article was
prepared by the poster ready for transmission and SHOULD express
! the poster's local time. The content syntax is as defined in
! [MESSFOR].
Date-content = date-time
--- 23,28 ----
The Date header contains the date and time that the article was
prepared by the poster ready for transmission and SHOULD express
! the poster's local time. The content syntax makes use of syntax
! defined in [MESSFOR].
Date-content = date-time
***************
*** 42,48 ****
The From header contains the electronic address(es), and possibly
! the full name, of the article's author(s). The content syntax is as
! defined in [MESSFOR], subject to the following revised definition
! of local-part.
From-content = mailbox-list
--- 41,47 ----
The From header contains the electronic address(es), and possibly
! the full name, of the article's author(s). The content syntax makes
! use of syntax defined in [MESSFOR], subject to the following
! revised definition of local-part.
From-content = mailbox-list
***************
*** 58,65 ****
Netnews protocols to follow.
! All mailboxes in the From-content field MUST either belong to the
! posters(s) of the article (or the poster(s) are authorized by the
! owners to use the mailboxes) or end in the top level domain of
! ".invalid" [RFC 2606].
5.2.1. Examples:
--- 57,64 ----
Netnews protocols to follow.
! Any mailbox in the From-content field that does not belong to the
! poster(s) of the article MUST end in the top level domain of
! ".invalid" [RFC 2606] unless the owner(s) of those mailboxes have
! authorized the poster(s) of the article to use those mailboxes.
5.2.1. Examples:
***************
*** 85,90 ****
The Message-ID header contains the article's message identifier, a
unique identifier distinguishing the article from every other
! article. The content syntax is as defined in [MESSFOR], subject to
! the following revised definition of no-fold-quote.
Message-ID-content = msg-id
--- 84,90 ----
The Message-ID header contains the article's message identifier, a
unique identifier distinguishing the article from every other
! article. The content syntax makes use of syntax defined in
! [MESSFOR], subject to the following revised definition of no-fold-
! quote.
Message-ID-content = msg-id
***************
*** 93,97 ****
NOTE: This syntax ensures that a msg-id is restricted to
! pure US-ASCII.
Following the provisions of [MESSFOR], an agent generating an
--- 93,98 ----
NOTE: This syntax ensures that a msg-id is restricted to
! pure US-ASCII (and is thus in strict compliance with
! [MESSFOR]).
Following the provisions of [MESSFOR], an agent generating an
***************
*** 105,108 ****
--- 106,114 ----
subsequently reappears.
+ NOTE: Some old software may treat message identifiers that
+ differ only in case within their id-right part as equivalent,
+ and implementors of agents that generate message identifiers
+ should be aware of this.
+
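A generating agent can take this NOTE into account by comparing candidate identifiers under a key that folds the case of id-right only. It then never issues two identifiers that such old software would conflate. This sketch assumes the msg-id has already been validated and that id-right contains no "@". The names `msgid_key` and `is_fresh` are hypothetical.

```python
from typing import Set


def msgid_key(msg_id: str) -> str:
    """Key under which ids differing only in id-right case compare equal."""
    if not (msg_id.startswith("<") and msg_id.endswith(">")):
        raise ValueError(f"not a msg-id: {msg_id!r}")
    left, at, right = msg_id[1:-1].rpartition("@")
    if not at:
        raise ValueError(f"msg-id has no '@': {msg_id!r}")
    # id-left stays case-sensitive; only id-right is folded.
    return f"<{left}@{right.lower()}>"


def is_fresh(candidate: str, issued: Set[str]) -> bool:
    """True if `candidate` cannot be confused with an id already issued.

    `issued` holds the msgid_key() values of previously generated ids.
    """
    return msgid_key(candidate) not in issued
```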
5.4. Subject
***************
*** 142,146 ****
NOTE: "Re" is an abbreviation for the Latin "In re", meaning
! "in the matter of", and not an abbreviation of "Reference" is
is sometimes erroneously supposed.
--- 148,152 ----
NOTE: "Re" is an abbreviation for the Latin "In re", meaning
! "in the matter of", and not an abbreviation of "Reference" as
is sometimes erroneously supposed.
***************
*** 179,184 ****
Un-lowercase = <Unicode Letter, Lowercase> /
<Unicode Letter, Other>
- Un-uppercase = <Unicode Letter, Uppercase> /
- <Unicode Letter, Titlecase>
Un-digit = <Unicode Number, Decimal Digit> /
<Unicode Number, Other>
--- 185,188 ----
***************
*** 190,204 ****
newly introduced feature in this standard. It MUST be accepted by
all conforming implementations (relaying agents, serving agents and
! reading agents). Posting agents should be aware that, except for
! experimental posting to 'test' newsgroups or within cooperating
! subnets, such postings may be rejected by overly-critical old-style
! relaying agents. When a sufficient number of relaying agents are in
! conformance, posting agents SHOULD generate such whitespace in the
! form of <CRLF WS> so as to keep the length of lines in the relevant
! headers (notably Newsgroups and Followup-To) to no more than than
! 79 characters (or other agreed policy limit - see 4.6). Before such
! critical mass occurs, injecting agents MAY reformat such headers by
! removing whitespace inserted by the posting agent, but relaying
! agents MUST NOT do so.
[That is Dan's revision of the Year 2000 text. I have proposed a
revised text in section 3 to match it.]
--- 194,207 ----
newly introduced feature in this standard. It MUST be accepted by
all conforming implementations (relaying agents, serving agents and
! reading agents). Posting agents should be aware that such postings
! may be rejected by overly-critical old-style relaying agents. When
! a sufficient number of relaying agents are in conformance, posting
! agents SHOULD generate such whitespace in the form of <CRLF WS> so
! as to keep the length of lines in the relevant headers (notably
! Newsgroups and Followup-To) to no more than 79 characters (or
! other agreed policy limit - see 4.6). Before such critical mass
! occurs, injecting agents MAY reformat such headers by removing
! whitespace inserted by the posting agent, but relaying agents MUST
! NOT do so.
[That is Dan's revision of the Year 2000 text. I have proposed a
revised text in section 3 to match it.]
***************
*** 248,255 ****
set reasonable limits for the length of a component and of a
newsgroup-name. In the absence of such explicit policies, the
! default figures are 20 characters and 72 characters respectively.
! [Observe that I have reduced that "20" from "30", on the grounds that
! a particular hierarchy can always decide to up the limit, but no
! hierarchy is ever likely to reduce it.]
[If the checkpolicies proposal is included in the Standard, there
should be a reference to it here.]
--- 251,257 ----
set reasonable limits for the length of a component and of a
newsgroup-name. In the absence of such explicit policies, the
! default figures are 30 characters and 71 characters respectively.
! [Observe that I have changed "20" back to "30", since that seems to
! express the intersection of our agreements on this matter.]
[If the checkpolicies proposal is included in the Standard, there
should be a reference to it here.]
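A sketch of the default length check. The limits are parameters so that a hierarchy's own policy can override them. The function name is hypothetical. Lengths are counted as Python `str` code points, which is an assumption about how "characters" is measured for non-ASCII names.

```python
def within_length_limits(newsgroup_name: str,
                         max_component: int = 30,
                         max_name: int = 71) -> bool:
    """Check a newsgroup-name against component and overall length limits.

    The defaults are the figures that apply when a hierarchy has no
    explicit policy of its own.
    """
    if len(newsgroup_name) > max_name:
        return False
    return all(len(component) <= max_component
               for component in newsgroup_name.split("."))
```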
***************
*** 267,272 ****
Posters SHOULD use only the names of existing newsgroups in the
! Newsgroups header, because newsgroups are not created simply by
! being posted to. However, it is legitimate to cross-post to
newsgroup(s) which do not exist on the posting agent's host,
provided that at least one of the newsgroups DOES exist there, and
--- 269,273 ----
Posters SHOULD use only the names of existing newsgroups in the
! Newsgroups header. However, it is legitimate to cross-post to
newsgroup(s) which do not exist on the posting agent's host,
provided that at least one of the newsgroups DOES exist there, and
***************
*** 295,299 ****
5.5.1. Forbidden newsgroup names
! The following newsgroup-names MUST NOT be used:
o Newsgroup-names having only one component. These are reserved
--- 296,301 ----
5.5.1. Forbidden newsgroup names
! The following forms of newsgroup-name MUST NOT be used except for
! the specific purposes indicated:
o Newsgroup-names having only one component. These are reserved
***************
*** 307,312 ****
o Any newsgroup-name containing the component "ctl" (likewise)
o "to" or any newsgroup-name beginning with "to." (reserved for
! test messages sent on an essentially point-to-point basis (see
! also the ihave/sendme protocol described in section 7.2)
o Any newsgroup-name containing the component "all" (because this
is used as a wildcard in some implementations)
--- 309,314 ----
o Any newsgroup-name containing the component "ctl" (likewise)
o "to" or any newsgroup-name beginning with "to." (reserved for
! the ihave/sendme protocol described in section 7.2, and for
! test messages sent on an essentially point-to-point basis)
o Any newsgroup-name containing the component "all" (because this
is used as a wildcard in some implementations)
***************
*** 333,341 ****
Path-content = *( path-identity [FWS] delimiter [FWS] )
! tail-entry
path-identity = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
! delimiter = "/" / "@" / "%" / "," / "!"
tail-entry = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
NOTE: Observe that the syntax does not allow comments within
the Path header; this is to simplify processing by relaying
--- 335,347 ----
Path-content = *( path-identity [FWS] delimiter [FWS] )
! tail-entry *FWS
path-identity = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
! delimiter = "/" / "?" / "%" / "," / "!"
tail-entry = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
+ NOTE: A Path-content will inevitably contain at least one
+ path-identity, except possibly in the case of a proto-article
+ that has not yet been injected onto the network.
+
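A recognizer for the revised Path-content syntax can be written directly from the ABNF above. This sketch uses Python's `re` module. It returns the (path-identity, delimiter) pairs and the tail-entry, and rejects anything else. That includes '@', '[' and ']', which the syntax no longer admits. FWS is taken as ([*WSP CRLF] 1*WSP), without the obsolete forms. The name `parse_path` is hypothetical.

```python
import re
from typing import List, Tuple

_NAME = r"[A-Za-z0-9\-.:_]+"            # path-identity / tail-entry characters
_FWS = r"(?:[ \t]*\r\n)?[ \t]+"          # ([*WSP CRLF] 1*WSP)
_ENTRY = re.compile(rf"({_NAME})(?:{_FWS})?([/?%,!])(?:{_FWS})?")
_TAIL = re.compile(rf"({_NAME})(?:{_FWS})*\Z")


def parse_path(content: str) -> Tuple[List[Tuple[str, str]], str]:
    """Split a Path-content into (path-identity, delimiter) pairs and tail-entry."""
    entries: List[Tuple[str, str]] = []
    pos = 0
    while True:
        m = _ENTRY.match(content, pos)
        if m is None:
            break  # no further "path-identity delimiter" pair
        entries.append((m.group(1), m.group(2)))
        pos = m.end()
    tail = _TAIL.match(content, pos)
    if tail is None:
        raise ValueError(f"malformed Path-content at offset {pos}: {content!r}")
    return entries, tail.group(1)
```

A result with no entries corresponds to the proto-article case in the NOTE above.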
NOTE: Observe that the syntax does not allow comments within
the Path header; this is to simplify processing by relaying
***************
*** 345,353 ****
A relaying agent SHOULD NOT pass an article to another relaying
agent whose path-identity (or some known alias thereof) already
! appears in the Path-content. Observe that, for purposes of
! comparison, path-identities are case-sensitive. A relaying agent
! MAY decline to accept an article if its own path-identity (or some
! alias thereof) is already present in the Path-content.
NOTE: This last facility is sometimes used to detect and
decline control messages (notably cancel messages) which have
--- 351,364 ----
A relaying agent SHOULD NOT pass an article to another relaying
agent whose path-identity (or some known alias thereof) already
! appears in the Path-content. The comparison MAY be either case
! sensitive or case insensitive, and therefore relaying agents MUST
! NOT generate a name which differs from that of another site only in
! terms of case.
+ A relaying agent MAY decline to accept an article if its own path-
+ identity is already present in the Path-content or if the Path-
+ content contains some path-identity whose articles the relaying
+ agent does not want, as a matter of local policy.
+
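Both checks in this paragraph reduce to asking whether any of a set of names occurs among the path-identities. The sketch below extracts path-identities by splitting on the delimiters and ignores the tail-entry, which is not a path-identity. Either comparison is permitted, so the caller chooses case-sensitive or case-insensitive matching. The function names are hypothetical, and the alias and "unwanted site" sets are assumed to come from local configuration.

```python
import re
from typing import Iterable, List


def path_identities(content: str) -> List[str]:
    """Path-identities in a Path-content, leftmost first (tail-entry excluded)."""
    parts = [p.strip() for p in re.split(r"[/?%,!]", content)]
    return [p for p in parts[:-1] if p]


def path_mentions(content: str, names: Iterable[str],
                  case_insensitive: bool = True) -> bool:
    """True if any of `names` already appears as a path-identity."""
    fold = str.lower if case_insensitive else (lambda s: s)
    present = {fold(p) for p in path_identities(content)}
    return any(fold(n) in present for n in names)
```

An agent would skip a peer when `path_mentions(path, peer_aliases)` is true. It MAY decline an incoming article when `path_mentions(path, own_aliases | unwanted_sites)` is true.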
NOTE: This last facility is sometimes used to detect and
decline control messages (notably cancel messages) which have
***************
*** 370,401 ****
it SHOULD be one of:
! 1. A fully qualified domain name (FQDN) which MUST be associated
! with an A record or an MX record (or both), retrievable via the
! Internet DNS service [RFC 1034]. Any such A record SHOULD be
! that of the machine generating this path-identity, and any such
! MX record MUST enable the construction of a valid E-mail address
! of the form "usenet@<FQDN>" or "news@<FQDN>" [RFC 2142]. The
! FQDN SHOULD be in all-lowercase form so as to facilitate rapid
! (case senstitive) comparisons.
! 2. A name registered previously in the UUCP maps database (found in
! the newsgroup comp.mail.maps), containing no '.' character.
! 3. An encoding of an IP address - <dotted-quad> [RFC 820] or
! <ipv6-numeric> [RFC 1884]- preferably enclosed between '[' and '
! ]' (the requirement to be able to use an <ipv6-numeric> is the
! reason for including ':' as an allowed character within a path-
! identity).
! [Is there some reason why the [...] was obligatory around an <ipv6-
! numeric> in Brad's syntax, but not around a <dotted-quad>?]
! 4. A '.' followed by an arbitrary name not in the UUCP maps
database, but believed to be unique and registered at least with
all sites immediately downstream from the given site.
! Of the above options, nos. 1 and 2 are much to be preferred, unless
there are strong technical reasons dictating otherwise. In
particular, the injecting agent's path-identity MUST, as a special
! case, be an FQDN with an associated MX record and it MUST be
followed by the special delimiter '%' which serves to separate the
pre-injection and post-injection regions of the Path-content. See
--- 381,418 ----
it SHOULD be one of:
! 1. A fully qualified domain name (FQDN) associated (by the Internet
! DNS service [RFC 1034]) with an A record, which SHOULD identify
! the actual machine prepending this path-identity. Ideally, this
! FQDN should also be "mailable" in the sense that it enables the
! construction of a valid E-mail address of the form
! "usenet@<FQDN>" or "news@<FQDN>" [RFC 2142] whereby the
! administrators of that agent may be reached.
! 2. A fully qualified domain name (FQDN) associated (by the Internet
! DNS service) with an MX record which MUST enable the
! construction of a valid E-mail address of the form
! "usenet@<FQDN>" or "news@<FQDN>" whereby the administrators of
! that agent may be reached.
! 3. A name registered previously in the UUCP maps database (found in
! the newsgroup comp.mail.maps), containing no '.' character.
! 4. An encoding of an IP address - <dotted-quad> [RFC 820] or
! <ipv6-numeric> [RFC 2373] (the requirement to be able to use an
! <ipv6-numeric> is the reason for including ':' as an allowed
! character within a path-identity).
! [Possibility of [...] around the IP address removed at Russ's request.
! Actually the syntax did not permit '[' and ']' in a path-identity, but
! I could easily make it do so, and I doubt any harm would ensue.]
!
! 5. A '.' followed by an arbitrary name not in the UUCP maps
database, but believed to be unique and registered at least with
all sites immediately downstream from the given site.
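Only part of the choice among these options is visible from the path-identity itself. Whether an FQDN has the required A or MX record needs a DNS lookup, and membership in the UUCP maps needs the maps database. The sketch below therefore classifies a path-identity by its form alone, using Python's `ipaddress` module for option 4. The category labels are hypothetical, and a result of "fqdn" must still be checked against option 1 or 2.

```python
import ipaddress
import re

_PATH_IDENTITY = re.compile(r"[A-Za-z0-9\-.:_]+\Z")


def path_identity_form(pid: str) -> str:
    """Classify a path-identity by syntax alone (no DNS or UUCP-map lookups)."""
    if not _PATH_IDENTITY.match(pid):
        # e.g. "[10.1.2.3]": brackets are not in the current syntax
        raise ValueError(f"not a valid path-identity: {pid!r}")
    if pid.startswith("."):
        return "private-name"        # option 5
    try:
        ipaddress.ip_address(pid)
        return "ip-address"          # option 4: dotted-quad or IPv6
    except ValueError:
        pass
    if "." in pid:
        return "fqdn"                # option 1 or 2, pending DNS checks
    return "uucp-name"               # option 3 candidate: contains no '.'
```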
! Of the above options, nos. 1 to 3 are much to be preferred, unless
there are strong technical reasons dictating otherwise. In
particular, the injecting agent's path-identity MUST, as a special
! case, be an FQDN mailable in the sense defined under option 1, or
! with an associated MX record as in option 2, and it MUST be
followed by the special delimiter '%' which serves to separate the
pre-injection and post-injection regions of the Path-content. See
***************
*** 405,419 ****
When an agent (other than an injecting agent) receives an article,
it MUST establish the identity of the source and compare it with
! the leftmost path-identity of the Path-content. If it matches, a '
! /' should be used as the delimiter when prepending the agent's own
path-identity. If it does not match then the agent should prepend
two entries to the Path-content; firstly the true established
! path-identity of the source followed by an "@" delimiter, and then,
! to the left of that, the agent's own path-identity followed by a '
! /' delimiter as usual. This prepending of two entries MUST NOT be
! done if the provided and established identities match.
[I have upgraded that "MUST" from the previous "SHOULD (and eventually
MUST)". I can see no benefit in not being firm from the start. Of
! course, it will not be implemented on day 1.]
Any method of establishing the identity of the source may be used
--- 422,439 ----
When an agent (other than an injecting agent) receives an article,
it MUST establish the identity of the source and compare it with
! the leftmost path-identity of the Path-content. If it matches, a
! '/' should be used as the delimiter when prepending the agent's own
path-identity. If it does not match then the agent should prepend
two entries to the Path-content; firstly the true established
! path-identity of the source followed by a "?" delimiter, and then,
! to the left of that, the agent's own path-identity followed by a
! '/' delimiter as usual. This prepending of two entries SHOULD NOT
! be done if the provided and established identities match.
[I have upgraded that "MUST" from the previous "SHOULD (and eventually
MUST)". I can see no benefit in not being firm from the start. Of
! course, it will not be implemented on day 1. Contrariwise, I have
! demoted that "SHOULD NOT" from a "MUST NOT" since nothing actually
! breaks if you do it, though it is clearly a lazy behaviour and
! potentially doubles the length of the Path line.]
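The prepending rule can be sketched as follows. It assumes the agent has already established the source's identity by some method (see 5.6.5) and knows that source's aliases. The function and parameter names are hypothetical, and the comparison here is exact.

```python
import re
from typing import Iterable


def prepend_own_identity(path_content: str, own_id: str,
                         source_id: str, source_aliases: Iterable[str] = ()) -> str:
    """Return the Path-content after this (non-injecting) agent's hop."""
    leftmost = re.split(r"[/?%,!]", path_content, maxsplit=1)[0].strip()
    if leftmost == source_id or leftmost in set(source_aliases):
        # Provided and established identities agree: one entry, '/' delimiter.
        return f"{own_id}/{path_content}"
    # Mismatch: prepend the established source followed by '?', then our own id.
    return f"{own_id}/{source_id}?{path_content}"
```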
Any method of establishing the identity of the source may be used
***************
*** 422,431 ****
NOTE: The use of the '%' delimiter marks the position of the
! injecting agent in the chain. In a well-ordered net, there
should therefore be only one `%` delimiter present, and
! injecting agents MAY choose to reject proto-articles with a '
! %' already in them. If, for whatever reason, more than one '%'
! is found, then the path-identity in front of the leftmost '%'
! is to be regarded as the true injecting agent.
5.6.3. The tail-entry
--- 442,451 ----
NOTE: The use of the '%' delimiter marks the position of the
! injecting agent in the chain. In normal circumstances there
should therefore be only one `%` delimiter present, and
! injecting agents MAY choose to reject proto-articles with a
! '%' already in them. If, for whatever reason, more than one
! '%' is found, then the path-identity in front of the leftmost
! '%' is to be regarded as the true injecting agent.
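A sketch of locating the injecting agent as this NOTE describes. It also counts the '%' delimiters so that a double injection can be reported. It assumes a syntactically valid Path-content, and the name `injecting_agent` is hypothetical.

```python
import re
from typing import Optional, Tuple


def injecting_agent(path_content: str) -> Tuple[Optional[str], int]:
    """Return (path-identity before the leftmost '%', number of '%' delimiters)."""
    # Alternating tokens: name, delimiter, name, delimiter, ..., tail-entry.
    tokens = re.split(r"\s*([/?%,!])\s*", path_content.strip())
    names, delimiters = tokens[0::2], tokens[1::2]
    count = delimiters.count("%")
    if count == 0:
        return None, 0  # no injection point recorded
    return names[delimiters.index("%")], count
```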
5.6.3. The tail-entry
***************
*** 447,454 ****
the source machine connecting to do the injection, and possibly
other information enabling them to establish the circumstances of
! the injection, provided they do so in a manner that does not match
! any site identifier. The '!' delimiter may be used freely within
! the pre-injection region, although '/' and '@' are also appropriate
! if used correctly.
[If/when we invent some form of Trace or NNTP-Posting-Host header, we
may want to revisit that paragraph.]
--- 467,474 ----
the source machine connecting to do the injection, and possibly
other information enabling them to establish the circumstances of
! the injection (provided it does not conflict with any genuine site
! identifier). The '!' delimiter may be used freely within the pre-
! injection region, although '/' and '?' are also appropriate if used
! correctly.
[If/when we invent some form of Trace or NNTP-Posting-Host header, we
may want to revisit that paragraph.]
***************
*** 465,478 ****
it, or because we have added it ourselves).
! '@' The name immediately to the right is the claimed identity of
the machine from which the article was received, but we were
unable to verify it (and have prepended our own view of where
it came from, and then a '/').
- [Do we want to change '@' to '?'; is there any danger in doing so?]
'%' Everything to the right is the pre-injection region followed by
the tail-entry. The name on the left is the FQDN of the
! injecting agent. The presence of two '%'s in a path indicates
! a double-injection error.
'!' The name immediately to the right is unverified. The presence
--- 485,497 ----
it, or because we have added it ourselves).
! '?' The name immediately to the right is the claimed identity of
the machine from which the article was received, but we were
unable to verify it (and have prepended our own view of where
it came from, and then a '/').
'%' Everything to the right is the pre-injection region followed by
the tail-entry. The name on the left is the FQDN of the
! injecting agent. The presence of two '%'s in a path indicates a
! double-injection (see ...).
'!' The name immediately to the right is unverified. The presence
***************
*** 485,491 ****
Other
Old software may possibly use other delimiters, which should be
! treated as '!'. But note in particular that ':', '-', '_', '['
! and ']' are components of names, not delimiters, and FWS on its
! own MUST NOT be used as the sole delimiter.
NOTE: Old Netnews relaying and injecting programs almost all
--- 504,512 ----
Other
Old software may possibly use other delimiters, which should be
! treated as '!'. But note in particular that ':', '-' and '_'
! are components of names, not delimiters, and FWS on its own
! MUST NOT be used as the sole delimiter.
! [I just removed '[' and ']' from that list, but could be persuaded to
! put them back so long as the syntax gets fixed at the same time.]
NOTE: Old Netnews relaying and injecting programs almost all
***************
*** 499,502 ****
--- 520,525 ----
old style once this new standard has been widely adopted.
+
+
5.6.5. Suggested Verification Methods
***************
*** 530,534 ****
3. If the path-identity fails to match any known alias for the
source (requiring the insertion of an extra path-identity for
! the true source followed by an '@'), simply doing a reverse DNS
(PTR) lookup on the source IP address is not sufficient to
generate the true FQDN. The returned name must be mapped back to
--- 553,557 ----
3. If the path-identity fails to match any known alias for the
source (requiring the insertion of an extra path-identity for
! the true source followed by a '?'), simply doing a reverse DNS
(PTR) lookup on the source IP address is not sufficient to
generate the true FQDN. The returned name must be mapped back to
***************
*** 549,553 ****
Path: foo.isp.example/
! .foo-server/bar.isp.example@[10.123.12.2]/old.site.example!
barbaz/baz.isp.example%dialup123.baz.isp.example!x
--- 572,576 ----
Path: foo.isp.example/
! .foo-server/bar.isp.example?10.123.12.2/old.site.example!
barbaz/baz.isp.example%dialup123.baz.isp.example!x
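To read the example above hop by hop, the folded header can be split into (left name, delimiter, right name) triples, with each delimiter annotated by the meaning given earlier in this section. This is an illustrative sketch. The annotation wording paraphrases the text, and ',' is left unannotated because its meaning is not covered in this excerpt.

```python
import re
from typing import List, Tuple

# Paraphrased delimiter meanings; ',' is deliberately omitted.
MEANING = {
    "/": "right-hand name verified (or added) by the left-hand agent",
    "?": "right-hand name claimed but not verified",
    "%": "left-hand name is the injecting agent",
    "!": "right-hand name unverified",
}


def hops(path_content: str) -> List[Tuple[str, str, str]]:
    """(left name, delimiter, right name) for each delimiter, left to right."""
    tokens = re.split(r"\s*([/?%,!])\s*", path_content.strip())
    names, delimiters = tokens[0::2], tokens[1::2]
    return [(names[i], d, names[i + 1]) for i, d in enumerate(delimiters)]


EXAMPLE = ("foo.isp.example/\r\n"
           " .foo-server/bar.isp.example?10.123.12.2/old.site.example!\r\n"
           " barbaz/baz.isp.example%dialup123.baz.isp.example!x")

for left, delim, right in hops(EXAMPLE):
    print(f"{left:>28} {delim} {right:<28} {MEANING.get(delim, '')}")
```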
***************
*** 601,607 ****
Resource Locators (URL)", RFC 1738, December 1994.
- [RFC 1884] Robert M. Hinden and Stephen E. Deering, "IP version 6
- addressing architecture", RFC 1884, December 1995.
-
[RFC 2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions)
Part Three: Message Header Extensions for Non-ASCII Text", RFC
--- 624,627 ----
***************
*** 611,614 ****
--- 631,637 ----
and Functions", RFC 2142, May 1997.
+ [RFC 2373] R. Hinden and S. Deering, "IP Version 6 Addressing
+ Architecture", RFC 2373, July 1998.
+
[RFC 2606] D. Eastlake and A. Panitz, "Reserved Top Level DNS
Names", RFC 2606, June 1999.
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Email: chl@clw.cs.man.ac.uk Web: http://www.cs.man.ac.uk/~chl
Voice/Fax: +44 161 437 4506 Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5