From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Fri Feb 22 2002 - 12:10:35 CST
In <3xIcKzJT34Z8QA9I@pillar.turnpike.com> Paul Overell <paulo@turnpike.com> writes:
>Since all {USENET}-content starts with [CFWS], and we only require a
>single space after the colon, this should be
> {USENET}-header = {USENET}-name ":" SP {USENET}-content
> *( [CFWS] ";" ( {USENET}-parameter /
> other-parameter ) )
>(This just removes a trivial parsing ambiguity, it is not a material
>change to the syntax).
Well, that looked simple enough, but on inspection I found that some
contents did not start with CFWS :-( .
So there followed a long trawl to fix that bug in umpteen places. I have
now established the following invariant:
NOTE: It may be observed that every {USENET}-content begins and
ends with an optional CFWS (or FWS in the case of the
Newsgroups-, Distribution-, Path- and Followup-To-headers).
Moreover, every {USENET}- or other-parameter also begins and
ends with an optional CFWS.
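One consequence of this invariant is that a parser can peel optional CFWS off both ends of any content or parameter in the same way before examining its core. A minimal Python sketch, assuming the header has already been unfolded into one string; `split_cfws` is a hypothetical helper, and it treats SP, HTAB, CR and LF as whitespace. For the Newsgroups-, Distribution-, Path- and Followup-To-headers, where only FWS is allowed, a caller would strip whitespace alone rather than comments.

```python
WS = " \t\r\n"


def _skip_comment(s: str, i: int) -> int:
    """Return the index just past the (possibly nested) comment starting at s[i]."""
    depth = 0
    while i < len(s):
        c = s[i]
        if c == "\\":          # quoted-pair: skip the escaped character too
            i += 2
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated comment")


def split_cfws(s: str) -> tuple[str, str, str]:
    """Split s into (leading CFWS, core, trailing CFWS)."""
    tokens = []  # (is_cfws, text) pairs, in order
    i = 0
    while i < len(s):
        c = s[i]
        if c in WS:
            j = i
            while j < len(s) and s[j] in WS:
                j += 1
            tokens.append((True, s[i:j]))
        elif c == "(":
            j = _skip_comment(s, i)
            tokens.append((True, s[i:j]))
        elif c == '"':
            # quoted-string: a "(" inside it does not start a comment
            j = i + 1
            while j < len(s) and s[j] != '"':
                j += 2 if s[j] == "\\" else 1
            if j >= len(s):
                raise ValueError("unterminated quoted-string")
            j += 1
            tokens.append((False, s[i:j]))
        else:
            j = i
            while j < len(s) and s[j] not in WS + '("':
                j += 1
            tokens.append((False, s[i:j]))
        i = j

    core = [k for k, (is_cfws, _) in enumerate(tokens) if not is_cfws]
    if not core:
        return s, "", ""  # nothing but CFWS

    def join(part):
        return "".join(text for _, text in part)

    first, last = core[0], core[-1]
    return join(tokens[:first]), join(tokens[first:last + 1]), join(tokens[last + 1:])
```

Internal whitespace and quoted-strings are kept in the core; only the CFWS at either end is separated out.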
The complete Collected Syntax as fixed is reproduced below. Will Paul
Overell please check it carefully to see that this problem is truly fixed?
That still leaves the question of the use of templates such as {USENET}. I
have been looking into this, and my present inclination is to change
things as you requested. I am able to do this by using the little-known
"Incremental Alternatives" feature in RFC 2234, which allows you to say
header =/ Foo-header
in order to add an extra alternative to the already-defined header rule.
This will actually make one or two other things simpler.
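To illustrate the `=/` mechanism, here is a toy Python rule table in which `header` starts with one alternative and each header definition adds itself afterwards, much as `header =/ Foo-header` would. The `Grammar` class is hypothetical and stores alternatives as plain text; requiring the base rule to exist before `=/` extends it is a simplification of this sketch, not something claimed for RFC 2234.

```python
class Grammar:
    """Toy ABNF rule table: rule name -> list of alternatives (as text)."""

    def __init__(self) -> None:
        self.rules: dict[str, list[str]] = {}

    @staticmethod
    def _key(name: str) -> str:
        return name.lower()  # ABNF rule names are case-insensitive

    def define(self, name: str, *alternatives: str) -> None:
        """'name = a / b': a rule may be defined only once."""
        key = self._key(name)
        if key in self.rules:
            raise ValueError(f"{name} is already defined; use =/ to extend it")
        self.rules[key] = list(alternatives)

    def extend(self, name: str, *alternatives: str) -> None:
        """'name =/ c': append further alternatives to an existing rule."""
        key = self._key(name)
        if key not in self.rules:
            # Simplification in this toy: the base rule must come first.
            raise ValueError(f"{name} has no base definition to extend")
        self.rules[key].extend(alternatives)


g = Grammar()
g.define("header", "other-header")
g.extend("header", "Foo-header")   # header =/ Foo-header
g.extend("HEADER", "Bar-header")   # same rule: names are case-insensitive
```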
But I have not done it yet, because I prefer to make such huge changes to
the Syntax one at a time. So let's get this CFWS business corrected first.
Appendix B - Collected Syntax
Appendix B.1 - Characters, Atoms and Folding
In the following syntactic rules, numbers in the left-hand margin
indicate rules taken from other documents, specifically:
2 from with the exception of those elements described therein as
"obsolete";
4 from;
5 from.
Where the number is followed by an asterisk ('*'), it indicates that
the rule in question has been modified for the purposes of this
standard.
4 ALPHA = %x41-5A / ; A-Z
%x61-7A ; a-z
2 CFWS = *([FWS] comment) (([FWS] comment) / FWS )
4 CR = %x0D ; carriage return
4 CRLF = CR LF
4 DIGIT = %x30-39 ; 0-9
4 DQUOTE = %d34 ; quote mark
2 FWS = ([*WSP CRLF] 1*WSP); Folding whitespace
4 HTAB = %x09 ; horizontal tab
4 LF = %x0A ; line feed
2 NO-WS-CTL = %d1-8 / ; US-ASCII control characters
%d11 / ; which do not include the
%d12 / ; carriage return, line feed,
%d14-31 / ; and whitespace characters
%d127
4 SP = %x20 ; space
4 WSP = SP / HTAB ; Whitespace characters
UTF8-xtra-2-head = %xC2-DF
UTF8-xtra-3-head = %xE0 %xA0-BF / %xE1-EC %x80-BF /
%xED %x80-9F / %xEE-EF %x80-BF
UTF8-xtra-4-head = %xF0 %x90-BF / %xF1-F7 %x80-BF
UTF8-xtra-5-head = %xF8 %x88-BF / %xF9-FB %x80-BF
UTF8-xtra-6-head = %xFC %x84-BF / %xFD %x80-BF
UTF8-xtra-char = UTF8-xtra-2-head 1( UTF8-xtra-tail ) /
UTF8-xtra-3-head 1( UTF8-xtra-tail ) /
UTF8-xtra-4-head 2( UTF8-xtra-tail ) /
UTF8-xtra-5-head 3( UTF8-xtra-tail ) /
UTF8-xtra-6-head 4( UTF8-xtra-tail )
UTF8-xtra-tail = %x80-BF
2 atext = ALPHA / DIGIT /
"!" / "#" / ; Any character except
"$" / "%" / ; controls, SP, and specials.
"&" / "'" / ; Used for atoms
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
2 atom = [CFWS] 1*atext [CFWS]
2 ccontent = ctext / quoted-pair / comment
2 comment = "(" *([FWS] ccontent) [FWS] ")"
2* ctext = NO-WS-CTL / ; all of <text> except
%d33-39 / ; SP, HTAB, "(", ")"
%d42-91 / ; and "\"
%d93-126 /
UTF8-xtra-char
2 dcontent = dtext / quoted-pair
2 dot-atom = [CFWS] dot-atom-text [CFWS]
2 dot-atom-text = 1*atext *( "." 1*atext )
2 dtext = NO-WS-CTL / ; Non white space controls
%d33-90 / ; The rest of the US-ASCII
%d94-126 ; characters not including
; "[", "]", or "\"
2 phrase = 1*word
2 qcontent = qtext / quoted-pair
2* qtext = NO-WS-CTL / ; all of <text> except
%d33 / ; SP, HTAB, "\" and DQUOTE
%d35-91 /
%d93-126 /
UTF8-xtra-char
2 quoted-pair = "\" text
2 quoted-string = [CFWS] DQUOTE
*( [FWS] qcontent ) [FWS]
DQUOTE [CFWS]
2 specials = "(" / ")" / ; Special characters used in
"<" / ">" / ; other parts of the syntax
"[" / "]" /
":" / ";" /
"@" / "\" /
"," / "." /
DQUOTE
strict-qcontent = strict-qtext / strict-quoted-pair
strict-quoted-pair = "\" strict-text
strict-quoted-string
= [CFWS] DQUOTE
*( [FWS] strict-qcontent ) [FWS]
DQUOTE [CFWS]
strict-qtext = NO-WS-CTL / ; qtext restricted to
%d33 / ; US-ASCII
%d35-91 /
%d93-126
strict-text = %d1-9 / ; text restricted to
%d11-12 / ; US-ASCII
%d14-127
2* text = %d1-9 / ; all UTF-8 characters except
%d11-12 / ; US-ASCII NUL, CR and LF
%d14-127 /
UTF8-xtra-char
5 tspecials = "(" / ")" / "<" / ">" / "@" /
"," / ";" / ":" / "\" / DQUOTE /
"/" / "[" / "]" / "?" / "="
2* utext = NO-WS-CTL / ; Non white space controls
%d33-126 / ; The rest of US-ASCII
UTF8-xtra-char
2 word = atom / quoted-string
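The UTF8-xtra-* rules accept the octet sequences of non-ASCII UTF-8 characters; the restricted second-octet ranges rule out overlong forms and, for %xED, the surrogate range. A Python sketch of a matcher transcribed from those rules; the table layout and the name `utf8_xtra_char_len` are illustrative. As written, the rules also admit sequences beyond U+10FFFF, including the 5- and 6-octet forms, which Python's own UTF-8 codec rejects.

```python
# Each entry: (lead-octet range, second-octet range or None, total length).
# None means the second octet is an ordinary UTF8-xtra-tail (%x80-BF).
_HEADS = [
    ((0xC2, 0xDF), None, 2),          # UTF8-xtra-2-head
    ((0xE0, 0xE0), (0xA0, 0xBF), 3),  # UTF8-xtra-3-head alternatives
    ((0xE1, 0xEC), (0x80, 0xBF), 3),
    ((0xED, 0xED), (0x80, 0x9F), 3),
    ((0xEE, 0xEF), (0x80, 0xBF), 3),
    ((0xF0, 0xF0), (0x90, 0xBF), 4),  # UTF8-xtra-4-head alternatives
    ((0xF1, 0xF7), (0x80, 0xBF), 4),
    ((0xF8, 0xF8), (0x88, 0xBF), 5),  # UTF8-xtra-5-head alternatives
    ((0xF9, 0xFB), (0x80, 0xBF), 5),
    ((0xFC, 0xFC), (0x84, 0xBF), 6),  # UTF8-xtra-6-head alternatives
    ((0xFD, 0xFD), (0x80, 0xBF), 6),
]


def utf8_xtra_char_len(data: bytes, i: int = 0) -> int:
    """Length of the UTF8-xtra-char starting at data[i], or 0 if none matches."""
    if i >= len(data):
        return 0
    lead = data[i]
    for (lo, hi), second, length in _HEADS:
        if not lo <= lead <= hi:
            continue
        seq = data[i:i + length]
        if len(seq) < length:
            return 0  # truncated sequence
        if second is None:
            tails = seq[1:]
        else:
            if not second[0] <= seq[1] <= second[1]:
                return 0
            tails = seq[2:]
        # every remaining octet must be a UTF8-xtra-tail
        return length if all(0x80 <= b <= 0xBF for b in tails) else 0
    return 0
```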
Appendix B.2 - Basic Forms
{USENET}-header = {USENET}-name ":" SP {USENET}-content
*( ";" ( {USENET}-parameter /
other-parameter ) )
2 addr-spec = local-part "@" domain
2 address = mailbox / group
2 address-list = address *( "," address )
2 angle-addr = [CFWS] "<" addr-spec ">" [CFWS]
article = 1*( header CRLF ) separator body
5* attribute = {USENET}-token / iana-token / x-token
body = *( *998text CRLF )
2 display-name = phrase
2 date = day month year
2 date-time = [ day-of-week "," ] date FWS time [CFWS]
2 day = [FWS] 1*2DIGIT
2 day-name = "Mon" / "Tue" / "Wed" / "Thu" /
"Fri" / "Sat" / "Sun"
2 day-of-week = [FWS] day-name
2 domain = dot-atom / domain-literal
2 domain-literal = [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]
2 group = display-name ":" [ mailbox-list / CFWS ] ";"
[CFWS]
header = {USENET}-header / other-header
header-name = 1*name-character *( "-" 1*name-character )
2 hour = 2DIGIT
5* iana-token = <A token defined in an experimental
or standards-track RFC and registered
with IANA>
2* local-part = dot-atom / strict-quoted-string
2 mailbox = name-addr / addr-spec
2 mailbox-list = mailbox *( "," mailbox )
2 minute = 2DIGIT
2 month = FWS month-name FWS
2 month-name = "Jan" / "Feb" / "Mar" / "Apr" /
"May" / "Jun" / "Jul" / "Aug" /
"Sep" / "Oct" / "Nov" / "Dec"
2 name-addr = [display-name] angle-addr
name-character = ALPHA / DIGIT
other-header = header-name ":" 1*SP other-content
other-content
= <the content of a header defined by some
other standard>
other-parameter
= attribute "=" value
2 second = 2DIGIT
separator = CRLF
2 time = time-of-day FWS zone
2 time-of-day = hour ":" minute [ ":" second ]
5* token = [CFWS] token-core [CFWS]
5* token-core = 1*<any (US-ASCII) CHAR except SP, CTLs,
or tspecials>
5 value = token / quoted-string
5* x-token = [CFWS] "x-" token-core [CFWS]
2 year = 4*DIGIT
2* zone = (( "+" / "-" ) 4DIGIT) / "UT" / "GMT"
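The first rule of B.2 splits a {USENET}-header into its name, a colon followed by a single SP, the content, and then ";"-separated parameters. A minimal Python sketch of that split, assuming the header has already been unfolded; it tracks quoted-strings and nested comments only so that a ";" inside them is not taken as a separator. `split_header` and the sample Injector-Info values are hypothetical. Since each content and parameter carries its own optional CFWS, the pieces are returned with their surrounding whitespace and comments intact.

```python
def split_header(line: str) -> tuple[str, str, list[str]]:
    """Split an unfolded header into (name, content, [parameters])."""
    name, colon, rest = line.partition(":")
    if not colon or not rest.startswith(" "):
        raise ValueError('expected name ":" SP')
    rest = rest[1:]  # drop the single required SP
    pieces, buf = [], []
    depth = 0          # comment nesting level
    in_quote = False   # inside a quoted-string?
    i = 0
    while i < len(rest):
        c = rest[i]
        if c == "\\" and (in_quote or depth):
            buf.append(rest[i:i + 2])  # quoted-pair: copy both characters
            i += 2
            continue
        if in_quote:
            if c == '"':
                in_quote = False
        elif c == '"' and depth == 0:
            in_quote = True
        elif c == "(":
            depth += 1
        elif c == ")" and depth:
            depth -= 1
        elif c == ";" and depth == 0:
            pieces.append("".join(buf))  # a top-level ";" ends a piece
            buf = []
            i += 1
            continue
        buf.append(c)
        i += 1
    if in_quote or depth:
        raise ValueError("unterminated quoted-string or comment")
    pieces.append("".join(buf))
    return name, pieces[0], pieces[1:]
```

For example, `split_header('Injector-Info: news.example.com; posting-host="a;b" (c;d); sender=verified')` yields the content `news.example.com` and two parameters, each still carrying its leading space.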
Appendix B.3 - Headers
Appendix B.3.1 - Template definitions
{CONTROL}-verb = <the verb defined in this standard
(or an extension of it) for a specific
{CONTROL} message>
{CONTROL}-arguments = <the arguments defined in this standard
(or an extension of it) for a specific
{CONTROL} message>
{USENET}-content
= <the content of a header defined in this
standard (or an extension of it) for a
specific {USENET}-header>
{USENET}-name
= <a header-name defined in this standard
(or an extension of it) for a specific
{USENET}-header>
{USENET}-parameter
= <an other-parameter defined in this standard
(or an extension of it) for a specific
{USENET}-header>
{USENET}-token = <a token defined in this standard for
use in conjunction with a specific
{USENET}-parameter>
Appendix B.3.2 - Template instantiations
Approved-content = From-content
Approved-name = "Approved"
Archive-content = [CFWS] ("no" / "yes" ) [CFWS]
Archive-name = "Archive"
Archive-parameter = Filename-token "=" value
Cancel-arguments = CFWS msg-id
Cancel-verb = "cancel"
Checkgroup-arguments = [ chkscope ] [ chksernr ]
Checkgroup-verb = "checkgroups"
Complaints-To-content= address-list
Complaints-To-name = "Complaints-To"
Control-content = [CFWS] {CONTROL}-verb {CONTROL}-arguments [CFWS]
Control-name = "Control"
Date-content = date-time
Date-name = "Date"
Distribution-content = distribution *( dist-delim distribution )
Distribution-name = "Distribution"
Expires-content = date-time
Expires-name = "Expires"
Filename-token = [CFWS] "filename" [CFWS]
Followup-To-content = Newsgroups-content / [FWS] "poster" [FWS]
Followup-To-name = "Followup-To"
From-content = mailbox-list
From-name = "From"
Ihave-arguments = *( msg-id SP ) relayer-name
Ihave-verb = "ihave"
Injector-Info-content= [CFWS] path-identity [CFWS]
Injector-Info-name = "Injector-Info"
Injector-Info-parameter
= posting-host-parameter /
posting-account-parameter /
posting-sender-parameter /
posting-logging-parameter /
posting-date-parameter
Keywords-content = phrase *( "," phrase )
Keywords-name = "Keywords"
Lines-content = [CFWS] 1*DIGIT [CFWS]
Lines-name = "Lines"
Mail-Copies-To-content
= copy-addr / [CFWS] ( "nobody" / "poster" ) [CFWS]
Mail-Copies-To-name = "Mail-Copies-To"
Message-ID-content = msg-id
Message-ID-name = "Message-ID"
Mvgroup-arguments = CFWS newsgroup-name CFWS newsgroup-name
[ CFWS newgroup-flag ]
Mvgroup-verb = "mvgroup"
Newgroup-verb = "newgroup"
Newgroup-arguments = CFWS newsgroup-name [ CFWS newgroup-flag ]
Newsgroups-content = [FWS] newsgroup-name
*( [FWS] ng-delim [FWS] newsgroup-name )
[FWS]
Newsgroups-name = "Newsgroups"
Organization-content
= 1*( [FWS] utext )
Organization-name = "Organization"
Path-content = [FWS] *( path-identity [FWS] path-delimiter [FWS] )
tail-entry [FWS]
Path-name = "Path"
Posted-And-Mailed-content
= [CFWS] ( "yes" / "no" ) [CFWS]
Posted-And-Mailed-name
= "Posted-And-Mailed"
Posting-Account-token= "posting-account"
Posting-Date-token = "posting-date"
Posting-Host-token = "posting-host"
Posting-Logging-token= "logging-data"
Posting-Sender-token = "sender"
References-content = msg-id *( CFWS msg-id )
References-name = "References"
Reply-To-content = address-list
Reply-To-name = "Reply-To"
Rmgroup-arguments = CFWS newsgroup-name
Rmgroup-verb = "rmgroup"
Sender-content = mailbox
Sender-name = "Sender"
Sendme-arguments = Ihave-arguments
Sendme-verb = "sendme"
Subject-content = [ [FWS] back-reference ] pure-subject
Subject-name = "Subject"
Summary-content = 1*( [FWS] utext )
Summary-name = "Summary"
Supersedes-content = msg-id
Supersedes-name = "Supersedes"
User-Agent-content = product-token *( CFWS product-token )
User-Agent-name = "User-Agent"
Xref-content = [CFWS] server-name 1*( CFWS location ) [CFWS]
Xref-name = "Xref"
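Path-content allows only FWS, not comments, around each path-delimiter, and every element before the last is a path-identity. A Python sketch of pulling out the path-identity list and the tail-entry from an unfolded Path header; the regexes transcribe path-delimiter, path-identity and tail-entry from B.3.3, while `parse_path` and the host names used below are hypothetical.

```python
import re

# path-delimiter, with the optional (unfolded) FWS allowed on either side
_PATH_DELIM = re.compile(r"[ \t]*[/?%,!][ \t]*")
# path-identity and tail-entry share this character set
_PATH_ELEMENT = re.compile(r"[A-Za-z0-9\-.:_]+")


def parse_path(content: str) -> tuple[list[str], str]:
    """Return (path-identities, tail-entry) from unfolded Path content."""
    parts = _PATH_DELIM.split(content.strip(" \t"))
    for part in parts:
        if not _PATH_ELEMENT.fullmatch(part):
            raise ValueError(f"bad path element {part!r}")
    return parts[:-1], parts[-1]
```

Two adjacent delimiters produce an empty element, which is rejected because the rule requires a path-identity between them.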
Appendix B.3.3 - Other header rules
arguments = *( CFWS value )
article-locator = 1*( %x21-7E ) ; US-ASCII printable characters
article-size = 1*DIGIT
back-reference = %x52.65.3A.20
; which is a case-sensitive "Re: "
batch = 1*( batch-header article )
batch-header = "#!" SP rnews SP article-size CRLF
checkgroups-body = *( valid-group CRLF )
chkscope = 1*( CFWS ["!"] newsgroup-name )
chksernr = CFWS "#" 1*DIGIT
combiner-ASCII = DIGIT / ALPHA / "+" / "-" / "_"
combiner-base = combiner-ASCII / combiner-extended
combiner-extended = <any character with a Unicode code value of
0080 or greater and a combining class of 0,
but excluding any character in Unicode
categories Cc, Cf, Cs, Zs, Zl, and Zp>
combiner-mark = <any character with a Unicode code value of
0080 or greater and a combining class other
than 0>
component = 1*component-glyph
component-glyph = combiner-base *combiner-mark
copy-addr = address-list
date-value = 1*DIGIT [ ":" date-time ]
dist-delim = ","
distribution = positive-distribution /
negative-distribution
distribution-name = ALPHA 1*distribution-rest
distribution-rest = ALPHA / "+" / "-" / "_"
groupinfo-body = [ newsgroups-tag CRLF ]
newsgroups-line CRLF
host-value = dot-atom /
[ dot-atom ":" ]
( dotted-quad / ; see
ipv6-numeric ) ; see
2 id-left = dot-atom-text / no-fold-quote
2 id-right = dot-atom-text / no-fold-literal
ihave-body = *( msg-id CRLF )
location = newsgroup-name ":" article-locator
moderation-flag = %x28.4D.6F.64.65.72.61.74.65.64.29
; case sensitive "(Moderated)"
2 msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]
negative-distribution
= [FWS] "!" distribution-name [FWS]
newgroup-flag = "moderated"
newsgroup-description
= 1*( [WSP] utext)
newsgroup-name = component *( "." component )
newsgroups-line = newsgroup-name
[ 1*HTAB newsgroup-description ]
[ 1*WSP moderation-flag ]
newsgroups-tag = %x46.6F.72 SP %x79.6F.75.72 SP
%x6E.65.77.73.67.72.6F.75.70.73 SP
%x66.69.6C.65.3A
; case sensitive
; "For your newsgroups file:"
ng-delim = ","
2* no-fold-literal = "[" *( dtext / strict-quoted-pair ) "]"
2* no-fold-quote = DQUOTE *( strict-qtext / strict-quoted-pair ) DQUOTE
path-delimiter = "/" / "?" / "%" / "," / "!"
path-identity = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
positive-distribution
= [FWS] distribution-name [FWS]
posting-account-parameter
= [CFWS] Posting-Account-token [CFWS] "=" value
posting-date-parameter
= [CFWS] Posting-Date-token [CFWS] "=" [CFWS]
( date-value /
DQUOTE date-value DQUOTE ) [CFWS]
posting-host-parameter
= [CFWS] Posting-Host-token [CFWS] "=" [CFWS]
( host-value /
DQUOTE host-value DQUOTE ) [CFWS]
posting-logging-parameter
= [CFWS] Posting-Logging-token [CFWS] "=" value
posting-sender-parameter
= [CFWS] Posting-Sender-token [CFWS] "=" [CFWS]
( sender-value /
DQUOTE sender-value DQUOTE ) [CFWS]
product-token = value [ "/" product-version ]
product-version = value
pure-subject = 1*( [FWS] utext )
relayer-name = path-identity
rnews = %x72.6E.65.77.73 ; case sensitive "rnews"
sender-value = ( mailbox / "verified" )
sendme-body = ihave-body
server-name = path-identity
tail-entry = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
valid-group = newsgroups-line
verb = token
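The batch and batch-header rules describe a sequence of articles, each preceded by a case-sensitive "#! rnews" line giving its size. A Python sketch of splitting such a batch, assuming that article-size counts the octets of the article that follows (the rule itself only says 1*DIGIT); `split_batch` is a hypothetical name.

```python
import re

# batch-header = "#!" SP rnews SP article-size CRLF  ("rnews" is case-sensitive)
_BATCH_HEADER = re.compile(rb"#! rnews (\d+)\r\n")


def split_batch(data: bytes) -> list[bytes]:
    """Split an rnews-style batch into its articles (as raw octets)."""
    articles, pos = [], 0
    while pos < len(data):
        m = _BATCH_HEADER.match(data, pos)
        if m is None:
            raise ValueError(f"bad batch-header at offset {pos}")
        start = m.end()
        end = start + int(m.group(1))  # assumed: size in octets
        if end > len(data):
            raise ValueError("truncated article")
        articles.append(data[start:end])
        pos = end
    if not articles:
        raise ValueError("a batch needs at least one article")
    return articles
```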
-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131
Fax: +44 161 436 6133
Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk
Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9
Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5