From: Bruce Lilly (blilly@erols.com)
Date: Thu Sep 12 2002 - 11:52:28 CDT
Ralph Babel wrote:
> Considering the basic similarities between mail and news,
> and considering that mail is used to submit - typically
> unencapsulated - netnews articles to moderators and also
> to forward/gateway articles, and also keeping in mind that
> - according to our beloved editor - mail and news share a
> common message ID space ...
>
> Does it make sense to have incompatible syntax
> for header fields common to mail and news?
[once again, the issue isn't quite as clear-cut as the canned
selections indicate...]
> 1. No, not at all. Headers common to mail and news should be
> 100% compatible to ease the transition between the two,
> and even news-only headers should be 7-bit and use
> standard MIME encodings.
Clearly, any message that travels via email MUST comply
with the relevant email RFCs and protocols, and that includes
the issue of permissible characters and character sequences
in headers and body. Equally clearly, current practice and
the current draft *require* transmission via email in a
number of circumstances (e.g. post-and-mail, gateways,
submission to moderators). Common headers should be as
compatible as practical, with the goal being a transition
to 100% compatibility (e.g. the space-after-colon requirement
should be phased out). News-specific headers which may be
transmitted via email need to be addressed in some manner;
potential methods include:
a) prohibit some headers from being sent via email; user
agents implementing post-and-mail, outgoing gateways,
and injection agents forwarding to moderators would
be required to implement the prohibition. Clearly,
this method cannot be used for some news-specific headers,
e.g. Newsgroups. This method also does not address some
compatibility issues not directly related to email (e.g.
reading/posting news via Cyrus imapd).
b) encode/decode for email transmission. This requires a
suitable encoding scheme (the MIME mechanisms are not
entirely suitable for newsgroup names). The encode/decode
procedures must be implemented in user agents, gateways,
and injection agents, and they will be complex because
they need to handle Unicode normalization rules. This
method does not address the incompatibility with some
existing news software. [It is not entirely coincidental
that this method bears some slight resemblance to the
current draft (though the draft doesn't (yet) address
User-Agent "value" and Injector-Info parameters).]
c) encode/decode for human I/O in the user agent. Content in
headers is kept in a format compatible with email (MIME
encoding where applicable, draft sect. 5.5.2 encoding for
newsgroup names). There are no encode/decode requirements
for gateways or injection agents. There is no incompatibility
with existing news software.
d) prohibit extended (i.e. those including characters other
than unaccented lower-case Latin alphabetic characters and
digits) newsgroup names and use MIME encoding elsewhere.
User agents will need to implement MIME encoding and decoding,
but no new, additional encoding scheme. There are no
encode/decode requirements for gateways or injection agents
and there is no incompatibility with existing news software.
Some users will be unhappy because they will not be able to
have extended newsgroup names.
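As a rough illustration of the restriction in method d, the check
below takes the parenthetical literally: a name is acceptable only if
every dot-separated component consists of unaccented lower-case Latin
letters and digits. The function name is_plain_newsgroup_name is
hypothetical, and the sketch deliberately ignores the few punctuation
characters (such as '+', '-' and '_') that traditional newsgroup names
also permit.

```python
import re

# Assumption: components are separated by "." and may contain only
# ASCII a-z and 0-9, as in the parenthetical under method d.
_PLAIN_NAME = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")


def is_plain_newsgroup_name(name: str) -> bool:
    """Return True if the name uses only lower-case ASCII letters and digits."""
    return _PLAIN_NAME.fullmatch(name) is not None


print(is_plain_newsgroup_name("comp.sys.ibm"))   # plain name
print(is_plain_newsgroup_name("comp.sys.IBM"))   # upper case is rejected
print(is_plain_newsgroup_name("de.test.grüße"))  # non-ASCII is rejected
```

A user agent or injection agent could apply such a check before
posting; nothing downstream would need to change.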
Method a is not a general solution, as noted. Method b is not
practical; it imposes complexity (e.g. Unicode normalization)
in too many places (gateways and injection agents) and it fails
to address some issues (e.g. Cyrus compatibility). Method c
has a chance of working; the complexity of the encode/decode
procedure is inherent in dealing with Unicode, but it only needs
to be implemented in user agents (and is less of a burden on
user agents than b or the current draft), and all of the issues
are addressed. Method d would work technically, but probably
not from a politico-social perspective.
The exception to the canned statement is that (w.r.t. user
I/O) the newsgroup names are not MIME encoded -- but that's
because MIME encoding isn't readily applicable.
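The MIME half of method c can be sketched with Python's standard
email.header module. This is only an illustration of where the work
happens (in the user agent, at the human I/O boundary), not a proposed
implementation; the function names encode_for_wire and
decode_for_display are hypothetical, and the newsgroup-name encoding of
draft sect. 5.5.2 is not shown.

```python
from email.header import Header, decode_header, make_header


def encode_for_wire(text: str) -> str:
    """Return a 7-bit header value.

    Pure ASCII text passes through unchanged; anything else is turned
    into RFC 2047 encoded-words (the Header class falls back to UTF-8).
    """
    return Header(text).encode()


def decode_for_display(wire_value: str) -> str:
    """Turn a 7-bit header value back into text to show the user."""
    return str(make_header(decode_header(wire_value)))


wire = encode_for_wire("Grüße aus Köln")
print(wire)                      # 7-bit form carried in the article
print(decode_for_display(wire))  # what the user sees
```

Everything between the two user agents (gateways, injection agents,
moderators' mail systems) only ever sees the 7-bit form, which is why
this method puts no new burden on them.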
> 2. Same as #1, except that for news-only headers,
> in particular those that are critical to articles
> in transit ("Path:", "Message-ID:", "Newsgroups:"), we
> may well devise our own mail-compatible 7-bit encodings
> for efficiency or to avoid problems with current server
> implementations, or if relaxing the traditional syntax
> would adversely affect the distribution of articles
> within the current server infrastructure.
Method c above achieves this (is there an encoding issue for
Message-ID?). But the lack of 100% compatibility with email
is a problem as email use is required.
> 3. Same as #1, but no restrictions on news-only headers,
> e.g. if someone includes an 8-bit news header in mail,
> it's their own fault. After all, there's
> application/news-transmission.
First, the problem caused by 8-bit cruft in headers
affects not only the perpetrator, but others (recipients,
including moderators, system administrators, etc.) as well.
Second, until such time as the draft becomes an official RFC,
there *isn't* application/news-transmission. Even if/when
that becomes standardized, full implementation will take time.
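To make "8-bit cruft in headers" concrete, here is a minimal sketch of
the kind of check a gateway or a moderator's mail system could run
before passing an article on. It only reports header fields that
contain bytes above 0x7F; the function name eight_bit_header_names is
hypothetical and the check says nothing about what to do with such an
article.

```python
import re
from typing import List


def eight_bit_header_names(raw: bytes) -> List[str]:
    """Return names of header fields containing bytes above 0x7F."""
    # The header section ends at the first empty line.
    head = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)[0]
    offenders: List[str] = []
    current = None
    for line in re.split(rb"\r?\n", head):
        if line[:1] not in (b" ", b"\t"):
            # A new field starts here; folded continuation lines keep
            # the name of the field they belong to.
            name, sep, _ = line.partition(b":")
            current = name.decode("ascii", "replace") if sep else None
        if current is None:
            continue
        if any(byte > 0x7F for byte in line) and current not in offenders:
            offenders.append(current)
    return offenders


article = (
    b"From: poster@example.com\r\n"
    b"Subject: Gr\xfc\xdfe\r\n"
    b"Newsgroups: comp.test\r\n"
    b"\r\n"
    b"8-bit body text \xff is not examined\r\n"
)
print(eight_bit_header_names(article))
```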
> 4. Yes, absolutely! Mail is Mail and News is News, and never
> the twain shall meet. After all, this is our standard,
> and we can do with it whatever our editor wants to.
> Besides, the mail folks deserve all the problems
> they can get for having ignored UTF-8.
<satire mode=biting>
Sure -- just remove post-and-mail and gateways, and force
moderators to run ftp servers for submissions to moderated
groups instead of using email. Nobody really cares about
gateways and only a tiny, insignificant number of people
use email anyway. Removing From:, Sender:, Mail-Copies-To:,
Complaints-To:, Posted-and-Mailed:, and Reply-To: headers
and removing email addresses from Approved:, mailbox from
the sender-value parameter value of Injector-Info:, and
"poster" from Followup-To: will reduce the size of the
draft considerably. And as the 2nd 'M' in MIME stands for
Mail, we can ignore MIME and just do whatever we please
(ooh, even more reduction in the size of the draft). Even
more verbiage can be dropped by removing any pretension of
using the 822/2822 text message format from the draft's text.
And of course, everybody will stop using email overnight as
soon as the draft is approved. The RIAA and Hollywood movie
studios won't mind one whit if the ftp servers ostensibly
set up for moderated submissions are subverted by hackers
as repositories for pirated copies of copyrighted materials --
they wouldn't dream of taking action against the innocent
moderators. And even if that should happen, the moderators
won't mind; after all it's their duty to be moderators and
they all have far too much free time on their hands anyway.
Even better, with absolutely no links to email, and no email
addresses in news article headers, spam will completely
disappear from Usenet and from the mailboxes of posters,
instantaneously and permanently. We can even liberalize
the headers, e.g. we can allow any sort of strings and any
number of digits in Date headers, so
Date: Mardi, 13 Septembre 2002 1:000000:2 Esperanto Superficial Time
would be a legitimate header; that would make life so much
easier for developers who can't be bothered to read all
that dry RFC stuff (even if they could understand it), in
spite of the fact that it has been in place for decades and
that there are freely available open-source RFC-compliant
implementations. We can even further liberalize newsgroup
names, so that long-suffering speakers of such widely-used
languages as Acronymese can have suitable UTF-8 newsgroup
names which they can read if they peek inside a file or
sniff NNTP packets rather than using a news reader: group
names such as comp.sys.IBM (N.B. 'I', 'B', and 'M' are
Unicode characters for upper-case Latin I, etc. and are
completely unrelated to the forbidden upper-case US-ASCII
characters (everybody knows that merely saying that they're
unrelated makes it so)). "comp.sys.ibm" would be an
unacceptable substitute because dammit, IBM is an acronym
and must always be capitalized. And the Acronymese are
as entitled to *their* upper-case Unicode characters as
the Scandinavians are to their upper-case a-ring. These
pressing issues of human dignity, semantics, and political
correctness are the ones we should be addressing in an
Internet Draft, not that boring technical crap that
everybody ignores anyway.
or
Of course the IETF will be grateful for having the
opportunity to fix the Usefor WG's problems and will no
doubt profusely thank the WG for having worked so diligently
over the course of so long a time to generate that
opportunity.
</satire>