From: Bruce Lilly (blilly@erols.com)
Date: Fri May 23 2003 - 07:46:53 CDT
Charles Lindsey wrote:
> Problem solved, I think.
No, because (aside from the fact that existing UAs and gateways
don't substitute charset names) the language-tag can be arbitrarily
long, and x- language tags are explicitly permitted. So in some
circumstances a valid encoded-word (length <= 75 octets) might arise
that cannot be split and that must appear on a separate line with
length <= 76 octets (i.e. the encoded-word plus the initial WS that
indicates a continuation line). It is clear from RFC 2047's requirements
(for which rationale is given in 2047) that a field with
such an encoded-word must be handled in that way. We don't have the
authority to change RFC 2047. Our only way to avoid the MUST
vs. MUST NOT conflict is to reduce the MUST have-non-whitespace-
content-on-first-field-line to a SHOULD or less.
Please remember that the issue isn't whether some hypothetical
software that isn't installed anywhere *might* be able to do
some finagling to work around a particular instance of the issue
(possibly introducing a related problem elsewhere), nor is it
about how often the situation might arise. It is about the fact that it
may arise (and therefore according to Murphy's Law it will arise),
and about what a conforming implementation can do. Right now an
implementation which tries to be conforming is between a rock and
a hard place: given a field with a long encoded-word (where "long" depends
on the length of the field name and length of any initial whitespace
in the field body) as the first non-whitespace part of the field body
(which can occur in address fields, Keywords fields, all unstructured
fields, and others), it MUST put the encoded-word on a continuation line
(per 2047) yet simultaneously MUST NOT do so per the current draft
wording.
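To make the arithmetic concrete, here is a minimal Python sketch of the
two length checks involved. The function names are hypothetical and do not
come from any existing gateway or UA; the limits are the 75 and 76
character figures from RFC 2047, and the sketch assumes one octet per
character in the folded header.

```python
# Limits from RFC 2047 (counted in characters of the folded header).
MAX_ENCODED_WORD = 75  # longest permitted encoded-word
MAX_LINE = 76          # longest permitted line that contains an encoded-word


def fits_on_first_line(field_name: str, encoded_word: str, leading_ws: str = " ") -> bool:
    """True if 'Name:' + leading whitespace + encoded-word stays within MAX_LINE."""
    return len(f"{field_name}:{leading_ws}{encoded_word}") <= MAX_LINE


def fits_on_continuation_line(encoded_word: str, leading_ws: str = " ") -> bool:
    """True if folding whitespace + encoded-word stays within MAX_LINE."""
    return len(leading_ws + encoded_word) <= MAX_LINE


# A maximal-length but valid encoded-word: 15 + 58 + 2 = 75 characters.
long_word = "=?iso-8859-1?Q?" + "a" * 58 + "?="
assert len(long_word) == MAX_ENCODED_WORD

print(fits_on_first_line("Subject", long_word))  # False: 9 + 75 = 84 > 76
print(fits_on_continuation_line(long_word))      # True: 1 + 75 = 76
```

With "Subject: " taking 9 characters, any encoded-word longer than 67
characters fails the first check while still being valid under RFC 2047,
which is exactly the case where the two MUSTs collide.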
Note also that even with a short charset name and a short language-tag
(or no language-tag at all), the situation can still arise with either
a long field name or long encoded-text. While it might be possible to
split encoded-text among multiple encoded-words (possibly requiring
decoding of B encoding and reencoding at least part of the text in Q
encoding), no existing gateway that I'm aware of does that.
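For illustration only, the kind of splitting described above might look
like the following Python sketch. It is not taken from any gateway; the
function name split_b_encoded_word is hypothetical, it assumes the charset
is one that Python's codecs know, and it keeps B encoding throughout rather
than switching to Q. It splits only on character boundaries, since each
encoded-word has to carry whole characters.

```python
import base64


def split_b_encoded_word(word: str, max_len: int = 75) -> list[str]:
    """Split one B-encoded encoded-word into several, each <= max_len long."""
    if not (word.startswith("=?") and word.endswith("?=")):
        raise ValueError("not an encoded-word")
    charset, encoding, text = word[2:-2].split("?", 2)
    if encoding.upper() != "B":
        raise ValueError("this sketch only handles B encoding")

    # Drop an RFC 2231 "*language" suffix before asking Python to decode.
    codec = charset.split("*", 1)[0]
    chars = base64.b64decode(text).decode(codec)

    # Fixed cost of "=?" + charset + "?B?" + "?=" around the encoded-text.
    overhead = len(charset) + 7

    def encoded_len(piece: str) -> int:
        return overhead + len(base64.b64encode(piece.encode(codec)))

    pieces, current = [], ""
    for ch in chars:
        if current and encoded_len(current + ch) > max_len:
            pieces.append(current)  # adding ch would overflow: start a new word
            current = ch
        else:
            current += ch
    if current:
        pieces.append(current)

    words = [
        "=?%s?B?%s?=" % (charset, base64.b64encode(p.encode(codec)).decode("ascii"))
        for p in pieces
    ]
    # A long charset name or language tag can leave no room for even one character.
    if any(len(w) > max_len for w in words):
        raise ValueError("cannot split: overhead alone exceeds the budget")
    return words
```

Calling it with max_len set to 76 minus the length of the field name, colon
and leading whitespace would give pieces small enough for the first line.
The final check shows why this does not always help: when the charset name
plus language-tag is long enough, the function can only give up.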
So if the MUST remains, then we effectively declare all existing
mail-to-news gateways to be non-conforming and require of them rather
complex and extensive processing which may not resolve the conflict
anyway, without giving them any viable alternative short of message
rejection. Extensive modification of header field content, aside
from being inadvisable for other reasons, is likely to be botched in
multiple implementations.
And no matter what is done by a hypothetical rewrite or new gateway
implementation, there is still the issue of UAs, both news-specific
ones and those common to news and mail applications.
I hope that puts the issue in perspective.