From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Mon Mar 10 2003 - 03:43:06 CST
In <3E6A3C1D.9060402@Sonietta.blilly.com> Bruce Lilly <blilly@erols.com> writes:
>Charles Lindsey wrote:
>All that RFC 2277 requires w.r.t. UTF-8 is that the _protocol_ permit
>users to use that charset (with implementation unspecified). Language
>tagging is a separate issue from charset.
>>>You cannot bury language information inside something
>>>encoded by the charset -- it is then inaccessible to applications
>>>which do not support the charset.
Not if support for that charset is mandatory, and it is the only charset
permitted by the protocol.
>But language information must be available _via the protocol_ (not via
>some non-standard cruft buried inside the text which is encoded by a
>potentially-unsupported charset), whether or not a client chooses to
>support utf-8 or not (ditto for any other charset). And you are
>misinterpreting RFC 2277 w.r.t. utf-8 support.
Language information is supplied however the protocol says it is supplied.
RFC 2277 sets no requirement beyond that it must be supplied somehow.
>> In that case, where is the requirement in RFC 2822
>> to provide language tagging for those headers in US-ASCII? Look at the
>> message I am replying to with its Subject "Re: When will News Article
>> Format be approved?". What language is that written in? How could you
>> inform me, if you wanted, that it is written in "en-US" rather than "en-AU"?
>Subject: =?us-ascii*en-us?q?Re: When will News Article Format be approved?= ?
That is not a feature provided by RFC 2822 (it is provided by RFC 2231,
which is a non-obligatory _extension_ to RFC 2822).
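
For readers unfamiliar with the RFC 2231 form under discussion, here is a minimal sketch of how an encoded-word carrying a language suffix can be taken apart. The names ENCODED_WORD and parse_encoded_word are hypothetical, and the sample input is written with "_" and "=3F" on the assumption that a well-formed Q-encoded word may contain neither spaces nor a bare "?"; it is an illustration, not anything either draft specifies.

```python
import base64
import re

# RFC 2047 encoded-word with the optional RFC 2231 "*language" suffix:
#   "=?" charset ["*" language] "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*\s]+)(?:\*(?P<lang>[A-Za-z0-9-]+))?"
    r"\?(?P<enc>[QqBb])\?(?P<text>[^?\s]*)\?="
)


def parse_encoded_word(word):
    """Return (charset, language, decoded_text), or None if not an encoded-word.

    language is None when the word carries no "*language" suffix.
    """
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        return None
    charset = m.group("charset")
    lang = m.group("lang")
    raw = m.group("text")
    if m.group("enc").lower() == "q":
        # Q encoding: "_" stands for a space, "=XX" for one hex-encoded octet.
        data = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda h: bytes([int(h.group(1), 16)]),
            raw.replace("_", " ").encode("ascii"),
        )
    else:
        data = base64.b64decode(raw)
    return charset, lang, data.decode(charset)


subject = "=?us-ascii*en-us?q?Re:_When_will_News_Article_Format_be_approved=3F?="
print(parse_encoded_word(subject))
```

Note that the language is recovered before the text is decoded, so a reader that cannot handle the charset can still see the tag.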
>> Oh Dear! It seems that RFC 2822 does not conform to RFC 2277. What a pity!
>Nonsense. Charles Lindsey is clueless, is what it means.
No, it means that RFC 2822 does not conform to RFC 2277.
>> ... Indeed, to quote from RFC 3066:
>>
>> Language tags may always be presented using the characters A-Z, a-z,
>> 0-9 and HYPHEN-MINUS, which are present in most character sets, so
>> presentation of language tags should not have any character set
>> issues.
>Utter nonsense. That says that *because* the tags are always ldh,
>there is no issue with representing them in any charset.
And is clearly intended to imply that protocols may encode them in any
charset appropriate to that protocol.
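
The RFC 3066 point quoted above can be illustrated with a small sketch. It assumes the RFC 3066 grammar (a primary subtag of 1 to 8 letters, then hyphen-separated subtags of 1 to 8 letters or digits); the names LANGUAGE_TAG, is_rfc3066_tag and same_octets_everywhere are hypothetical, and the three charsets compared are only examples of ASCII-compatible ones.

```python
import re

# RFC 3066: Language-Tag = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_tag(tag):
    """True if tag is syntactically a language tag (no registry lookup)."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


def same_octets_everywhere(tag, charsets=("us-ascii", "utf-8", "iso-8859-1")):
    """True if the tag encodes to identical octets in each listed charset."""
    return len({tag.encode(cs) for cs in charsets}) == 1


for candidate in ("en-US", "en-AU", "en_AU"):
    print(candidate, is_rfc3066_tag(candidate))

print(same_octets_everywhere("en-US"))
```

The check is purely syntactic: a tag built from letters, digits and HYPHEN-MINUS comes out as the same octets in every ASCII-compatible charset, which is all the quoted paragraph claims.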
>No, you've merely showed your reading comprehension problem once again.
I will leave it to others to judge which of us has problems with reading
comprehension.
>>>>2. Invent a header which conveys the tagging information (for all the
>>>>headers, naturally).
>>
>>
>>>That fails to meet the architectural guidelines in RFC 1958,
>>
>>
>> Would you care to indicate which feature of RFC 1958 you have in mind
>> (just a paragraph number will do)?
>3.12, among others.
3.12 says
3.12 Objects should be self decribing (include type and size), within
reasonable limits. Only type codes and other magic numbers assigned
by the Internet Assigned Numbers Authority (IANA) may be used.
Which feature of that am I alleged to have infringed?
>Irrelevant.
>Irrelevant.
>> If you would care to explain to me how that problem is currently solved
>> in RFC 2822 messages, I will endeavour to answer that question.
>Done (see above).
Not done (see above).
However, the method you suggest COULD be used in our proposed draft,
because RFC 2047 support IS mandatory there.
>>>you need to consider
>>>the totality of what happens with all relevant types of transport, on
>>>widely diverse hardware, with various existing protocols and protocol
>>>options, throughout all stages of generation, transport, and processing.
>>
>>
>> No I don't. Language tagging is an end-to-end affair, as you constantly
>> remind me.
>In what way do transport issues *not* affect the ability of an
>end-to-end transfer?
All the transport is required to do is transport the data correctly
end-to-end. The generation and interpretation of the data is then
performed by the ends as defined for the protocol.
>>>>>Second step is to select a charset that is
>>>>>universally acceptable, ...
>Excluded middle argument. Yawn.
>> But, as you also point out, the opinions of the world's population are
>> irrelevant. All that matters is what the IETF decrees, and the IETF has
>> decreed that UTF-8 is the one that MUST be supported.
>The IETF has certainly NOT decreed that all others are forbidden. Indeed
>GB18030 (and more than a hundred others) are registered charsets.
Now who is excluding middles?
MUST support UTF-8 --/--> all others are forbidden
and also
MUST support UTF-8 --/--> all others are required
>>>>>Third step is to deal with negotiation
>>>>>and fallback support in the various messaging models and protocols.
>>>>
>>>>
>Are you really as clueless as you seem? The negotiation issue is for
>transfer over all possible paths -- it is necessary to detail how
>negotiation will be handled _to get to_ an IMAP server in compatible
>form in the first place. Likewise for SMTP and POP -- and you have
>not addressed those negotiation issues *at all*.
Since there is no proposal on the table that requires 8bit headers to be
sent via SMTP or to arrive at POP servers, there is nothing to be
negotiated. Sorry, your reading comprehension skills are showing through
again.
As to getting news articles TO an IMAP server, that is not covered by our
draft (which merely provides requirements to be observed by transports).
However, if an IMAP server chooses to obtain its feed via NNTP, then it
needs to comply with the NNTP standards.
>>>>>Fourth step is to propose the tagging, charset, and negotiation
>>>>>schemes ...
>More clueless, unsubstantiated ranting. You cannot legally send
>8-bit header content via SMTP without first negotiating for the BDAT
>extension. And that clearly means that gateways and injection agents,
>as well as user agents, are going to have to do so if there is 8-bit
>header content.
I have no desire to send 8-bit header content via SMTP (well, I have in
the longer term, but not for the protocols currently under consideration).
And when 8bit header content IS proposed for SMTP, I doubt it will be via
BDAT.
>> There is no current need for negotiation with NNTP, except for the
>> injector behind the POST command (for which I have made a specific
>> proposal). The Nntpext WG have already taken care of other aspects of
>> UTF-8 within NNTP.
>Suppose the NNTP server's spool is shared with an IMAP server -- then
>there has to be a means of negotiation so that the articles are in
>compatible form.
The internal storage format of servers is explicitly not defined in our
draft. Implementors have to handle whatever the protocols say is legal
(and that will include whatever IMAP extension gets included in the
protocol).
>> There is no need to negotiate a charset if, as proposed, there is only one
>> charset.
>As you keep reminding me, that ignores current non-conforming
>practice (which is likely to continue).
There is no proposal on the table that will make such non-conforming
practice conforming. People who continue to use those charsets will have
to manage as best they can.
>8-bit header content (in the DATA) requires negotiation of BDAT.
Irrelevant.
>> However, in the case of those IMAP servers which choose to implement the
>> 8bit extension (it is agreed that such an IMAP extension will be needed)
>Agreed by whom, in what context, in what timeframe, with what conditions
>and/or restrictions, with what mechanisms?
Agreed by you, seemingly, since as you correctly point out IMAP does not
currently handle 8bit headers. The details need to be worked out, of
course.
-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5