From: Bruce Lilly (blilly@erols.com)
Date: Wed Feb 12 2003 - 09:44:38 CST
Sam Roberts wrote:
> Quoteing blilly@erols.com, on Tue, Feb 11, 2003 at 09:09:16AM -0500:
>
>>Sam Roberts wrote:
>>
>>>Quoteing blilly@erols.com, on Mon, Feb 10, 2003 at 05:39:26PM -0500:
>>>
>>>>The draft permits a UA to generate raw utf-8. That is then passed to
>>>>an injection agent, which determines that one or more newsgroups are
>>>>moderated. Existing injection agents do not transform raw-utf-8,
>>>>and no existing or future injection agent can transform any untagged
>>>>8-bit content without charset and language information.
>>>
>>>
>>>Why not?
>>>
>>>The charset seems clearly to be utf-8!
>>
>>No, in fact Usenet (and mail) abounds with a large variety
>>of untagged 8-bit charsets.
>
>
> Sorry, none of that is validly encoded, and has specifically NO meaning,
> unless assigned one.
Incorrectly tagging something other than utf-8 *as* utf-8
makes it worse; at least the untagged cruft is clearly
illegal -- incorrectly labelling it doesn't fix it, it
only compounds the error.
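
To illustrate the point about mislabelling, here is a minimal Python sketch. The sample strings, the candidate charset list, and the helper names are my own assumptions for illustration, not anything taken from the draft. It shows that untagged 8-bit bytes can read differently under different legacy charsets, and that passing a utf-8 validity check does not prove the sender meant utf-8.

```python
# Illustration only: sample bytes, charset list, and helper names are assumptions.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes happen to form well-formed utf-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def candidate_readings(data: bytes, charsets=("iso-8859-1", "koi8-r")) -> dict:
    """Decode the same untagged bytes under several plausible legacy charsets."""
    return {cs: data.decode(cs) for cs in charsets}

# Untagged 8-bit bytes as a legacy client might emit them (here: iso-8859-1).
raw = "Grüße".encode("iso-8859-1")

print(is_valid_utf8(raw))        # these bytes are not well-formed utf-8
print(candidate_readings(raw))   # the text differs depending on the guessed charset

# Bytes that are well-formed utf-8 may still have been meant as something else.
ambiguous = b"\xc3\xa9"
print(is_valid_utf8(ambiguous), ambiguous.decode("iso-8859-1"))
```

A validity check can reject some untagged content, but it cannot recover the charset the sender actually used, which is why a label applied after the fact is a guess.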
> Backwards compatibility with standards compliant messages, that I
> understand, but backwards compatiblity with invalidly encoded messages?
Before utf-8 can be adopted, there needs to be a transition
period where there is a moratorium on *all* untagged 8-bit
header field content as a prerequisite to a state where
the only untagged 8-bit content is utf-8. The current
Usefor draft lacks such a transition plan.
>>>And a langugae tag is only allowed for paramaters, and even there is
>>>optional, is it not?
>>
>>No, language-tagging is provided by MIME for RFC 2047
>>encoded-words also.
>
>
> And is still optional, so does nothing to explain why you would make the
> statement "no existing or future injection agent can transform any
> untagged 8-bit content without charset and language information".
Incorrect; it is "optional" only in the sense that the *user*
need not specify a language. If the user wishes to specify a
language, the protocols MUST provide for preservation of that
language information. See RFC 2277 section 4.
> Apparently there are problems, but lack of charset information and
> language tagging doesn't seem to be.
There are indeed problems, and the lack of tagging is among them.
Lack of charset tagging could possibly be overcome by a suitable
transition plan, but the protocol must provide for language tagging
that is carried end-to-end. That is an absolute requirement, and at
present it can only be met in a compatible manner via the RFC 2047/2231
methods.