Re: When will News Article Format be approved?

From: Bruce Lilly (blilly@erols.com)
Date: Wed Mar 05 2003 - 09:15:29 CST


J.B. Moreno wrote:
> On 3/5/03 12:29 AM, Bruce Lilly at <blilly@erols.com> wrote:

>>RFC 1036 (the current standard) defers the matter to RFC 822,
>>which is quite clear that those 15% (or the authors of software
>>used by them) have alienated themselves.
>
>
> See, that's what I don't like -- the absolute contempt in the above
> statement: they haven't "alienated themselves", they found themselves being
> inadequately served by the existing standard and had to go beyond it to get
> things done.

No contempt, just the facts. It wasn't the standards that inadequately
served the needs, it was (and in some cases still is) the lack of
support for standard mechanisms by news UA authors that left users
with no solution.

> They had a problem, they solved it, their solution works for them.

Not quite; they worked around the problem of lack of MIME support
by using untagged charsets -- it may appear to work from the sender's
perspective (after all he is obviously using the same charset as
he composed his message with), but it doesn't scale to something as
diverse as Usenet -- it doesn't work adequately in general.
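
To illustrate the guess-the-charset problem, here is a minimal Python
sketch. The sample text and the three candidate charsets are assumptions
chosen for illustration only; the point is that the same untagged bytes
decode "successfully" to different text depending on what the reader's
software assumes.

```python
# The sender composes in ISO-8859-1 and posts the raw bytes with no
# charset label. (Sample text is hypothetical.)
raw = "Gr\u00fc\u00dfe".encode("iso-8859-1")


def decode_guess(data: bytes, charset: str) -> str:
    """Decode untagged bytes under a guessed charset, as a reader's UA must."""
    return data.decode(charset, errors="replace")


as_latin1 = decode_guess(raw, "iso-8859-1")  # the sender's own view
as_koi8r = decode_guess(raw, "koi8-r")       # a reader defaulting to KOI8-R
as_utf8 = decode_guess(raw, "utf-8")         # a reader defaulting to UTF-8

for label, text in (("iso-8859-1", as_latin1),
                    ("koi8-r", as_koi8r),
                    ("utf-8", as_utf8)):
    print(f"{label:>10}: {text!r}")
```

Only the first reading matches what the sender typed, and nothing in
the bytes themselves tells the receiving software which guess is right.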

> It's the fault of the people writing the standards that they had to do so.

No, it's the fault of the software authors. But fixing the blame
isn't the objective; fixing the problem is. Trying to force
working software on a massive scale to deal with broken messages --
which will remain guess-the-charset, guess-the-language broken --
isn't a solution. No matter what, the currently-broken software
that has led users to the inadequate work-arounds needs to be fixed.
That broken software is the problem, and fixing it fixes the problem.
It's possible to go beyond that and introduce a phased approach to
utf-8, but that alone won't solve the problem, and the first baby
step down that path (RFC 2277/3066 language-tagging) has yet to be
taken.

>>But J.B. those 15% are still going to be alienated and
>>incompatible if utf-8 is blessed. That would legalize the
>>0.006% utf-8 use and break existing compliant interoperability
>>with SMTP and IMAP; SMTP is necessary for moderation, which I
>>suspect accounts for more than 0.006% of all articles, and IMAP
>>is in widespread use.
>
>
> I gave 3 options;
> 1. Bless using multiple 8 bit charsets
> 2. Forbid 8 bits
> 3. Bless a single 8 bit charset and try to get them to switch.
>
> Option 2 means what we say is irrelevant.

No, it means that the software authors need to clean up their broken
software that caused the problem in the first place.

> Option 1, while most acceptable to the users, is harder on the programmer,
> doesn't work well in the face of multiple charsets, and means that we're
> likely to continue to see programs that handle charset X, but not charset Y
> (and an eventual split with the Asians).

We can agree that that option isn't viable, though for somewhat
different reasons; it doesn't scale well for users, regardless of
software issues.

> Option 3 means UTF8, has problems with SMTP, specifically sendmail and
> moderated non-ascii groups.

Sendmail isn't the issue; the SMTP protocol and the message format
prohibit raw untagged 8-bit content in the envelope and message header,
as does IMAP. Sendmail is one SMTP implementation -- there are others --
and even if you could wave a magic wand and change them all overnight
(you can't) -- there is still the issue of the protocol specification.
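
For comparison, the existing tagged mechanism for non-ASCII header text
is the RFC 2047 encoded-word. The sketch below uses Python's standard
`email.header` module; the subject text is a made-up example. It shows
that the result is pure 7-bit ASCII on the wire and that the charset
label travels with the text, so the receiver does not have to guess.

```python
from email.header import Header, decode_header

# Hypothetical subject line containing non-ASCII characters.
original = "Gr\u00fc\u00dfe aus K\u00f6ln"

# Encode as an RFC 2047 encoded-word, tagged with its charset.
wire = Header(original, charset="iso-8859-1").encode()
print(wire)

# The receiver recovers both the bytes and the charset label.
decoded = "".join(
    part.decode(cs or "ascii") if isinstance(part, bytes) else part
    for part, cs in decode_header(wire)
)
print(decoded)
```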

> But I'd rather solve those problems than live
> with 1 or be ignored as with 2.

Feel free to try; first step is to come up with an RFC 2277/3066-
compliant method for language tagging for text strings in header
fields, compatible with existing 822/2822 parsers, that works with
8-bit charsets. Second step is to select a charset that is
universally acceptable, and we already know that there are
objections to utf-8 (by those who prefer GB18030) and to GB18030
(by those who prefer utf-8). Third step is to deal with negotiation
and fallback support in the various messaging models and protocols.
Fourth step is to propose the tagging, charset, and negotiation
schemes in the appropriate places for adoption by the standards
bodies responsible for SMTP, NNTP, IMAP, and the message format.
If you're able to accomplish all of that, then, and only then, can
that charset begin to be used, and you still will have to convince
users to switch. I don't think there's a chance of doing all of
that within a decade, but you can prove me wrong by actually doing
it.

In the meantime, it's clear that the only viable way to get a 1036
successor standard in place soon is to do so in a manner which is
compatible with 1036, its underlying message format, and the
interoperating protocols, and that means going forward with the Kohn
draft or something very much like it.

> (Unlike with sendmail, IMAP's problem with raw UTF8 seems to be, uhm,
> political I guess is the best term.

No, it's a technical issue; the charset needs to be tagged because
the charset tag is transmitted as part of the protocol -- see RFC 2060.
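
One place where this shows up is the SEARCH command, whose optional
CHARSET argument names the charset of the search strings. The Python
sketch below only assembles the bytes of such a command. The helper
name, the tag "A001", and the search text are assumptions for
illustration, and the sketch ignores the continuation response a real
client must wait for before sending a literal.

```python
def imap_search_subject(tag: str, text: str, charset: str) -> bytes:
    """Build a SEARCH command whose search string is sent as a literal.

    Hypothetical helper: it shows only that the charset name is part
    of the command itself, not how to drive a live IMAP session.
    """
    payload = text.encode(charset)
    head = (f"{tag} SEARCH CHARSET {charset.upper()} "
            f"SUBJECT {{{len(payload)}}}\r\n")
    return head.encode("ascii") + payload + b"\r\n"


cmd = imap_search_subject("A001", "Gr\u00fc\u00dfe", "utf-8")
print(cmd)
```

Without a charset name there is nothing valid to put in that slot,
which is why untagged 8-bit text cannot simply be passed through.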

> while dropping 15% of the feed as "non-compliant" is fine, accepting
> 0.017% of that 15% would be unacceptable because half of that 0.017% would
> be mis-identified,

A 0.006% solution is for all practical purposes the same as no solution
at all. That's not a political issue; it's an engineering fact.

