Re: When will News Article Format be approved?

From: J.B. Moreno (planb@newsreaders.com)
Date: Wed Mar 05 2003 - 00:40:32 CST


On 3/5/03 12:29 AM, Bruce Lilly at <blilly@erols.com> wrote:

> J.B. Moreno wrote:
>> On 3/4/03 6:38 PM, Lawrence Greenfield at <leg+@andrew.cmu.edu> wrote:
-snip-
>> 8 bit content exists. We can do one of three things: (1) bless the
>> current usage (i.e. multiple local charsets), (2) forbid it, or (3)
>> bless a particular charset.
>
> Or (4) remain silent, deferring to the relevant referenced
> standards. That's what RFC 1036 does, and what the Kohn draft
> does.

Given that the "relevant referenced standards" do 2, that's just an attempt
to shift the blame.

-snip-

>> -- but while there is no certainty, the available evidence
>> indicates that alienates 15% of the users.
>
> RFC 1036 (the current standard) defers the matter to RFC 822,
> which is quite clear that those 15% (or the authors of software
> used by them) have alienated themselves.

See, that's what I don't like -- the absolute contempt in the above
statement: they haven't "alienated themselves", they found themselves being
inadequately served by the existing standard and had to go beyond it to get
things done.

They had a problem, they solved it, their solution works for them.

It's the fault of the people writing the standards that they had to do so.

-snip-
>> For myself, I think it will alienate them, and feel that is too high a
>> price to pay for compatibility with RFC 2822. I think it's better that
>> we simply accept that 15% are going to be incompatible, and deal with
>> the consequences as best we can.
>
> But J.B. those 15% are still going to be alienated and
> incompatible if utf-8 is blessed. That would legalize the
> 0.006% utf-8 use and break existing compliant interoperability
> with SMTP and IMAP; SMTP is necessary for moderation, which I
> suspect accounts for more than 0.006% of all articles, and IMAP
> is in widespread use.

I gave 3 options:
   1. Bless using multiple 8 bit charsets
   2. Forbid 8 bits
   3. Bless a single 8 bit charset and try to get them to switch.

Option 2 means what we say is irrelevant.

Option 1, while most acceptable to the users, is harder on the programmer,
doesn't work well in the face of multiple charsets, and means that we're
likely to continue to see programs that handle charset X, but not charset Y
(and an eventual split with the Asians).

Option 3 means UTF8, which has problems with SMTP, specifically sendmail and
moderated non-ASCII groups. But I'd rather solve those problems than live
with 1 or be ignored as with 2.

(Unlike with sendmail, IMAP's problem with raw UTF8 seems to be, uhm,
political I guess is the best term. Currently an IMAP server that was
trying to carry a full text-only newsfeed would be dropping or corrupting
15% of the articles. There's been no mention of a technical problem with
treating the syntactically valid UTF8 *as* UTF8, and allowing those articles
to be carried -- the objection has been that it'd be "wrong" to do so
because while dropping 15% of the feed as "non-compliant" is fine, accepting
0.017% of that 15% would be unacceptable because half of that 0.017% would
be mis-identified, and that would be "confusing" to the human readers. I
don't see that we need to bow to those politics).
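
To make "syntactically valid UTF8" concrete, here is a rough sketch of the
check a server or reader could apply to an unlabeled 8 bit body. The function
name and the three labels are made up for illustration, they are not taken
from any draft or existing software, and this only tests syntax: a body that
passes could still have been written in some other charset, which is the
mis-identification case mentioned above.

```python
def classify_body(raw: bytes) -> str:
    """Classify an unlabeled article body by byte content alone.

    Hypothetical helper: the labels 'ascii', 'utf-8' and 'unknown-8bit'
    are illustrative, not defined by any standard.
    """
    # Pure 7 bit content is valid under every option being discussed.
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        # Strict decoding rejects malformed sequences, so success means
        # the bytes are at least syntactically valid UTF-8.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # 8 bit content in some local charset (or binary junk).
        return "unknown-8bit"
    return "utf-8"
```

Text in a typical single-byte local charset rarely passes the strict decode
by accident, since every high byte has to be part of a well-formed multibyte
sequence, but it can happen.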

-- 
J.B. Moreno



