Re: draft-ietf-usefor-article-06 and last call


From: Erland Sommarskog (sommar-usefor@algonet.se)
Date: Sat Jan 12 2002 - 14:49:09 CST


Claus Färber (list-ietf-wg-apps-usefor@faerber.muc.de) writes:
> The question is, what is better:
>
> . 90% can't post to the group at all (most of the time), 30% can't
> even read it but 10% see the name correctly.
> (NB: The numbers < 100% are arbitrary guesses.)
>
> . 100% can read the group and post to it but don't see the name
> correctly.

You are here assuming that the number of people who know about the group
is the same in both cases. This is a very dubious assumption.

A more relevant question is: for how many people will the newsgroup
be useful?

And you should also consider: if we go for a non-mainstream solution,
will this give rise to legacy problems later on?

>Languages based on Latin scripts tend to be writeable with the
>basic Latin (ASCII) alphabet. Swedish is an exception AFAIR.

Au contraire. Most languages that use the Latin alphabet use a
different repertoire from English. For some languages the usage of
characters beyond ASCII is marginal enough that they can do decently
well without them. English, Dutch, Italian and possibly Catalan qualify
here. German is a weird case, because you have a fallback which is
accepted far beyond the field of computing.

But for all other languages that I can think of, being constrained
to the 26 graphemes of the English alphabet also decreases the
quality of the language. This applies to the Scandinavian languages,
to Lappish, Finnish and Estonian, to the Baltic languages, to the
Slavic and Turkic languages that are written in Latin script, to
Hungarian, to Roumanian, to Albanian, to Basque, to Spanish and
Portuguese, most likely also to French, and we should not forget
Vietnamese.

> > What happens to the string "räksmörgås" in the scheme you are proposing?
>
> It depends how you define the scheme. But yes, for reasonably
> efficient schemes, it would be completely unreadable; something
> like se.test.+sdvjn23eor, WHICH IS STILL BETTER THAN SEEING A NAME
> LIKE se.test.r||ksm||rg||s AND NOT BEING ABLE TO USE THE GROUP.

No, it is not better, because it is completely useless. People who
look around on Google will find se.test.räksmörgås and to some
this will be gibberish, but some will realize "Hey, this is a group
about shrimp sandwiches, my favourite!". But not a single one will
understand se.test.+sdvjn23eor.

And as far as I know se.test.+sdvjn23eor is a legal name today. How
is the newsreader to know that this in fact should be presented as
something else? Yes, we could introduce a new convention and accept
the incompatibility we are introducing. But then some joker creates
alt.two.+two and +two does not decode to anything. Would that name
still be legal?
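
To make the ambiguity concrete, here is a minimal sketch. The scheme under discussion is not specified in this thread, so Punycode (Python's built-in "punycode" codec) is used purely as a stand-in for "some reasonably efficient ASCII encoding"; it is an assumption, not the proposal itself.

```python
# Assumption: Punycode stands in for the unspecified encoding scheme.
name = "räksmörgås"

# Encode the non-ASCII name into a pure-ASCII form.
encoded = name.encode("punycode").decode("ascii")
print(encoded)

# The encoded form consists only of characters that are already legal in
# an ordinary ASCII name, so nothing in the string itself marks it as encoded.
only_plain_chars = all(c.isalnum() or c == "-" for c in encoded)
print(only_plain_chars)

# A reader that knows the convention can recover the original name.
decoded = encoded.encode("ascii").decode("punycode")
print(decoded == name)
```

Without an agreed marker, a reader cannot tell such a string from a name that merely happens to look like it, which is the incompatibility described above.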

The same joker can of course generate a group name which includes
an overlong UTF-8 sequence too, but this name would be illegal as it
is illegal today.
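
As a small illustration of why an overlong sequence is easy to reject, the sketch below uses Python's strict UTF-8 decoder as the validator. The group name alt.two.<overlong> and the helper is_valid_utf8 are made up for the example.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# A well-formed UTF-8 name.
valid = "se.test.räksmörgås".encode("utf-8")

# 0xC0 0xAF is an overlong two-byte form of "/" (U+002F); a strict
# decoder must refuse it.
overlong = b"alt.two.\xc0\xaf"

print(is_valid_utf8(valid))
print(is_valid_utf8(overlong))
```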

> > MIME is a good example of why Yet Another Encoding is a bad
> > thing. MIME does not solve any real problem in this regard, just
> > creates new ones.
>
> This is plain wrong. MIME solves the problem of getting 8bit
> characters and binary files through a 7bit mail infrastructure.

Which to a great extent is a non-existent problem. Yes, as Andrew
pointed out there are still mail servers that are not 8-bit clean.
One could wonder whether they would still be around, had mail been
declared 8-bit clean by decree rather than by introducing a mega-kludge.

> It also solves the problem of exchanging charset ... information.

Doubtful. In theory, yes. In practice, I'm not so sure. I've seen
more than one Chinese spam which said iso-8859-1.
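
What a wrong charset label does can be sketched as follows. The sample text and the choice of GB2312 as the real encoding are assumptions for the example; the spam mentioned above may well have used something else.

```python
# Assumption: the message body is really GB2312, but labelled iso-8859-1.
text = "中文"
raw = text.encode("gb2312")

# A reader that trusts the label decodes every byte as a Latin-1 character.
# This never fails, it just produces the wrong characters.
mislabelled = raw.decode("iso-8859-1")
print(mislabelled)

# The bytes themselves are intact, so a reader that guesses the real
# charset can still recover the text.
recovered = mislabelled.encode("iso-8859-1").decode("gb2312")
print(recovered == text)
```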

> > At worst you would have been called Fdrber which is a
> > smaller accident than the almost unreadable crap I get now.
>
> Replace the "d" with any character that happens to have the same
> code point as the "ä" in ISO-8859-1 and you're right.

Actually, "d" is what the "ä" becomes if the high bit is stripped.
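
The arithmetic is easy to check: in ISO-8859-1 "ä" is 0xE4, and clearing the top bit leaves 0x64, which is ASCII "d". A minimal sketch, with strip_high_bit as a hypothetical helper modelling a transport that is not 8-bit clean:

```python
def strip_high_bit(raw: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit transport would (hypothetical helper)."""
    return bytes(b & 0x7F for b in raw)


a_umlaut = "ä".encode("iso-8859-1")[0]
print(hex(a_umlaut))         # 0xe4
print(hex(a_umlaut & 0x7F))  # 0x64

# The whole name after passing through such a transport.
name7 = strip_high_bit("Färber".encode("iso-8859-1")).decode("ascii")
print(name7)  # Fdrber
```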

> > Some of the problems that some software has with UTF-8
> > names are related to MIME: they incorrectly MIME encode the
> > newsgroups name.
>
> You can't blame MIME for implementations that use it where it MUST
> NOT be used.

So does RFC2047 say that it must not be used for Newsgroups?

Had MIME not existed, newsreader authors would never have gotten the
idea of putting in the encoding, wrong or right.
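
For readers who have not seen one, this is roughly what an RFC 2047 encoded-word looks like, sketched with Python's standard email.header module. Applying it to a newsgroup name is done here only to show the effect; which newsreaders did so, and in what exact form, is not established in this thread.

```python
from email.header import Header, decode_header, make_header

# An encoded-word as it can appear in a From line.
encoded_name = "=?ISO-8859-1?Q?Claus_F=E4rber?="
decoded_name = str(make_header(decode_header(encoded_name)))
print(decoded_name)

# The same mechanism applied to a group name: the result is an
# encoded-word rather than the raw UTF-8 name.
group = "se.test.räksmörgås"
wrongly_encoded = Header(group, "utf-8").encode()
print(wrongly_encoded)

# Only software that knows to decode it gets the original name back.
round_trip = str(make_header(decode_header(wrongly_encoded)))
print(round_trip == group)
```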

> Installing a new software can require much effort, especially if
> we're talking about a corporate network with several hundred
> personal computers. Here Netscape Communicator is still common
> because it is more secure than IE with Outlook Express, and the
> new version is still unstable and different enough to make
> installing it into a network environment a bit more complex. If
> only a few users want to read such a group, it just won't happen.

Yes, but I venture to assume that most people read news from their
home PCs, where they install whatever they want.

> > The report about Xnews did not specify an error, and Xnews users have
> > successfully accessed and posted to the GNKSA group.
>
> Which version? There were reports on this mailing list that some
> versions don't work.

Luu Tran, the author of Xnews, puts out new versions about every week.
While his software is a moving target, I am quite confident when I
say that he has not made any changes that could affect the accessibility
of UTF-8 groups. He has said several times that this character set is
something he knows too little about to try to handle, to the dismay
not least of Polish users.

By the way, Xnews users are a category that you can presume to update
frequently.

--
Erland Sommarskog, Stockholm, sommar@algonet.se

