Extending news to EAI



There is now an experimental protocol for UTF-8 headers in Email (RFC5335
and its relations). This was the product of the IMA WG. There has been
recent discussion of applying this to Netnews, and the conclusion seems to
be that the IMA WG is not the place for that work, and that a private
draft would be the way forward. However, this list would be a reasonable
place to discuss it.

Essentially, under this protocol, UTF-8 may be freely used in Email
headers, but a downgrading mechanism is needed whenever mail passes to a
server that does not advertise the UTF8SMTP capability.
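
As a rough illustration of that rule, here is a minimal Python sketch of
the decision a sending server has to make. The function names and the
example hostnames are hypothetical and do not come from any RFC; the only
thing taken from the protocol is the UTF8SMTP keyword in the EHLO reply.

```python
def advertises_utf8smtp(ehlo_reply: str) -> bool:
    """Return True if a raw EHLO reply lists the UTF8SMTP keyword."""
    for line in ehlo_reply.splitlines():
        # Reply lines look like "250-KEYWORD params" or "250 KEYWORD params".
        if len(line) < 4 or not line.startswith("250"):
            continue
        keyword = line[4:].split(" ", 1)[0].strip().upper()
        if keyword == "UTF8SMTP":
            return True
    return False


def needs_downgrade(headers: dict, ehlo_reply: str) -> bool:
    """Downgrading is only needed when the headers contain non-ASCII
    text AND the next hop does not advertise UTF8SMTP."""
    has_non_ascii = any(not value.isascii() for value in headers.values())
    return has_non_ascii and not advertises_utf8smtp(ehlo_reply)


# Hypothetical EHLO reply from a server without the extension.
old_server = "250-mx.example.org Hello\n250 8BITMIME"
print(needs_downgrade({"Subject": "Bl\u00e5b\u00e6rsyltet\u00f8y"}, old_server))
```

The sketch says nothing about how the downgrade itself is done; that is
defined by the EAI documents.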

This is much what the USEFOR WG wanted to do in its earlier days, but the
decision was then taken to postpone it until the base documents were
complete, and then to bring it up again as an Experimental protocol. So
maybe now is the time to embark on it.

It is much easier with Netnews than with Email, since the underlying
transport (whether NNTP or UUCP) is already 8-bit clean. I would not
expect it to become the norm on the Big-8 groups for quite some time, but
it would be very useful for National hierarchies, such as the Scandinavian
ones where the inability to have Newsgroup Names with their own special
characters in them is a right pain (apparently).

So the experimental protocol would start off with the extensions allowed
by RFC5335, and then add UTF-8 in the Newsgroups header. It would be up to
individual hierarchies to encourage deployment of the experiment within
their groups.

It has already been established that the existing transport mechanisms
will move such articles around without problem. No downgrading is
envisaged except at gateways to email (at which point the mechanisms
already agreed for EAI/IMA would apply). But existing servers would cope
fairly well without modification, at least until UTF-8 newsgroup-names
were introduced.

Clearly, anyone expecting to read such articles would need a suitable
client. Some clients will already display them (Opera, for example).
Otherwise, it would be up to people to install suitable user agents if
they wanted to see these articles properly. Existing agents might display
such headers in a garbled form, which might be good enough in languages
which were based on Latin alphabets. Bodies would still be expected to be
covered by a Content-Type: text/plain; charset=utf-8.

With UTF-8 newsgroup-names, again people who wanted to subscribe to such
groups would need suitable clients, but existing servers would serve them
once they had been persuaded to store them in their active lists. That
would require control messages that created such groups to be accepted,
and also articles submitted to moderated groups to be forwarded, so in
practice people who wanted to subscribe to such groups would need to
connect to servers which had been upgraded to cope. But the important
point is that articles would still propagate correctly through
non-upgraded servers.

Some early USEFOR drafts show how the Newsgroups header was to be
extended. In particular, it required some very strict normalization, so
that a simple byte-by-byte comparison of newsgroup-names would always
work.
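
To show why normalization matters for byte-by-byte comparison, here is a
small Python sketch. It assumes NFC as the normalization form purely for
illustration; the form actually required by the USEFOR drafts is not
stated here, and normalize_group_name is a hypothetical helper.

```python
import unicodedata


def normalize_group_name(name: str) -> bytes:
    """Return the UTF-8 bytes of a newsgroup name after normalization.

    NFC is an assumption made for this sketch, not necessarily the
    form the early USEFOR drafts required.
    """
    return unicodedata.normalize("NFC", name).encode("utf-8")


# The same visible name, written in two different ways.
composed = "dk.test.utf8-\u00e6\u00f8\u00e5"      # final letter as U+00E5
decomposed = "dk.test.utf8-\u00e6\u00f8a\u030a"   # 'a' + combining ring

# Without normalization the byte strings differ.
raw_equal = composed.encode("utf-8") == decomposed.encode("utf-8")

# After normalization a plain byte comparison is enough.
normalized_equal = (
    normalize_group_name(composed) == normalize_group_name(decomposed)
)

print(raw_equal, normalized_equal)
```

If every posting agent normalizes before sending, servers can go on
comparing newsgroup-names as opaque bytes.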

Note that an experimental group dk.test.utf8-æøå (which should show up
in UTF-8 clients properly, although this message is composed in iso-8859-1)
already exists on several servers, notably on news.dotsrc.org, and this
message is crossposted there, and if a thread develops there, then well
and good. But people on that list might do better to subscribe to the
usefor mailing list (see http://www.imc.org/ietf-usefor/index.html).

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131            Web: http://www.cs.man.ac.uk/~chl
Email: chl@xxxxxxxxxxxxxxxx      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5