Re: When will News Article Format be approved?

From: Bruce Lilly (blilly@erols.com)
Date: Thu Mar 06 2003 - 14:40:46 CST


Charles Lindsey wrote:
> In <3E661491.504@Sonietta.blilly.com> Bruce Lilly <blilly@erols.com> writes:
>
>
>
>>>But I'd rather solve those problems than live
>>>with 1 or be ignored as with 2.
>
>
>>Feel free to try; first step is to come up with an RFC 2277/3066-
>>compliant method for language tagging for text strings in header
>>fields, compatible with existing 822/2822 parsers, that works with
>>8-bit charsets.
>
>
>
> Done that. Two methods proposed in fact.
>
> 1. Use the language tagging in Unicode (and hence in UTF-8). You insist
> that does not satisfy RFC 2277/3066, but it does. Two leading workers in
> the Unicode project (one of whom posts to this list) have confirmed that to
> me, and your uncorroborated say-so carries no weight at all. They also
> confirm that language tagging is not particularly useful or necessary in
> such short texts as headers, and that the ability to change language in
> mid-text is even less so. However, that method does suffer the
> disadvantage that it is to be deprecated in Unicode 4.

As has been pointed out, that is not RFC 2277-compliant or
RFC 3066-compliant. RFC 2277 treats language independently of charset
and notes that charsets need not be, and often are not, universally
supported. You cannot bury language information inside something
encoded by the charset -- it is then inaccessible to applications
which do not support the charset. RFC 3066 language tags are a subset
of ASCII, not some oddball bunch of multibyte characters which are
further obscured by encoding. Regardless of whether or not you or
anybody else thinks that language tagging is "useful", 2277 clearly requires
that all protocols MUST provide a means of conveying language information
end-to-end if an originator chooses to include it.
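As a rough illustration of the difference described above, the following Python sketch compares the two ways of carrying a language tag. It assumes the Unicode tag characters (U+E0001 LANGUAGE TAG followed by tag characters at U+E0000 plus the ASCII code point) for the in-band case, and the RFC 2231 section 5 `charset*language` form of an encoded-word for the out-of-band case. The function names are hypothetical and exist only for this example.

```python
import base64

def unicode_tagged(lang, text):
    """In-band tagging: U+E0001, then one tag character per ASCII letter of
    the language tag, then the text itself (illustrative helper)."""
    tag = "\U000E0001" + "".join(chr(0xE0000 + ord(c)) for c in lang)
    return tag + text

def rfc2231_encoded_word(charset, lang, text):
    """Out-of-band tagging: an encoded-word whose charset field carries a
    '*language' suffix, with the text base64 ('B') encoded."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?{}*{}?B?{}?=".format(charset, lang, payload)

in_band = unicode_tagged("de", "Hallo").encode("utf-8")
out_of_band = rfc2231_encoded_word("utf-8", "de", "Grüße")

# The tag "de" is not visible as ASCII in the UTF-8 octets; a program has to
# decode the charset before it can see the language at all.
print(b"de" in in_band)

# The encoded-word is pure ASCII and the tag can be read without decoding
# the payload.
print(out_of_band.isascii(), "*de?" in out_of_band)
```

The sketch only shows where the tag ends up on the wire; it says nothing about whether either form is acceptable to any particular parser.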

> 2. Invent a header which conveys the tagging information (for all the
> headers, naturally).

That fails to meet the architectural guidelines in RFC 1958, does not
provide for multiple charsets, and does not provide for multiple
languages. And is unlikely to work (suppose, e.g., that some transport
needs to add a trace header field in a different charset and/or language).
Hand-waving isn't coming up with a workable method; you need to consider
the totality of what happens with all relevant types of transport, on
widely diverse hardware, with various existing protocols and protocol
options, throughout all stages of generation, transport, and processing.

Step 1 is not done (strictly speaking it's not a requirement for
names, only for text strings).

>>Second step is to select a charset that is
>>universally acceptable, and we already know that there are
>>objections to utf-8 (by those who prefer GB18030) and to GB18030
>>(by those who prefer utf-8).
>
>
> There is not a cat in hell's chance that GB18030 will be accepted for that
> universal role, and UTF-8 is the only other show in town. There is a real
> risk that the Chinese would shut themselves off in a cooperating subnet,
> and there is room for some discussion as to whether we make it easier or
> harder for them to do so.

If it comes down to a battle of wills between one third of the Earth's
population on one hand and Charles Lindsey on the other, guess who
will lose. If there is no *universally* acceptable charset, the
whole issue is moot. Best to move on with something that *can*
work (or, in the case of protocol elements, forego i18n). Rather than
issue ultimata and attempt to transform a people with a long and proud
history into second class world citizens, you might consider trying
to understand their point of view and work on an acceptable compromise.
That is, if you actually want support for an 8-bit universal charset,
rather than a guarantee that there will never be one...

>>Third step is to deal with negotiation
>>and fallback support in the various messaging models and protocols.
>
>
> Yes, which is why I have proposed that people wishing to post to moderated
> I18N groups should negotiate with their injector first. The fallback is
> not particularly good (it puts the burden on the poster) and the
> consequence is that moderated I18N groups are not likely to happen for
> some time. That seems an acceptable compromise if it lets the non-moderated
> I18N groups get underway.

"Negotiate with their injector" is inadequate. It does not address
transfer via SMTP, POP, or IMAP. It does not handle gateways.

>>Fourth step is to propose the tagging, charset, and negotiation
>>schemes in the appropriate places for adoption by the standards
>>bodies responsible for SMTP, NNTP, IMAP, and the message format.
>
>
> It is unnecessary for SMTP (I am assuming UTF-8 for newsgroups-names only
> at the moment)

As SMTP (RFC 2821) has requirements on the content of the DATA
command, negotiation and fallback are necessary. The BDAT extension
(RFC 3030) provides the transport mechanism. The fallback needs
to be addressed carefully, considering gateways, injection agents,
and user agents. Details need to be specified; e.g.
gateways, injectors, and UAs MUST NOT send 8-bit header content w/o
first negotiating for BDAT transfer with the SMTP server, envelope
addresses need to be set appropriately and DSNs supported (with
negotiation) so as to be able to provide fallback if some later
relay in the store-and-forward chain does not support BDAT or
DSNs, etc. That has not been done.
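The kind of decision a sending agent would have to make can be sketched as follows. This is only a minimal Python illustration, assuming the rule stated above (no 8-bit header content unless BDAT has been negotiated) and the CHUNKING keyword that RFC 3030 defines for the EHLO response. The function names and the "FALLBACK" label are hypothetical; a real specification would also have to cover envelope addresses, DSNs, and later relays.

```python
def parse_ehlo(reply):
    """Collect the extension keywords from a multi-line EHLO reply.
    The first line is the server greeting, not an extension."""
    keywords = set()
    for line in reply.splitlines()[1:]:
        if line[:3] == "250" and len(line) > 4:
            parts = line[4:].split()
            if parts:
                keywords.add(parts[0].upper())
    return keywords

def choose_transfer(reply, has_8bit_headers):
    """Decide how to hand a message to this server (illustrative only)."""
    if not has_8bit_headers:
        return "DATA"          # ordinary content, ordinary DATA command
    if "CHUNKING" in parse_ehlo(reply):
        return "BDAT"          # server advertised RFC 3030 chunking
    return "FALLBACK"          # must re-encode compatibly before sending

modern = "250-mail.example.org\n250-8BITMIME\n250 CHUNKING"
legacy = "250-mail.example.org\n250 SIZE 1000000"

print(choose_transfer(modern, True))
print(choose_transfer(legacy, True))
print(choose_transfer(legacy, False))
```

Note that this covers a single hop only; it does not address what happens when a later relay in the chain lacks the extension.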

> though desirable it should happen for other
> reasons. It is unnecessary for NNTP, except to cope with those moderated
> I18N groups. If IMAP wants a tag to warn them that 8-bit headers are
> present, I have no objection to providing one. And I have no problem joining
> discussions for an IMAP extension for use in those IMAP servers which
> choose to offer I18N newsgroups to their clients.

It is not a question of "a tag"; it is a matter of negotiating
and fallback *prior* to sending data. And there are serious
problems here, because IMAP can get content from a news spool
in which case there simply is no opportunity for negotiation;
about the only way to solve that is for news servers to assume
that the spool might be accessed by an IMAP server and therefore
to negotiate with the upstream source via NNTP. In any event,
neither hand-waving nor empty promises will suffice -- there's
real work that needs to be done by the proponents.

>>If you're able to accomplish all of that, then, and only then, can
>>that charset begin to be used, and you still will have to convince
>>users to switch. I don't think there's a chance of doing all of
>>that within a decade, but you can prove me wrong by actually doing
>>it.
>
>
> Most of it is already done, or is doable given the will to make it happen.
> Your reluctance to cooperate in making it happen is noted.

It *might* be feasible -- the proponents haven't even done the
homework to determine that, especially w.r.t. IMAP. Your claim
of "reluctance to cooperate" is out of line; I have gone out of
my way to point out the issues that have to be addressed, where
relevant information may be found, and I have even pointed out
flaws in half-baked schemes that should have been more thoroughly
researched before having been presented. But I'm not one of the
just-send-8 proponents, and I'm under no obligation to do your
work for you. Nor, for that matter, am I a punycode-newsgroup-name
proponent, and I'm under no obligation to do the stringprep, etc.
work that those proponents will need to do to come up with a
*workable* proposal. I'm not even a strong proponent of an
extra-protocol means of achieving newsgroup name i18n, other than
pointing out that it is a viable option.

You have failed to grasp the big picture. That is, that
1. fallback requires that compatible methods be supported. That
    means 2047/2231 for text strings and parameters, and either
    an ACE for newsgroup names or some extra-protocol means of i18n
    (which also requires some work). The compatible methods are in
    no way optional; they must be supported everywhere. Support for
    the hypothetical universally-acceptable 8-bit charset would of
    course be optional, but where supported would add a very real
    cost to implement, not to mention the cost associated with
    the necessary development of extensions to the relevant
    protocols. It may be prudent to consider what return is
    provided after paying those costs.
2. Even if, in the fullness of time, a universally-acceptable
    8-bit charset is found, and all of the necessary mechanisms
    and extensions are specified with all of the i's dotted and
    all of the t's crossed, there will still be a long, slow
    learning curve of implementation. Of course all of the
    compatible methods will continue to work. Given that and
    the costs, what's the incentive for anybody to implement
    the optional hypothetical universal-charset support?
Those two points apply equally to text strings and to names
(protocol elements).
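For text strings, the compatible method referred to in point 1 can be shown with Python's standard `email.header` module, which implements RFC 2047 encoded-words. This is a minimal sketch; the helper names `to_compatible` and `from_compatible` are made up for the example, and the RFC 2231 language suffix is not covered here.

```python
from email.header import Header, decode_header

def to_compatible(text, charset="utf-8"):
    """Encode header text as RFC 2047 encoded-word(s), which are pure ASCII
    and therefore pass through existing 822/2822 parsers."""
    return Header(text, charset).encode()

def from_compatible(value):
    """Decode encoded-words back to text at the receiving end."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)

original = "Grüße"
wire = to_compatible(original)

print(wire.isascii())                      # safe for 7-bit transports
print(from_compatible(wire) == original)   # nothing lost in the round trip
```

Because the wire form is ASCII, it needs no negotiation with any transport, which is why it can serve as the fallback that every implementation supports.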

In the case of newsgroup names, given that *if* i18n is to be
applied via the protocol, an ACE of some sort is required, what
is to be gained by the *additional*, _pervasive_ requirement
of support for some hypothetical universal 8-bit charset? And
how many software authors will bother to implement it? [note,
though, that in the case of names, language-tagging isn't a
requirement, it's _merely_ a strong recommendation] Ignoring
that strong recommendation and concentrating solely on
protocol elements, the reality is
a. no universal agreement on an 8-bit charset
b. negotiation mechanism already in place for SMTP (your oft-
    repeated disparaging remarks notwithstanding, a great deal
    of work has been done on the relevant protocols), but
    details inadequately specified w.r.t. UAs, injection agents,
    gateways, etc. regarding the news model (and it's going to be
    more complex for other protocols than in the case of mail,
    where per-recipient DSNs can be handled for fallback). No
    work to speak of in place for other protocols; the IMAP
    issues haven't even been investigated adequately, and I
    think it's fair to say that the just-send-8 proponents
    don't even have a grasp of what IMAP is and how it operates.
Rectifying that reality isn't a 5-minute job. Nor 5 hours,
days, or weeks. There's a slim chance that with a lot of
dedication and hard work, a large group of just-send-8
proponents could negotiate an acceptable charset and work out
protocol details in 5 months. *Very* slim, with a *large*
group of *dedicated* people. 5 years is a more likely scenario,
and in fact it might never happen (it may be impossible to
reach agreement on a universally-acceptable charset, and/or
there may be insurmountable technical hurdles).

Beyond that is still the issue of preserving language information
in the protocols during transport, which is *mandatory* for
text strings in header fields (and which is of course a requirement
that is fully met by RFC 2047/2231).

Which leads to the even bigger picture:
3. Given the fact that compatible methods are in any event required,
    and the fact that substantial amounts of work need to be done
    before any implementation of use of some hypothetical,
    yet-to-be-agreed-upon, universally-acceptable 8-bit charset
    can begin, surely you can see that the only way to get an
    approved, implementable RFC 1036 successor into the world
    without even more delay is to forego the raw 8-bit issues for
    the moment and get a document out that addresses the compatible,
    time-tested, widely-implemented methods (which will need to be
    supported for fallback by every implementation even *if* some
    workable raw 8-bit method can be concocted). Which is what a
    number of people in this WG, in the IESG, and other observers
    have been telling you for some time.
If somehow you fail to see that big picture, you're missing something;
look again. We can wait a little while as you look and think.
We can't wait another 5 years, or 7, or infinity before getting a
workable standard published.
