Re: Amuricans and the eighth bit



I think it is time to lay some of this to rest.

> We are fed up writing `/' for acute and `\' for grave on our Gaelic
> discussion list, so I hope you get 8-bit mail sorted out as soon as
> possible one way or the other.  Increasingly often we find that people
> who are used to 8-bit mail on their own systems send 8-bit messages to
> the list and the accented letters are received stripped, except by the
> few lucky subscribers who are on an 8-bit-clean path.

I'm fed up too. When I don't do software I do mathematics. Like most
mathematicians, I'm never satisfied with the existing array of symbols
available to me. When it comes down to what I can represent in plain vanilla
7 bit ASCII, I'm really at a loss. Mathematics papers written in 7 bit ASCII
suffer just as much as, if not more than, a lot of foreign languages. (Where do
you think the characters mathematicians use come from anyway? Yup, we steal
them from other character sets. Now that Hebrew is just about used up, we'll
be starting on the oriental alphabets soon, I figure. It will take a while to
exhaust those.) Languages at least carry a decent amount of context around so
you can figure out what was meant. Mathematical notation does not share this
feature of natural language, unfortunately.

> I am no expert on mail, but it seems to me that the simplest method is
> just to allow the eighth bit through.  The whole world seems to be
> becoming "8-bit-clean" very fast over the last couple of years - witness
> for example TeX 3.0, LaTeX, MS-Kermit 3.0, microEmacs, SunOS, Ultrix, VT320,
> MS Windows, X-terminals.  In many cases this is just a case of removing
> the line of code which strips the eighth bit.

A lot of experts feel that simply declaring things to be eight bit clean is
a recipe for disaster. This has nothing to do with whether or not those
experts want 8 bit characters in their mail. I believe it will be a disaster,
and I want those 8 bits just as much as, if not more than, you do. (A large
portion of the local users I support are mathematicians like me.)

> I have been following the discussions on this list for a while, and it
> looks to me as if, multi-part and multi-media mail are a complex matter
> by comparison, not likely to be sorted out for several years, and not likely
> to be implemented on a worldwide scale for at least several years after that.
> They would benefit if the 8-bit path had been cleared before that.

I think you have it backwards. Remember, we're talking about a framework,
not a complete and finished solution. Constructing a framework for multimedia
and multipart mail appears to be a lot easier to do than solving the 8 bit
problem. The multimedia and multipart solution has the advantage that it
can solve the 8 bit problem in such a way that the 8 bit path issue can be
left undecided for the time being.
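
To make that concrete, here is a minimal sketch of how an encoding layer can
carry 8 bit text over a path that only passes 7 bits. It uses the
quoted-printable codec from Python's standard library purely as an
illustration. That choice is an assumption on the editor's part, since this
message does not say which encoding the framework would use.

```python
import quopri

# "cafe" with an acute accent on the e, as Latin-1 bytes (the last byte is 0xE9).
original = "caf\u00e9".encode("latin-1")

# Quoted-printable rewrites each byte above 0x7F as "=XX" in plain ASCII.
encoded = quopri.encodestring(original)

# Every byte of the encoded form survives a link that strips the eighth bit.
is_seven_bit = all(byte < 0x80 for byte in encoded)

# The receiving end reverses the encoding and recovers the 8 bit text.
decoded = quopri.decodestring(encoded)

print(encoded, is_seven_bit, decoded == original)
```

The point is only that the sender and receiver agree on the encoding, so
nothing in between has to be 8 bit clean.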

> Where are the big problems with letting the eighth bit through?  I come
> across lots of software which is not 8-bit-clean, but it doesn't break
> when you hit it with 8-bit text.  It just strips the eighth bit or throws
> out the 8-bit characters and the problem is mine alone.

I'm not going to reiterate the arguments on this point. Suffice it to say that
a large number of people see major problems with this approach. Reread the
list archives for an overabundance of examples.

> Are people getting confused between the ISO 8859 character sets, which avoid
> the "8-bit control" (C1) characters (hex 80 to 9F), and character sets like
> the IBM-PC character set, which do not.  Surely the whole point of ISO 8859
> (or ISO 10646 for that matter!) leaving the C1 character positions unused,
> which is otherwise very wasteful, is to avoid any possibility of trouble
> being caused by links which strip the eighth bit.  If there is any old
> software which gets broken when it is fed 8-bit characters, isn't it a
> simple matter to "fix" it by adding a single line of code or a filter to
> strip the eighth bit from input, and things are no worse than they were 
> before?

This is a big can of worms you're prying open. First of all, if C1
characters have their eighth bit stripped they turn into C0 control
characters, and those can cause big trouble. I don't think anyone can dispute
this. This has to be avoided. There are two solutions, which are not mutually
exclusive. The first is to use character sets that leave the C1 positions
unused, like 10646. This then drags you into the character set debate, and
the Unicode and 10646 people will be gunning for you.

The second is "the wretched option", which means that you don't send 8 bits
to people that cannot deal with it. This leads to a lot of other
consequences, and has been the subject of a lot of debate on this list.
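
The stripping hazard described above can be sketched in a few lines. The
helper name strip_eighth_bit is hypothetical and stands in for whatever a
non-clean link does. The byte values are those of ISO 8859-1, the C1 control
CSI, and IBM PC code page 437.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the high bit of every byte, as a 7 bit link would."""
    return bytes(byte & 0x7F for byte in data)

# Latin-1 puts its accented letters at 0xA0 and above, so stripping yields
# a wrong but printable character: a-grave (0xE0) becomes "`" (0x60).
latin1 = "G\u00e0idhlig".encode("latin-1")
print(strip_eighth_bit(latin1))            # b'G`idhlig'

# A C1 control such as CSI (0x9B) becomes the C0 control ESC (0x1B).
print(strip_eighth_bit(b"\x9b"))           # b'\x1b'

# The IBM PC set stores e-acute at 0x82, inside the C1 range, so it is
# stripped down to the C0 control STX (0x02).
ibm_pc = "\u00e9".encode("cp437")
print(strip_eighth_bit(ibm_pc))            # b'\x02'
```

In the first case the damage is a misprinted letter. In the other two a
control character lands in the stream, which is the case that has to be
avoided.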

> Don't Americans want 8-bit characters too?

Well, I'm an American, and I want 'em. See below.

>    * 100 of the 175 subscribers to our Gaelic bulletin board are Americans

Good for them.

>    * Don't you want to use the characters "cent", "half", "quarter",
>      "copyright", "plus-or-minus", "degrees", "mu", "squared",
>      and "cubed" which are already there waiting to be used underneath 
>      the "Compose character" key on your VT220, VT320 and Sun keyboards?
>      - and all in compliance with the ISO 8859-1 standard (a.k.a. Latin-1)
>      Our typists certainly want "half", "quarter", "squared" and "mu"
>      and "pounds".

Yes, I do.

>    * Don't French Canadians want to be able to write French with accents?

I hope so. When I write in my second language, which happens to be French, I
like to use the accents properly.

>    * 1-1 translation between EBCDIC and "ASCII" is not possible unless
>      you extend ASCII to 8 bits.  This is due to a few characters like
>      the "not" sign which are present in EBCDIC but not in ASCII.  
>      They are present in Latin-1.  EBCDIC is 8-bit already.
>      1-1 translation with EBCDIC is difficult anyway because there is 
>      no single authoritative version of EBCDIC, but that is another story.

No argument with this analysis.
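
The "not" sign example can be checked with a short sketch. It assumes Python's
cp037 codec as a stand-in for EBCDIC, which is only one of the several EBCDIC
variants mentioned above. The helper can_encode is hypothetical.

```python
NOT_SIGN = "\u00ac"  # the "not" sign

def can_encode(char: str, codec: str) -> bool:
    """Return True if the codec has a code point for this character."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

in_ebcdic = can_encode(NOT_SIGN, "cp037")     # present in this EBCDIC variant
in_ascii = can_encode(NOT_SIGN, "ascii")      # absent from 7 bit ASCII
in_latin1 = can_encode(NOT_SIGN, "latin-1")   # present in Latin-1

# The character survives a trip from EBCDIC to Latin-1 and back.
ebcdic_bytes = NOT_SIGN.encode("cp037")
via_latin1 = ebcdic_bytes.decode("cp037").encode("latin-1")
round_trip = via_latin1.decode("latin-1").encode("cp037")

print(in_ebcdic, in_ascii, in_latin1, round_trip == ebcdic_bytes)
```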

This is all very nice, but I for one am coming to resent this characterization
of Americans as not wanting 8 bits, I guess because it's only "those damn
furriners" that use them, or some such. This characterization is false for me,
and I suspect it is false for every single American who's on this list.

The opposition to 8 bit solutions is, I believe, not oriented along national
boundaries, as you seem to think it is. If there's an orientation, it is along
design/development versus operational lines. A significant number of the
design/development folks don't see any problems, since they obviously can
be designed into oblivion. This view is correct. 

The operational folks, on the other hand, see big problems because they're the 
ones that have to deal with current practice. They know that there are gateways
and whatnot that will never convert. They deal with them on a daily basis. 
These folks are also correct.

Now, a result of this split is that it does conform somewhat to the American
versus non-American split, since the US probably contains much more of the
Internet than any other country. The operational problems are going to
be in the US, then, and the operational folks who object to the "declare the
path 8 bit clean" solution are going to be here too. But this has nothing
to do with wanting, or not wanting, to be able to use the 8 bit character sets.

Now, I'm a designer/developer, but I also operate a large gateway and do
customer support for thousands of installations of my software, some of them
running really enormous gateways. So I see the advantages of both positions,
and the disadvantages. That's why I support the further development of
822 extensions, since I feel that we can reach closure on them in relatively
short order, and they will end up giving me the 8 bit support I want without
messing me up operationally. The fact that it gives the "designer/developer me"
something to do is just icing on the cake.

So please do not characterize Americans as not wanting 8 bits. This is not
correct for the people on the list at least, and I suspect it is wrong in
general. In fact, why not stop characterizing Americans, period, since it is
not germane to this discussion anyway?

>   Kevin Donnelly

				Ned Freed