
Unicode...



In case people didn't catch this one last week in the NY Times business section,
see the attached.  Unicode is a pretty serious contender to replace ASCII 'round
the world...

	Rick McGowan@NeXT.COM

=== attachment ===
		Universal Computer Code Due
Rivals Join Forces To Design Standard For All Languages
--------
New York Times, Wednesday, February 20, 1991, page C1
By Andrew Pollack

San Francisco, Feb. 19 -- A group of leading computer companies today
announced an ambitious effort to develop a lingua franca for the electronics
age, a universal digital code that could be used by computers to represent
letters and characters in all the world's languages.

A consortium has been formed to develop and promote the new code, known as
Unicode.  Its 12 members include many top computer companies that are often
fierce rivals, like I.B.M., Apple Computer, Microsoft, Sun Microsystems and
Xerox.

Easier Communication

If the code becomes a worldwide standard, it would be easier for people in
different countries to communicate by electronic mail.  The code would also
make it easier for software companies to develop programs that can work in
different languages.

Right now, for instance, an American computer often cannot understand the codes
used by a French computer to represent accented characters, so a message sent
electronically from France to the United States might arrive without the
accents or with mistaken characters.
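The failure described above can be reproduced in a few lines. This is a minimal sketch, not a reconstruction of any particular mail system: the codec names `latin-1` and `cp437` are assumptions chosen as stand-ins for "the French computer's code" and "the American computer's code", and `send_and_receive` is a hypothetical helper.

```python
def send_and_receive(text, sender_codec, receiver_codec):
    """Encode text with the sender's code, then decode the same bytes
    with the receiver's (possibly different) code."""
    raw = text.encode(sender_codec)
    return raw.decode(receiver_codec, errors="replace")

message = "café"

# Same bytes, different code table: the accented letter comes out wrong.
garbled = send_and_receive(message, "latin-1", "cp437")

# A receiver that only accepts plain ASCII may drop the accented letter.
stripped = message.encode("ascii", errors="ignore").decode("ascii")

print(garbled)
print(stripped)
```

The unaccented letters survive in both cases because the two code tables agree on that range; only the character outside it is damaged.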

But with the new code, any computer anywhere could understand and display
everything from French accents to Chinese ideographs, not to mention letters in
Bengali, Hebrew, Arabic and other languages.

Computers represent all information as a series of zeros and ones, or as
digital bits.  At issue are the specific sequences of zeros and ones used to
represent letters, numerical digits and punctuation marks, as well as other
symbols like dollar and yen signs or arithmetic symbols.

Y Means 01011001

The most widespread system, the American Standard Code for Information
Interchange, or Ascii, was approved as a standard in 1967.  Ascii (pronounced
AS-kee) represents each letter and symbol as a sequence of eight zeros and
ones.  The letter Y, for instance, is represented by the sequence 01011001.

But the International Business Machines Corporation has used a different code
in some computers, in which Y is represented by 11101000.  That means messages
sent from a non-I.B.M. computer to an I.B.M. machine must be translated.
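The two bit patterns for the same letter can be inspected directly. The sketch below assumes that Python's `cp037` codec (one EBCDIC code page) is a fair stand-in for the I.B.M. code mentioned above; the article does not name a specific code page, and `bit_pattern` is a hypothetical helper.

```python
def bit_pattern(char, codec):
    """Return the eight-bit pattern a single-byte codec assigns to char."""
    (byte,) = char.encode(codec)  # raises if char needs more than one byte
    return format(byte, "08b")

# The same letter under two different single-byte codes.
for codec in ("ascii", "cp037"):
    print(codec, bit_pattern("Y", codec))
```

Because the two patterns differ, a receiver has to know which table the sender used, or translate between them.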

And because Ascii cannot handle special characters used in other languages,
other countries have had to design their own codes.  Europe has its own 8-bit
code, and Asian countries like Japan and China have their own codes to
represent the thousands of different characters in their languages.

Ascii cannot be used to represent characters in all these languages because
there are only 256 different 8-bit sequences of zeros and ones.

The proposed standard, Unicode, would represent letters and symbols by a
sequence of 16 zeros and ones, instead of eight, allowing for 65,536 different
combinations.  That is enough to give each character used in all the living
languages of the world its own unique sequence, with enough combinations left
over to eventually include obsolete scripts like cuneiform and hieroglyphs as
well.
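The figures 256 and 65,536 follow from counting bit sequences: each added bit doubles the number of distinct patterns. A short sketch of that arithmetic, with `combinations` as a hypothetical helper name:

```python
def combinations(bits):
    """Number of distinct sequences of zeros and ones of the given length."""
    return 2 ** bits

eight_bit = combinations(8)     # 256
sixteen_bit = combinations(16)  # 65536

print(eight_bit, sixteen_bit)
# Going from 8 to 16 bits multiplies the available codes by 256.
print(sixteen_bit // eight_bit)
```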

With each character having a unique code, software programmers would no longer
have to worry about which standard was being used or have to translate from one
system to another.  That would make it easier, for instance, to develop a
word-processing program that works in many languages.

"You can now develop international software from the very beginning," said
David E. Liddle, the president of Metaphor Computer Systems, a software company
in Mountain View, Calif.  The company's offices are serving as the headquarters
of the consortium, called Unicode Inc.

He said the American computer and software companies were able to put aside
their differences to work on the standard because all of them see their
overseas markets becoming more important.  "How many things are there in the
world where you can get Sun, I.B.M., Microsoft and Apple to agree?" he asked.

Standard Due in Spring

Unicode has been under development since 1989 by an informal group, and the
proposed standard, which now includes sequences for 27,000 characters, is
expected to be completed this spring.

Unicode's developers said they have done extensive research and consulted with
linguists.  One challenge was trying to represent all the Chinese ideographs
that are also used in Korean and Japanese.

But Unicode researchers found that the languages have more than 11,000 symbols
that are the same, allowing Unicode to represent all the Chinese ideographs
with only 18,739 unique characters, instead of the more than 31,000 that would
otherwise have been required.

The proposed code still faces hurdles.  The consortium needs to attract support
from foreign companies, and an international standards organization is
developing a competing code.  Using 16 bits instead of eight to represent each
character would also mean that computers would require more memory and
disk-storage capacity.
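The storage cost is easy to estimate: at a fixed width, the same text takes twice as many bytes at 16 bits per character as at eight. The sketch below uses Python's `utf-16-be` codec only as a convenient stand-in for a fixed 16-bit code, an assumption that holds for the plain Latin sample used here; `storage_bytes` is a hypothetical helper.

```python
def storage_bytes(text, bytes_per_char):
    """Bytes needed to store text at a fixed width per character."""
    return len(text) * bytes_per_char

sample = "Universal Computer Code Due"

eight_bit = storage_bytes(sample, 1)
sixteen_bit = storage_bytes(sample, 2)

# Cross-check against real encoders for this sample.
assert len(sample.encode("ascii")) == eight_bit
assert len(sample.encode("utf-16-be")) == sixteen_bit

print(eight_bit, sixteen_bit)
```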

Problems could also arise in achieving compatibility between computers using
Unicode and those using older codes.
----------------------------------------------------------------