From: Clive D. W. Feather (clive@on-the-train.demon.co.uk)
Date: Tue Jul 03 2001 - 17:41:48 CDT
In message <GFw5K7.3AG@clw.cs.man.ac.uk>, Charles Lindsey
<chl@clw.cs.man.ac.uk> writes
>> In English, o-dieresis is two characters normally rendered as one glyph
>> In French, c-cedilla is one character and one glyph
>> In Arabic, "ibn" might be three characters that are rendered as three
>> glyphs in some contexts, but as one in others
>
>That was looking good until you got to the Arabic bit :-( . But would it
>always be three glyphs after going through NFKC?
It would always be three characters, yes.
I'm not going to try to sketch this out in ASCII, but the point is that
Arabic letters merge together and change form when combined. The single
glyph for "ibn" looks nothing like the individual ones for "i", "b", and
"n". But it is a single glyph, and it occupies one space on a screen.
The NFKC form is the three *characters*, but these generate one *glyph*
when they occur together.
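
As a rough illustration of the character/glyph split (a sketch, not part of the original exchange): Python's standard unicodedata module can show what NFKC does at the character level. The spelling of "ibn" as ALEF, BEH, NOON is an assumption made for the example, and the helper name nfkc_code_points is hypothetical. Normalization only ever yields characters; whether they are drawn as one glyph is decided later by the font and the rendering engine.

```python
import unicodedata

def nfkc_code_points(text):
    """Return the code points of text after NFKC normalization."""
    normalized = unicodedata.normalize("NFKC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]

# Assumed spelling of "ibn": ALEF, BEH, NOON (three separate characters).
ibn = "\u0627\u0628\u0646"

# For contrast, a precomposed presentation form:
# U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM.
lam_alef = "\uFEFB"

# NFKC leaves the three letters as three characters.
print(nfkc_code_points(ibn))

# NFKC breaks the presentation-form ligature back into its letters.
print(nfkc_code_points(lam_alef))
```

In both cases the normalized text holds the underlying letters only, which is consistent with the point above: the single glyph is a property of rendering, not of the NFKC form.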
The nearest analogy I can find in English is the idea that "foetus"
would be five glyphs ("f", "oe", "t", "u", and "s") if rendered
properly, even though it's made up of six characters.
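
The counts in that analogy can be checked with a small sketch, again using only Python's standard library. One caveat, stated as an observation rather than as part of the argument: Unicode happens to encode the "oe" ligature as a character of its own (U+0153), and NFKC does not decompose it, so the six-character spelling and the five-character spelling are different strings. The analogy is about how the six-character spelling would be drawn.

```python
import unicodedata

spelled_out = "foetus"         # six characters, as in the analogy
with_ligature = "f\u0153tus"   # uses U+0153, the separately encoded ligature

# Character counts differ between the two spellings.
print(len(spelled_out), len(with_ligature))

# U+0153 has no compatibility decomposition, so NFKC leaves it unchanged.
unchanged = unicodedata.normalize("NFKC", with_ligature) == with_ligature
print(unicodedata.name("\u0153"), unchanged)
```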
>>I suspect we may want to steer well clear of this confusion.
>
>It depends whether it is less confusing than other alternatives on offer.
>It is certainly less confusing than the two defined usages of "grapheme".
Perhaps. OTOH, I suspect something like "marked-character" better
represents what we want to say.
--
Clive D.W. Feather    | Internet Expert | Work: <clive@demon.net>
Tel: +44 20 8371 1138 | Demon Internet  | Home: <clive@davros.org>
Fax: +44 20 8371 1037 | Thus plc        | Web: <http://www.davros.org>
Written on my laptop; please observe the Reply-To address