
[idn] Fwd: Re: Rationale wanted for Unicode identifier rules



Is this something we can use (possibly modified) in IDN to define a reasonable character set for IDN labels?

Harald



X-UML-Sequence: 12492 (2000-03-01 21:35:45 GMT)
From: Kenneth Whistler <kenw@xxxxxxxxxx>
To: "Unicode List" <unicode@xxxxxxxxxxx>
Cc: unicode@xxxxxxxxxxx, kenw@xxxxxxxxxx
Date: Wed, 1 Mar 2000 13:35:44 -0800 (PST)
Subject: Re: Rationale wanted for Unicode identifier rules

John Cowan asked:

>
> Kenneth Whistler wrote:
>
> >   A. Identifier syntax along the lines described in Unicode 3.0.
>
> Can you (or someone) supply a precis of this to the poor fellow
> who still hasn't heard from his bookstore's order department?
> Especially if it is indeed simpler than the Unicode 2.0 version?

Sure. For those of you who already have the hymnal, turn to page 134 to
sing along.

<identifier> ::= <identifier_start> (<identifier_start> | <identifier_extend>)*

<identifier_start> is defined by an equivalent category set consisting of
       all those characters with the General Category values:
       Lu, Ll, Lt, Lm, Lo, Nl

<identifier_extend> is defined by an equivalent category set consisting of
       all those characters with the General Category values:
       Mn, Mc, Nd, Pc, Cf

Thus, identifiers can start with any "letter" or "letter number".

Identifiers can continue with any "letter" or "letter number", any combining
mark (except the enclosing marks, Me), any decimal digit, any connecting
punctuation, or any format control character (e.g. the invisible bidi
layout controls, ZWJ, ZWNJ, etc.).
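
As a rough illustration of the rule above, here is a minimal Python sketch
that checks a string against the two category sets. The function name
is_identifier is hypothetical, and the sketch relies on Python's standard
unicodedata module, which reports General Category values from whatever
Unicode version the interpreter ships with, not from Unicode 3.0, so a few
characters may be classified differently than they were in 2000.

```python
import unicodedata

# General Category sets copied from the definition in the message.
IDENTIFIER_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
IDENTIFIER_EXTEND = {"Mn", "Mc", "Nd", "Pc", "Cf"}


def is_identifier(text: str) -> bool:
    """Return True if text matches
    <identifier_start> (<identifier_start> | <identifier_extend>)*
    (hypothetical helper, not part of any standard API)."""
    if not text:
        # The grammar requires at least one start character.
        return False
    if unicodedata.category(text[0]) not in IDENTIFIER_START:
        return False
    allowed = IDENTIFIER_START | IDENTIFIER_EXTEND
    return all(unicodedata.category(ch) in allowed for ch in text[1:])
```

Under these assumptions, is_identifier("a_b") is True (the underscore is
connecting punctuation, Pc), while is_identifier("2fast") and
is_identifier("a-b") are both False: a decimal digit (Nd) may continue an
identifier but not start one, and the hyphen is dash punctuation (Pd).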

Note that this definition explicitly excludes the following General Category
values from identifiers:

Me, No, Zs, Zl, Zp, Cc, Pd, Ps, Pe, Pi, Pf, Po, Sm, Sc, Sk, So

i.e. enclosing combining marks, "other numbers", all spaces and line/paragraph
separators, control characters, all other punctuation, and all "symbols".
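
For anyone who wants to cross-check the three lists, the following Python
sketch compares them against the full set of General Category values. The
ALL_CATEGORIES set is typed in by hand from the Unicode Character Database's
list of category abbreviations, and the helper names are hypothetical. It
suggests that the lists do not overlap, and that the surrogate, private use,
and unassigned categories (Cs, Co, Cn) appear in none of them, so how those
should be treated is not settled by this summary alone.

```python
# All General Category abbreviations, entered by hand (an assumption of this
# sketch, not something stated in the message).
ALL_CATEGORIES = set(
    "Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No Pc Pd Ps Pe Pi Pf Po "
    "Sm Sc Sk So Zs Zl Zp Cc Cf Cs Co Cn".split()
)

# The three lists as given in the message.
START = set("Lu Ll Lt Lm Lo Nl".split())
EXTEND = set("Mn Mc Nd Pc Cf".split())
EXCLUDED = set("Me No Zs Zl Zp Cc Pd Ps Pe Pi Pf Po Sm Sc Sk So".split())


def overlapping_categories():
    """Categories that appear both as allowed and as excluded."""
    return sorted((START | EXTEND) & EXCLUDED)


def unlisted_categories():
    """Categories that appear in none of the three lists."""
    return sorted(ALL_CATEGORIES - (START | EXTEND | EXCLUDED))


print(overlapping_categories())  # []
print(unlisted_categories())     # ['Cn', 'Co', 'Cs']
```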

--Ken

-- Harald Tveit Alvestrand, EDB Maxware, Norway Harald.Alvestrand@xxxxxxxxxxxxxx