Re: Normalisation and matching
Adam M. Costello wrote:
>Guess what recommendation I would argue for? Minimal restrictions. I
>would recommend that IMA-aware protocols allow all valid IMAs (including
>non-Nameprepped ones) in whatever charsets they want to support (and
>they should at least support UTF-8, as recommended by BCP-whatever).
>Applications should not apply Nameprep or NFC or anything before
>putting the mail address into the slot; they should leave that to the
>receiver to do if necessary. (It will be necessary if receiver wants
>to compare the address, or relay it into a IMA-unaware slot, but not if
>the receiver merely wants to relay it into an IMA-aware slot, or display
>it).
I am sure you will find that a lot of software capable of displaying
UCS will fail on text that is not normalised, and also on things like
full width or circled forms.
>
>I see two advantages to this approach. First, it allows presentational
>details to be preserved (like fullwidth characters, superscript
>characters, sharp-s, etc).
Normalisation does not mean that all of that goes away. If you use NFC,
all of it is preserved. But to simplify character handling, only ONE
representation of a character should be allowed. That does not mean
NFKC, which unfortunately does more than that. I want sharp-s and the
masculine ordinal indicator to be preserved.
I do not want full width characters, as not all letters have a full
width form and a full width letter is just a second encoding of the
standard width letter. Nor do I want ligatures. I prefer simple forms.
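To make the difference between the two forms concrete, here is a minimal sketch using Python's standard unicodedata module. The sample characters and the helper name normalise() are chosen only for illustration; they are not taken from any draft.

```python
import unicodedata

# Illustrative sample characters (an assumption of this sketch).
samples = {
    "fullwidth A": "\uFF21",
    "circled 1": "\u2460",
    "sharp-s": "\u00DF",
    "masculine ordinal": "\u00BA",
    "fi ligature": "\uFB01",
    "e + combining acute": "e\u0301",
}


def normalise(text, form):
    """Hypothetical helper: apply one Unicode normalisation form."""
    return unicodedata.normalize(form, text)


def code_points(text):
    """Show a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


for name, text in samples.items():
    nfc = normalise(text, "NFC")
    nfkc = normalise(text, "NFKC")
    print(f"{name:22} NFC={code_points(nfc):14} NFKC={code_points(nfkc)}")
```

NFC only composes the combining sequence and leaves the other samples untouched. NFKC additionally folds the full width letter, the circled digit, the ligature and the masculine ordinal indicator into plain characters. Sharp-s is left alone by both forms.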
>
>Second, it reduces superfluous computation at the sending end in
>two cases. Case 1: If the receiver doesn't need the string to be
>Nameprepped, then it would be a waste for the sender to apply Nameprep.
I do not want it to be nameprepped. I want it to be normalised, with
no multiple representations of the same character.
>So the approach I would recommend is to let applications take
>responsibility for applying Nameprep when they themselves need it, don't
>depend on other applications to pre-apply it for you, and don't bother
>trying to pre-apply it for someone else.
From all I have read, the best thing is if the sender does the
normalisation, not the receiver. It is often easy to normalise during
input without overhead. Writing code that has to normalise data every
time it arrives, before it can be used, will cost a lot more.
IRI uses NFC for this reason.
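A rough sketch of the sender-side approach, again in Python and assuming version 3.8 or later for unicodedata.is_normalized(). The function names prepare_for_sending() and is_acceptable() and the example address are hypothetical, not part of any protocol.

```python
import unicodedata


def prepare_for_sending(address: str) -> str:
    """Sender side (hypothetical): normalise once, at input time."""
    return unicodedata.normalize("NFC", address)


def is_acceptable(received: str) -> bool:
    """Receiver side (hypothetical): only check, do not re-normalise."""
    return unicodedata.is_normalized("NFC", received)


# The same visible address typed in two different ways.
composed = "jos\u00E9@example.org"      # precomposed e-acute
decomposed = "jose\u0301@example.org"   # e followed by combining acute

# Without normalisation the two strings do not match.
print(composed == decomposed)

# Once the sender has normalised, a plain comparison is enough.
print(prepare_for_sending(decomposed) == composed)
print(is_acceptable(decomposed), is_acceptable(prepare_for_sending(decomposed)))
```

If every sender does this, the receiver can compare addresses as plain strings and only needs the cheap check to reject input that was never normalised.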
Dan