From: Charles Lindsey (chl@clw.cs.man.ac.uk)
Date: Wed Aug 07 2002 - 06:03:54 CDT
In <Pine.LNX.3.91.1020806190409.10567F-100000@darkstar.prodigy.com> Bill Davidsen <davidsen@prodigy.com> writes:
>On Tue, 6 Aug 2002, Charles Lindsey wrote:
>> Well my sendmail, configured more or less as out of the box, had no
>> problem sending to Andrew's address. However, I see that some Earthlink
>> MTA has rejected it, and also a Linone one (using exim apparently).
>>
I am surprised that no one else has come back on this one. Russ? Andrew?
Or do we take the view that the servers that do these stupid things are
broken and can safely be ignored?
>> So what do we do? Look at some other character?
>Can someone make a QUICK suggestion? I'm looking at my keyboard, and I
>see characters not in my sendmail.cf, but I don't feel confident they
>aren't magic elsewhere in mail, if only as problem causers.
>My mail goddess has gone home, all I can do is test against a few
>mailers, none of which objected to ` in an address. Don't take that as a
>recommendation, someone probably doesn't like that either :-(
OK. First of all we require a printable ASCII character, and it MUST NOT
be a valid character in a newsgroup-name, nor must it be a ',' or a '.'.
So that gives us:
! " # $ % & ' ( ) * / : ; < = > ? @ [ \ ] ^ ` { | } ~
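The first filter can be reproduced with a short Python sketch. It assumes
(this message does not spell it out) that newsgroup-name components are made
of letters, digits, '+', '-' and '_', with '.' as the separator; the variable
names are illustrative only.

```python
import string

# Assumption: characters valid inside a newsgroup-name component.
NEWSGROUP_CHARS = set(string.ascii_letters + string.digits + "+-_")
# ',' and '.' are excluded explicitly, as stated in the text.
EXCLUDED_SEPARATORS = set(",.")

# Printable ASCII without the space: 0x21 through 0x7E.
printable = [chr(code) for code in range(0x21, 0x7F)]

candidates = [c for c in printable
              if c not in NEWSGROUP_CHARS and c not in EXCLUDED_SEPARATORS]

print(" ".join(candidates))
```

Under that assumption the printed list has 27 characters, from '!' to '~'.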
Now we remove those which are not legal in a local-part (i.e. in an atom,
because we don't want to be using quoted-strings) (RFC 2822). That
leaves:
! # $ % & ' * / = ? ^ ` { | } ~
'!' (uucp) and '%' (% hack) go because they are meaningful to some mail
systems. But I think the rest are safe from that POV.
# $ & ' * / = ? ^ ` { | } ~
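As a rough check of the two mail-related steps, the following Python sketch
starts from the 27 candidates, keeps only the non-alphanumeric characters
that RFC 2822 allows in an atom ("atext"), and then drops '!' and '%'. The
variable names are illustrative, not part of any specification.

```python
# Candidates left after excluding newsgroup-name characters, ',' and '.'.
candidates = "!\"#$%&'()*/:;<=>?@[\\]^`{|}~"

# Non-alphanumeric characters permitted in an RFC 2822 atom (atext).
ATEXT_SPECIALS = set("!#$%&'*+-/=?^_`{|}~")

# Characters with routing meaning in some mail systems:
# '!' (uucp bang paths) and '%' (the % hack).
ROUTING_CHARS = set("!%")

atom_safe = [c for c in candidates if c in ATEXT_SPECIALS]
mail_safe = [c for c in atom_safe if c not in ROUTING_CHARS]

print(" ".join(atom_safe))
print(" ".join(mail_safe))
```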
Now we need to exclude those that are likely to get escaped in URIs (the
last thing we want is escaped escape characters). These things will
certainly get into mailto URIs, and may get into news URIs as well.
RFC2396 is somewhat vague here, because different components of a URI have
different requirements, and some bofhish systems tend to escape everything
whether necessary or not. But excluding the characters defined as "delims"
or 'unwise' in RFC2396 reduces us to
$ & ' * / = ? ^ ~
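The RFC 2396 step can be sketched the same way in Python, using the "delims"
and "unwise" sets as that RFC defines them. One caveat: RFC 2396 lists '^'
among the "unwise" characters, so a strict application drops it at this point
rather than at the later aesthetic step; the final result is the same either
way. Variable names are illustrative.

```python
# Candidates that survived the mail-related filtering.
candidates = "#$&'*/=?^`{|}~"

# RFC 2396 section 2.4.3: characters excluded from URIs.
DELIMS = set('<>#%"')
UNWISE = set("{}|\\^[]`")

uri_safe = [c for c in candidates if c not in DELIMS | UNWISE]

print(" ".join(uri_safe))
```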
And then likely extensions to the news URI, allowing a server name to be
given, will surely make use of '/'; and according to RFC 1738 '*' has a
special meaning in news URIs.
$ & ' = ? ^ ~
Now I want to exclude '?', because it is one of the characters that
introduces an RFC 2047 encoding, and we don't want our encodings to be
confused with those, do we? Yes, I know that '=' also occurs in RFC 2047
alongside the '?', but removing one of them is sufficient. Also, I still
have lingering doubts over '?' in URIs, though it _should_ be unambiguous
there.
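To see why '?' and '=' matter here, this small Python sketch produces an
RFC 2047 encoded-word with the standard library's email.header module. The
sample text is arbitrary; the point is only the "=?" ... "?=" framing, with
'?' separating the charset, encoding and payload fields.

```python
from email.header import Header

# Any non-ASCII text forces an RFC 2047 encoded-word.
encoded = Header("caf\u00e9", "utf-8").encode()

# An encoded-word has the shape =?charset?encoding?payload?=
print(encoded)
print(encoded.startswith("=?"), encoded.endswith("?="))
```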
$ & ' = ^ ~
For aesthetic reasons, I would remove "'" and "^" (they are inconspicuous,
and people may use "'" to quote the complete newsgroup-name in texts,
leading to mismatched 's).
$ & = ~
Of these I think '=' has to be the front runner, since it is already
widely used to introduce hexadecimal encodings (quoted-printable and RFC
2047), so people seeing it for the first time will likely guess what it
means. So are there any gotchas?
Well, if one of these things ever gets passed through quoted-printable,
the '=' will get munged, but that only affects people who look at raw q-p,
and user agents aren't supposed to show you that.
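The munging is easy to demonstrate with Python's quopri module. The string
below is a made-up example of a name containing an '='-introduced hex escape
(it is not the encoding defined in 5.5.2); quoted-printable rewrites the '='
as "=3D", and decoding restores the original bytes.

```python
import quopri

# Hypothetical name containing an '='-style hex escape.
raw = b"example=2Bname"

encoded = quopri.encodestring(raw)      # what raw q-p looks like on the wire
decoded = quopri.decodestring(encoded)  # what a user agent should show

print(encoded)
print(decoded == raw)
```

So only someone reading the raw q-p form sees the extra "=3D".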
So '=' would be my suggestion. I could easily change the encoding we have
defined in 5.5.2. All we would lose is the (somewhat spurious)
compatibility with RFC 2396.
Do you want me to do that?
-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7  65 E8 64 7E 14 A4 AB A5