Triggered by Rob Raymond's email thread,
http://www.imc.org/ietf-rtpsec/mail-archive/msg00358.html, I have been
thinking about requirement R10 in
draft-wing-media-security-requirements-00, which was derived from
section 4.1 of draft-wing-rtpsec-keying-eval-02:
    R10: Endpoint identification when forking. The Offerer must be
         able to associate answer with the appropriate flow endpoint.
         In case of forking one might not want to perform a DH with
         every party but instead to associate the SDP response with
         the right end point. This is a performance related
         requirement.
For endpoints that use symmetric RTP
(draft-wing-behave-symmetric-rtprtcp) with no NATs in the path,
associating media and signaling is trivial, even in the forking
case: the SDP answer contains the transport address of the media
stream (in the c= and m= lines) that will soon be arriving. I expect
in 2007 it is fair to say that everyone implements symmetric RTP. So
if there are no NATs, this isn't a problem for RTP or SRTP.
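To make the no-NAT case concrete, here is a rough sketch in Python of
how an offerer could index forked answers by the transport address in
their c= and m= lines and then look up arriving packets by source
address. The function names and the fork identifiers are illustrative
assumptions, not taken from any draft, and the parser only handles the
first c= and m= line of each answer.

```python
def media_address(sdp):
    """Return (ip, port) from the first c= and m= lines of an SDP body."""
    ip = port = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c=") and ip is None:
            # c=<nettype> <addrtype> <connection-address>
            ip = line[2:].split()[2]
        elif line.startswith("m=") and port is None:
            # m=<media> <port> <proto> <fmt> ...
            port = int(line[2:].split()[1])
    if ip is None or port is None:
        raise ValueError("SDP lacks a c= or m= line")
    return ip, port


def index_answers(answers):
    """Map each answer's media transport address to its fork identifier."""
    return {media_address(sdp): fork_id for fork_id, sdp in answers.items()}


def fork_for_packet(index, source):
    """Return the fork whose answer advertised this source, or None."""
    return index.get(source)
```

With symmetric RTP the source address of the arriving media equals the
address advertised in the answer, so a dictionary lookup is all that is
needed. Packets that arrive before any answer simply return None.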
Where there are NATs, the transport address indicated in the SDP
answer might not be where media is received from. However, ICE
provides a mechanism to associate the media stream with the SDP
answer using the STUN username (see section 12.3 of
draft-ietf-mmusic-ice).
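As a sketch of the ICE case, the offerer can key each forked answer on
its username fragment and match it against the USERNAME of an incoming
STUN connectivity check. This assumes each answer carries an
a=ice-ufrag attribute and that USERNAME is the two agents' fragments
joined by a colon, receiver's fragment first; the exact layout depends
on the revision of the ICE draft, and the function names here are my
own.

```python
def index_by_ufrag(answers):
    """Map the a=ice-ufrag value of each forked SDP answer to its fork id."""
    prefix = "a=ice-ufrag:"
    index = {}
    for fork_id, sdp in answers.items():
        for line in sdp.splitlines():
            line = line.strip()
            if line.startswith(prefix):
                index[line[len(prefix):]] = fork_id
                break
    return index


def fork_for_check(index, local_ufrag, stun_username):
    """Match the USERNAME of an incoming STUN check to a forked answer.

    Assumed layout: "<receiver ufrag>:<sender ufrag>".
    """
    receiver, sep, sender = stun_username.partition(":")
    if not sep or receiver != local_ufrag:
        return None  # malformed, or not addressed to this agent
    return index.get(sender)
```

Because the lookup uses the username rather than the source transport
address, it still works when a NAT has rewritten that address.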
Initially I thought we could just delete requirement R10.
However, there are some related attacks that can be mounted
against SRTP itself or against a media-path key mechanism. I would
like protection against these attacks considered as requirements.
Today, these related attacks probably fall under the large "for
further study" section of draft-wing-media-security-requirements-00,
specifically the "A solution SHOULD consider active attacks" item.
The attacks are:
1. DoS protection against off-path attackers. This is where an
off-path attacker initiates a DTLS-SRTP, MIKEYv2, or ZRTP
exchange with the calling party. With the current design of all
of these protocols, such an attack is trivial to launch and
remains viable until the SDP answer arrives. This attack could
cause the calling party to burn CPU time performing DH or PK
computations. Once the SDP answer arrives, the calling party
may be able to thwart the attack by only processing
DTLS/MIKEYv2/ZRTP messages coming from the transport address of
the called party. Until then, that is, between the offer being
sent and the answer arriving, the calling party is exposed to a
denial of service attack.
A related attack is to just send SRTP packets at an endpoint.
The SRTP packets will never authenticate, but the receiver will
have to compute an HMAC-SHA1 over each packet to determine that
authentication failed.
2. Attackers sending bogus SRTP-encrypted media packets. This is
the attack described by Rob Raymond in his earlier thread. In
this attack, an attacker successfully establishes an SRTP
session with the calling party and plays media. This is a
viable attack until the SDP answer arrives, at which time the
calling party can compare information from the signaling with
information in the media path. The best example of such a
comparison is RFC4572, "Comedia over TLS in SDP", with the
a=fingerprint in SDP and the certificate in the media path.
As discussed in other threads, the ultimate protection against
this attack -- using the tools available to us now -- is for the
answerer to have his SDP signed with connected-identity.
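The signaling-versus-media comparison mentioned under attack 2 can be
sketched as follows: once the SDP answer arrives, hash the certificate
seen on the media path and compare it with the a=fingerprint value.
This is only an illustration. The function names are mine, only SHA-1
and SHA-256 are handled, and cert_der is assumed to be the DER-encoded
certificate the peer presented in its handshake.

```python
import hashlib

# Hash names from a=fingerprint mapped to hashlib names (a subset).
HASHES = {"sha-1": "sha1", "sha-256": "sha256"}


def parse_fingerprint(sdp):
    """Return (hash_name, hex_digest) from the first a=fingerprint line."""
    prefix = "a=fingerprint:"
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            hash_name, value = line[len(prefix):].split(None, 1)
            # Normalize "4A:AD:..." to lowercase hex without colons.
            return hash_name.lower(), value.replace(":", "").strip().lower()
    return None


def certificate_matches(sdp, cert_der):
    """True if the media-path certificate matches the SDP fingerprint."""
    parsed = parse_fingerprint(sdp)
    if parsed is None or parsed[0] not in HASHES:
        return False  # no fingerprint, or a hash this sketch does not handle
    hash_name, expected = parsed
    actual = hashlib.new(HASHES[hash_name], cert_der).hexdigest()
    return actual == expected
```

Note that this check only helps once the answer is in hand, and it is
only as trustworthy as the signaling path that delivered the answer.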
If we resolve attack 1, attack 2 is still possible if the attacker is
on an unencrypted signaling path, has control of a SIP proxy, or
(probably) if the attacker is on the media path.
I expect that the solution to attack 1 will involve sending some sort
of nonce in SDP, which is then echoed in the first media-level hello
packet. By doing so, SRTP receivers can safely ignore (drop) SRTP
packets from unknown source transport addresses, and only attempt to
authenticate and decrypt SRTP packets from a new transport address
once it has sent a media-path hello carrying the correct nonce.
I know that some RTP implementations already use filters such as this
to thwart some RTP attacks.
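A rough sketch of such a filter, in Python, is below. Everything
protocol-specific in it is hypothetical: the a=media-nonce attribute
name, the 16-byte nonce, and the hello layout (the bytes "HELO"
followed by the nonce) are placeholders for whatever a real mechanism
would define.

```python
import hmac
import secrets


class MediaFilter:
    """Drop media-path packets unless the source is known or shows the nonce."""

    def __init__(self):
        # Nonce advertised in the SDP offer (attribute name is hypothetical).
        self.nonce = secrets.token_bytes(16)
        self.accepted = set()  # source (ip, port) pairs that passed the check

    def sdp_attribute(self):
        """SDP line carrying the nonce, in a made-up attribute."""
        return "a=media-nonce:" + self.nonce.hex()

    def admit(self, source, packet):
        """Return True if the packet may go on to crypto processing."""
        if source in self.accepted:
            return True
        # Hypothetical hello layout: b"HELO" followed by the 16-byte nonce.
        if packet[:4] == b"HELO" and hmac.compare_digest(
            packet[4:20], self.nonce
        ):
            self.accepted.add(source)
            return True
        return False  # dropped before any HMAC, DH, or PK work
```

The point of the design is that the check is a cheap byte comparison,
so an attacker who cannot see the SDP cannot make the receiver perform
an HMAC, DH, or PK operation. It does nothing against an attacker who
can read the signaling.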
To defend against attack 2, a proof of possession has to be performed,
which is computationally expensive (it requires a DH or PK operation).
It would be desirable to require an attacker to be on path in order to
make an endpoint spend this CPU time.
Based on the above discussion I suggest we delete the existing R10 in
draft-wing-media-security-requirements.
I would like to add the following new requirements to
draft-wing-media-security-requirements:
R12: A solution MUST protect both endpoints from CPU attacks
from off-path attackers (that is, attackers that can't see
the SIP signaling and aren't on the media path). The
offerer MUST be protected without waiting for the SDP
answer to arrive.
R13: A solution SHOULD include proof of possession.
R14: It MUST be possible for the endpoints to authenticate the
media-path key exchange using information in the SDP, should
the endpoints choose to do such authentication.
Comments?
-d