I would be happy to save some bytes during a VoIP call setup too.
I don't know what context you're referring to, but there's an enormous
difference between per-packet overhead and one-time overhead. Any
reasonable media plane protocol is going to push >> 600 bytes over
its lifetime.
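
To put rough numbers on the per-packet versus one-time distinction, here is a
back-of-the-envelope sketch. Only the 600-byte figure comes from this thread.
The codec parameters (G.711 audio, 20 ms packetization, a bare 12-byte RTP
header) are assumptions chosen for illustration, and the function names are
hypothetical.

```python
# Assumed figures, not taken from the thread: G.711 audio in 20 ms packets.
PAYLOAD_BYTES = 160         # 64 kbit/s * 20 ms of audio
RTP_HEADER_BYTES = 12       # fixed RTP header, no CSRCs or extensions
PACKETS_PER_SECOND = 50     # one packet every 20 ms
SETUP_OVERHEAD_BYTES = 600  # the one-time cost discussed above


def media_bytes(call_seconds: float) -> int:
    """Bytes of RTP (header plus payload) sent in one direction."""
    packets = int(call_seconds * PACKETS_PER_SECOND)
    return packets * (PAYLOAD_BYTES + RTP_HEADER_BYTES)


def setup_fraction(call_seconds: float) -> float:
    """One-time setup overhead as a fraction of all bytes sent."""
    media = media_bytes(call_seconds)
    return SETUP_OVERHEAD_BYTES / (SETUP_OVERHEAD_BYTES + media)


def per_packet_equivalent(call_seconds: float) -> float:
    """Setup overhead spread evenly over every packet in the call."""
    return SETUP_OVERHEAD_BYTES / (call_seconds * PACKETS_PER_SECOND)


if __name__ == "__main__":
    for seconds in (1, 60, 600):
        print(seconds, media_bytes(seconds), setup_fraction(seconds))
```

Under these assumptions a single second of audio already exceeds the 600-byte
setup cost, and over a one-minute call the setup cost amortizes to a fraction
of a byte per packet.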
And as others have stated, it complicates things to optimize this.
You are well aware of the risks in complicating security protocols.
Yes, but we have the expertise too, thankfully.
I'm not convinced we do, actually. The implications of having only
occasional binding of the signalling to the media strike me as
quite difficult to analyze--not least the question of how a
reasonable implementation would know which one to do.
Anyway, I will sign off here on this. I am beginning to get the sense
that we all understand the use case and its applicability.
Well, I don't, at least if the implication is that you don't need to
authenticate both sides. On the contrary, as I've observed several
times, even where the callee has some in-band authentication
mechanism, it's desirable to cryptographically bind the media to the
signalling.
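
As a concrete picture of what such a binding can look like, here is a minimal
sketch. This thread does not name a mechanism, so the approach shown is an
assumption: the signalling carries a hash of the certificate that the endpoint
will present in the media-plane handshake, and the peer checks that the two
match. The function names and the colon-separated SHA-256 format are
hypothetical choices for illustration.

```python
import hashlib
import hmac


def fingerprint(cert_der: bytes) -> str:
    """Colon-separated uppercase SHA-256 hash of a DER-encoded certificate."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def media_bound_to_signalling(signalled_fp: str, presented_cert_der: bytes) -> bool:
    """True if the certificate seen on the media path matches the signalled hash.

    The comparison is constant-time; hex case in the signalled value is ignored.
    """
    expected = signalled_fp.upper().encode()
    actual = fingerprint(presented_cert_der).encode()
    return hmac.compare_digest(expected, actual)
```

The check only means something if the signalling that carried the fingerprint
is itself integrity-protected, and each side has to run it on the other's
certificate for both directions to be bound.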