The Session Initiation Protocol (SIP) has been widely adopted for signaling and controlling interactive sessions in multimedia communication networks. Despite its advantages over predecessor protocols, the security and privacy of SIP remain challenging because of the risks of operating over real-world public networks. Although most SIP applications rely on end-to-end communication, existing studies focus mainly on client-server protocols. In this study, we propose a novel SIP authenticated key agreement protocol that supports user-server, user-user, and group communications. An end user employs a short-term token to communicate with other end users or with multimedia servers without connecting to a trusted server. Our security analysis shows that the scheme not only resists all known attacks but also provides many desirable features, including direct end-to-end communication, biometric template privacy, user access control, smart card revocation, and long-term secret updates. The latency of the authenticated key agreement phase is relatively small, so this signaling protocol is suitable for a wide range of real-time applications.