Media loop and session¶
Relay - media-loop
REL-07 - status: draft - audio, video
Per-call session state machine plus the outbound (protect) and inbound (unprotect) audio frame pipelines: RTP framing, E2E-SRTP, and WARP tagging.
Session state¶
A session MUST hold at minimum: call id, peer JID, call-creator JID, direction (outgoing | incoming), and current phase. Phase MUST be one of:
Idle | Calling | Ringing | Connecting | Active | Ended
Outgoing sessions MUST start in Idle; incoming sessions MUST start in Ringing.
The idempotent self-transition x → x MUST be accepted. The only other legal
transitions are:
Idle → Calling ; outgoing only
Calling → Ringing
Ringing → Connecting
Connecting → Active
<any phase except Ended> → Ended
All other transitions MUST be rejected as a no-op. Ended MUST be terminal. An
incoming session MUST NOT transition to Calling. Media MAY flow only in Active.
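The transition rules above can be sketched as a small state machine. This is a minimal illustration, not the reference implementation: the `Session` class, its field names, and the boolean return convention (True = accepted, False = rejected as a no-op) are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    IDLE = "Idle"
    CALLING = "Calling"
    RINGING = "Ringing"
    CONNECTING = "Connecting"
    ACTIVE = "Active"
    ENDED = "Ended"


# Forward edges from the transition list. Self-transitions and the
# "<any phase except Ended> -> Ended" rule are handled in transition().
_FORWARD = {
    (Phase.IDLE, Phase.CALLING),
    (Phase.CALLING, Phase.RINGING),
    (Phase.RINGING, Phase.CONNECTING),
    (Phase.CONNECTING, Phase.ACTIVE),
}


@dataclass
class Session:
    """Hypothetical container for the minimum per-call state."""
    call_id: str
    peer_jid: str
    creator_jid: str
    outgoing: bool
    phase: Optional[Phase] = None

    def __post_init__(self):
        # Outgoing sessions start in Idle, incoming sessions in Ringing.
        if self.phase is None:
            self.phase = Phase.IDLE if self.outgoing else Phase.RINGING

    def transition(self, target: Phase) -> bool:
        """Apply a transition; return False and leave state untouched if illegal."""
        if target == self.phase:
            return True  # idempotent self-transition x -> x
        if self.phase == Phase.ENDED:
            return False  # Ended is terminal
        if target == Phase.ENDED:
            self.phase = target
            return True
        if target == Phase.CALLING and not self.outgoing:
            return False  # incoming sessions never enter Calling
        if (self.phase, target) in _FORWARD:
            self.phase = target
            return True
        return False

    def media_allowed(self) -> bool:
        # Media may flow only in Active.
        return self.phase == Phase.ACTIVE
```

Rejecting by returning False rather than raising keeps an illegal transition a true no-op, which matches the "rejected as a no-op" wording; a caller that wants to log rejections can check the return value.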
Media pipeline keying¶
Derive two independent E2E-SRTP key sets from the call key (see
srtp-master-key): send keys keyed by the sender's own
participant id, recv keys keyed by the peer's participant id. The HKDF info
for the send direction MUST be the sender's own participant id (the peer derives
its recv keys from that id). Both JIDs MUST be normalized with the E2E-SRTP
participant-id rule (see ssrc) before derivation. The two directions MUST
NOT be inverted.
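The direction rule can be illustrated with a generic HKDF (RFC 5869) sketch. Everything beyond "info = participant id" is an assumption here: the hash (SHA-256), the empty salt, the 32-byte output, and the identity `normalize` placeholder are stand-ins, since the actual parameters and the normalization rule are defined in srtp-master-key and ssrc, not in this entry.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
    """HKDF per RFC 5869 with HMAC-SHA256: extract, then expand."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_direction_keys(call_key: bytes, own_jid: str, peer_jid: str,
                          normalize=lambda jid: jid, length: int = 32):
    """Return (send_keys, recv_keys) for one endpoint.

    `normalize` is a placeholder for the participant-id rule in ssrc;
    the identity default is NOT the real rule.
    """
    # Send direction: info is the sender's OWN participant id.
    send_keys = hkdf_sha256(call_key, normalize(own_jid).encode(), length)
    # Receive direction: info is the PEER's participant id.
    recv_keys = hkdf_sha256(call_key, normalize(peer_jid).encode(), length)
    return send_keys, recv_keys
```

With this layout, one endpoint's send keys equal the other endpoint's recv keys, which is the property the "MUST NOT be inverted" rule protects: swapping the two ids on one side would make both directions undecryptable.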
Outbound loop (protect)¶
For each audio frame to send, in order:
- Obtain the next RTP header from the send sequencer. RTP sequence number starts at 1 and increments by 1 per packet; RTP timestamp advances by samples_per_packet (960 for a 60 ms frame at 16 kHz); payload type MUST be 120 (Opus).
- Advance the rollover counter (ROC) for the new sequence number.
- Encode the RTP header bytes.
- E2E-SRTP encrypt the Opus payload under the send keys, using the header SSRC, sequence number and ROC as keystream inputs (see srtp-e2e).
- Concatenate rtp_header || encrypted_payload.
- Append the 4-byte WARP message-integrity tag, computed over that concatenation under the send auth key with the ROC (see warp).
Send the result as one binary message on the relay media channel (see stun-relay, call-transport).
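The outbound steps can be sketched as follows. This is an illustration under stated assumptions: `SendSequencer`, `encode_rtp_header`, and `protect` are hypothetical names; the starting timestamp of 0 and a cleared marker bit are assumed because the text does not state them; and the `encrypt` and `mi_tag` callables stand in for the E2E-SRTP cipher and WARP tag defined in srtp-e2e and warp. The header layout is the fixed 12-byte RTP header from RFC 3550 with no CSRCs or extension.

```python
import struct
from dataclasses import dataclass

SAMPLES_PER_PACKET = 960  # 60 ms at 16 kHz
OPUS_PAYLOAD_TYPE = 120


@dataclass
class SendSequencer:
    ssrc: int
    next_seq: int = 1   # sequence numbers start at 1
    next_ts: int = 0    # assumed initial timestamp (not given in the text)
    roc: int = 0

    def next_header(self):
        """Return (seq, timestamp, roc) for the next packet."""
        seq, ts = self.next_seq, self.next_ts
        if seq == 0:
            # The 16-bit sequence number wrapped: advance the ROC.
            self.roc = (self.roc + 1) & 0xFFFFFFFF
        self.next_seq = (seq + 1) & 0xFFFF
        self.next_ts = (ts + SAMPLES_PER_PACKET) & 0xFFFFFFFF
        return seq, ts, self.roc


def encode_rtp_header(seq: int, ts: int, ssrc: int,
                      pt: int = OPUS_PAYLOAD_TYPE, marker: bool = False) -> bytes:
    # 0x80 = version 2, no padding, no extension, zero CSRCs.
    return struct.pack("!BBHII", 0x80, (int(marker) << 7) | pt, seq, ts, ssrc)


def protect(opus_payload: bytes, seqr: SendSequencer, encrypt, mi_tag) -> bytes:
    """encrypt(payload, ssrc, seq, roc) and mi_tag(data, roc) are placeholders."""
    seq, ts, roc = seqr.next_header()
    header = encode_rtp_header(seq, ts, seqr.ssrc)
    body = header + encrypt(opus_payload, seqr.ssrc, seq, roc)
    tag = mi_tag(body, roc)
    if len(tag) != 4:
        raise ValueError("WARP MI tag must be 4 bytes")
    return body + tag
```

Passing the cipher and tag functions in as callables keeps the framing logic testable on its own, without committing the sketch to any particular key layout.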
Inbound loop (unprotect)¶
For each relay packet classified as RTP (see rtp-framing):
- Reject if shorter than 12 + 4 bytes (min RTP header + WARP MI tag).
- Strip the trailing 4-byte WARP MI tag.
- Parse the RTP header, compute its byte length; reject if no payload follows.
- E2E-SRTP decrypt the remaining payload under the recv keys, using the parsed SSRC, sequence number and ROC.
The decrypted result is the Opus payload for the decoder (see opus).
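A sketch of the inbound steps, under the same caveats: `rtp_header_len` and `unprotect` are hypothetical names, `decrypt` is a placeholder for the E2E-SRTP routine in srtp-e2e, and the ROC is passed in by the caller because the recovery algorithm is listed as an open question. Header length follows RFC 3550: 12 bytes, plus 4 per CSRC, plus the extension block when the X bit is set. The tag is stripped without verification, as in the reference composition.

```python
import struct

MIN_RTP_HEADER = 12
WARP_TAG_LEN = 4


def rtp_header_len(data: bytes) -> int:
    """Byte length of the RTP header at the start of `data` (RFC 3550)."""
    first = data[0]
    length = MIN_RTP_HEADER + 4 * (first & 0x0F)  # fixed part + CSRC list
    if first & 0x10:  # X bit: a header extension follows
        if len(data) < length + 4:
            raise ValueError("truncated header extension")
        ext_words = struct.unpack_from("!H", data, length + 2)[0]
        length += 4 + 4 * ext_words
    return length


def unprotect(packet: bytes, decrypt, roc: int = 0) -> bytes:
    """decrypt(payload, ssrc, seq, roc) is a placeholder callable."""
    if len(packet) < MIN_RTP_HEADER + WARP_TAG_LEN:
        raise ValueError("packet too short")
    body = packet[:-WARP_TAG_LEN]  # strip the MI tag; not verified here
    hlen = rtp_header_len(body)
    if len(body) <= hlen:
        raise ValueError("no payload after RTP header")
    _, _, seq, _, ssrc = struct.unpack_from("!BBHII", body, 0)
    return decrypt(body[hlen:], ssrc, seq, roc)
```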
Codec parameters¶
Audio MUST be Opus, mono, 16 kHz, 60 ms frames (960 samples per packet), VoIP
application mode. Reference encode: 25 kbps, complexity 9. Payload type 120 on
send; 120 and 121 both recognized as Opus on the inbound path. Priming frames are
fixed constant payloads and MUST bypass the encoder (see rtp-framing).
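The numbers above are related by simple arithmetic, shown here as a short sketch. The constant and function names are illustrative, not part of the spec; the per-frame byte figure is an average implied by the 25 kbps reference rate, not a fixed packet size.

```python
SAMPLE_RATE_HZ = 16_000
FRAME_MS = 60
REFERENCE_BITRATE_BPS = 25_000
SEND_PAYLOAD_TYPE = 120
OPUS_PAYLOAD_TYPES = frozenset({120, 121})  # both accepted on the inbound path


def samples_per_packet(rate_hz: int = SAMPLE_RATE_HZ, frame_ms: int = FRAME_MS) -> int:
    # 16000 samples/s * 0.060 s = 960 samples
    return rate_hz * frame_ms // 1000


def avg_payload_bytes(bitrate_bps: int = REFERENCE_BITRATE_BPS,
                      frame_ms: int = FRAME_MS) -> float:
    # 25000 bit/s * 0.060 s = 1500 bits = 187.5 bytes per frame on average
    return bitrate_bps * frame_ms / 8000


def is_opus_payload_type(pt: int) -> bool:
    return pt in OPUS_PAYLOAD_TYPES
```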
Notes. The inbound path strips but does NOT verify the WARP MI tag in the reference composition. The proprietary "mlow" encode (see mlow) is the on-wire WhatsApp codec; standard libopus at the reference parameters is accepted by the peer.
Requires: srtp-master-key, srtp-e2e, warp, rtp-framing, ssrc, opus, call-transport, stun-relay
Implemented by
| Flavor | Status | Source | Notes |
|---|---|---|---|
| whatsapp-rust | working | commit 674e851 | session state machine and protect/unprotect pipeline composition; live relay flow over the channel is deferred |
| zapo-caller | working | - | signalling + crypto + relay loop |
Annotation wacrg:REL-07 — a flavor marks its implementation site in source with this comment; a script clones the source, finds it, and attaches the commit blame/permalink.
Contributors
| Contributor | Role |
|---|---|
| wrote initial spec |
Open questions
- Full inbound ROC-recovery (rollover) algorithm for out-of-order and wrapped sequence numbers.
- Whether the WARP MI tag is verified on receive in the production client, and the failure policy if it is.
- Send-side pacing / DTX and comfort-noise behavior during the Active phase.
- Exact retransmission/jitter-buffer handling on the receive path.
References
- RFC 3711 (SRTP)
- RFC 3550 (RTP)
- RFC 6716 (Opus)
Changelog¶
- 2026-06-21 — Initial spec entry.