
RTP framing

Relay - rtp-framing

REL-05 - status: review - audio, video

RTP framing for SRTP-protected Opus media: the 16-byte speech and 20-byte DTX/piggyback headers (ext profile 0xdebe), payload classification, and send-side sequencing.

Protected media MUST use RTP version 2. The audio payload type is 120 (Opus); a receiver MUST also accept 121. Byte 1 (the second header byte) is (marker << 7) | (payloadType & 0x7f). CC MUST be 0 and P MUST be 0. All multi-byte fields MUST be big-endian.
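The fixed-field rules above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the names `first_two_bytes` and `check_fixed_fields` are hypothetical, and the receive-side check covers only the audio constraints stated in this section (version, P, CC, payload type).

```python
OPUS_PAYLOAD_TYPES = (120, 121)  # 120 is the audio type; 121 must also be accepted on receive


def first_two_bytes(marker: bool, payload_type: int, extension: bool) -> bytes:
    """Pack byte 0 (V=2, P=0, X, CC=0) and byte 1 (marker and payload type)."""
    byte0 = (2 << 6) | (int(extension) << 4)  # P and CC stay 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return bytes([byte0, byte1])


def check_fixed_fields(packet: bytes) -> bool:
    """Return True when the first two bytes satisfy the rules listed above."""
    if len(packet) < 2:
        return False
    version = packet[0] >> 6
    padding = (packet[0] >> 5) & 1
    csrc_count = packet[0] & 0x0F
    payload_type = packet[1] & 0x7F
    return (version == 2 and padding == 0 and csrc_count == 0
            and payload_type in OPUS_PAYLOAD_TYPES)
```

With V=2, P=0 and CC=0, byte 0 is 0x80 when X=0 and 0x90 when X=1.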

Header shape. Every packet MUST carry exactly one of two headers. Speech uses the 16-byte header (X=0); DTX or warp-piggyback uses the 20-byte header (X=1). The 0xdebe extension profile tag MUST be emitted on every header. The 16-byte form has extension length 0 (no extension block); the 20-byte form has extension length 1 (one 32-bit word).

16-byte speech header (X=0, CC=0):

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             SSRC                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       0xdebe (ext profile)    |     ext length = 0 words      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

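20-byte DTX / piggyback header. The first 12 bytes match the speech header except that X=1; the extension header then carries ext length = 1 word and is followed by one 32-bit extension word. Only these last two words are shown: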

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       0xdebe (ext profile)    |     ext length = 1 word       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        extension word                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Extension word (20-byte header only):
- DTX (comfort-noise): MUST be 0x30010000.
- Warp-piggyback: MUST be the piggyback word for that packet (see warp).
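As a sketch of the two header shapes described above, the following Python function packs either form with the standard `struct` module. The function name `pack_header` and its parameters are hypothetical. Passing `ext_word=None` selects the 16-byte speech header; passing a value selects the 20-byte header. The piggyback word itself is defined in warp and is not derived here.

```python
import struct
from typing import Optional

EXT_PROFILE = 0xDEBE       # extension profile tag emitted on every header
DTX_EXT_WORD = 0x30010000  # extension word for DTX (comfort-noise) packets


def pack_header(seq: int, timestamp: int, ssrc: int, marker: bool = False,
                payload_type: int = 120, ext_word: Optional[int] = None) -> bytes:
    """Return the 16-byte header when ext_word is None, else the 20-byte header."""
    has_ext = ext_word is not None
    byte0 = (2 << 6) | (int(has_ext) << 4)  # V=2, P=0, X, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        ">BBHIIHH",  # big-endian: two bytes, seq, timestamp, SSRC, profile, ext length
        byte0, byte1,
        seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF,
        EXT_PROFILE, 1 if has_ext else 0,
    )
    if has_ext:
        header += struct.pack(">I", ext_word & 0xFFFFFFFF)
    return header
```

For example, `pack_header(2, 960, ssrc, ext_word=DTX_EXT_WORD)` yields a 20-byte header whose last eight bytes are `de be 00 01 30 01 00 00`; the timestamp 960 is only an example value.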

Payload classification. Select header shape and marker from the Opus payload:
- DTX/comfort-noise, which MUST use the 20-byte header, when any of the following holds: the payload is a single byte 0x10, 0x88, or 0x90; it is 2–15 bytes long and its first byte b0 satisfies (b0 & 0xf8) == 0x08 or b0 == 0x0a; or it is ≤ 6 bytes long with (b0 & 0xf0) == 0x30.
- Opus priming frame when the payload equals one of the fixed priming frames.
- All other payloads are speech and MUST use the 16-byte header.
- Priming and DTX payloads MUST NOT latch the speech marker.
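The classification rules above can be sketched as a small Python function. Several points are assumptions rather than part of this section: the names `PayloadKind`, `is_dtx`, and `classify` are hypothetical; the fixed priming frames are not listed here, so the caller supplies them through the hypothetical `priming_frames` parameter; the DTX test is applied before the priming test, following the order of the list; and an empty payload is treated as non-DTX.

```python
from enum import Enum
from typing import AbstractSet


class PayloadKind(Enum):
    DTX = "dtx"
    PRIMING = "priming"
    SPEECH = "speech"


def is_dtx(payload: bytes) -> bool:
    """Apply the three DTX/comfort-noise rules to an Opus payload."""
    n = len(payload)
    if n == 0:
        return False  # assumption: an empty payload is not DTX
    b0 = payload[0]
    if n == 1 and b0 in (0x10, 0x88, 0x90):
        return True
    if 2 <= n <= 15 and ((b0 & 0xF8) == 0x08 or b0 == 0x0A):
        return True
    return n <= 6 and (b0 & 0xF0) == 0x30


def classify(payload: bytes,
             priming_frames: AbstractSet[bytes] = frozenset()) -> PayloadKind:
    """Classify a payload; priming_frames is supplied by the caller."""
    if is_dtx(payload):
        return PayloadKind.DTX
    if bytes(payload) in priming_frames:
        return PayloadKind.PRIMING
    return PayloadKind.SPEECH
```

A caller would send `PayloadKind.DTX` with the 20-byte header and `PayloadKind.SPEECH` with the 16-byte header, and would leave the onset marker untouched for `DTX` and `PRIMING`.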

Sequencing (send side). A stream MUST start its sequence number at 1 and its timestamp at 0. Each packet MUST advance the sequence number by 1 (mod 2^16) and the timestamp by samplesPerPacket (mod 2^32). The marker bit MUST be set on the first speech packet (speech onset); priming and DTX packets sent before speech MUST NOT set the marker or latch the onset. Subsequent speech packets MUST NOT set the marker unless the caller explicitly requests it.
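A minimal sketch of the send-side state described above, in Python. The class name `RtpSequencer`, its fields, and the `force_marker` flag (standing in for "the caller explicitly requests it") are hypothetical. The sketch assumes an explicit marker request applies only to speech packets, which this section does not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RtpSequencer:
    samples_per_packet: int
    seq: int = 1            # a stream starts at sequence number 1
    timestamp: int = 0      # and at timestamp 0
    onset_latched: bool = False

    def next(self, is_speech: bool, force_marker: bool = False) -> Tuple[int, int, bool]:
        """Return (seq, timestamp, marker) for this packet, then advance the counters."""
        marker = False
        if is_speech:
            # First speech packet sets the marker; later ones only on request.
            marker = force_marker or not self.onset_latched
            self.onset_latched = True
        current = (self.seq, self.timestamp, marker)
        self.seq = (self.seq + 1) & 0xFFFF
        self.timestamp = (self.timestamp + self.samples_per_packet) & 0xFFFFFFFF
        return current
```

Priming and DTX packets are passed with `is_speech=False`, so they consume a sequence number and timestamp step without setting the marker or latching the onset.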

Wire-size estimation. Implementations MAY estimate on-wire SRTP size as headerSize + payloadLength + authTagLength. headerSize is 16 (speech) or 20 (DTX). authTagLength (see srtp-hop-by-hop) is the short 4-byte tag for DTX packets and for short speech packets (priming frames or payloads ≤ 18 bytes), and the full 10-byte tag otherwise.
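The estimate above reduces to a few lines of Python. The function name `estimate_wire_size` and its parameters are hypothetical. The sketch assumes priming frames use the 16-byte header, since they are grouped with short speech packets here, and it does not cover warp-piggyback packets.

```python
def estimate_wire_size(payload_len: int, is_dtx: bool, is_priming: bool = False) -> int:
    """Estimate on-wire SRTP size as header + payload + auth tag."""
    header_size = 20 if is_dtx else 16
    # Short 4-byte tag for DTX, priming frames, and payloads of at most 18 bytes.
    short_tag = is_dtx or is_priming or payload_len <= 18
    auth_tag_len = 4 if short_tag else 10
    return header_size + payload_len + auth_tag_len
```

For instance, a 1-byte DTX payload is estimated at 20 + 1 + 4 = 25 bytes, and a 19-byte speech payload at 16 + 19 + 10 = 45 bytes.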

Notes. On parse, only the fixed 12-byte RTP fields are decoded; total header length is computed from version, CC, and (when X=1) the extension length, so the payload offset is found without interpreting the extension word.
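The parse rule in this note can be sketched in Python as below. The names `ParsedRtp` and `parse_rtp` are hypothetical. The sketch follows the rule as written, so a packet with X=0 yields a payload offset of 12 + 4*CC; whether the 0xdebe tag of the 16-byte speech header needs separate handling on receive is not specified here and is left out.

```python
import struct
from typing import NamedTuple


class ParsedRtp(NamedTuple):
    marker: bool
    payload_type: int
    seq: int
    timestamp: int
    ssrc: int
    payload_offset: int


def parse_rtp(packet: bytes) -> ParsedRtp:
    """Decode the fixed 12-byte fields and compute where the payload starts."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed 12-byte RTP header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack_from(">BBHII", packet, 0)
    if byte0 >> 6 != 2:
        raise ValueError("RTP version must be 2")
    offset = 12 + 4 * (byte0 & 0x0F)  # skip CSRC entries, if any
    if byte0 & 0x10:  # X=1: skip the extension without interpreting its word
        if len(packet) < offset + 4:
            raise ValueError("truncated extension header")
        _profile, ext_words = struct.unpack_from(">HH", packet, offset)
        offset += 4 + 4 * ext_words
    if offset > len(packet):
        raise ValueError("header runs past the end of the packet")
    return ParsedRtp(bool(byte1 & 0x80), byte1 & 0x7F, seq, timestamp, ssrc, offset)
```

The payload is then `packet[parsed.payload_offset:]`.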

Mlow Opus speech frames are recognisable by their first byte (0x48..0x4f for 20 ms, 0x50..0x57 for 60 ms) on payloads ≥ 18 bytes; this does not change the 16-byte header.
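The first-byte check above could look like this in Python. The function name `mlow_frame_ms` is hypothetical, and the sketch only reports the frame duration; it does not affect header selection.

```python
from typing import Optional


def mlow_frame_ms(payload: bytes) -> Optional[int]:
    """Return 20 or 60 for a recognisable Mlow speech frame, else None."""
    if len(payload) < 18:
        return None
    b0 = payload[0]
    if 0x48 <= b0 <= 0x4F:
        return 20
    if 0x50 <= b0 <= 0x57:
        return 60
    return None
```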

Requires: warp, srtp-hop-by-hop, opus, ssrc
Breakdown: video-packetization, media-loop, rtcp

Implemented by

Flavor | Status | Source | Notes
whatsapp-rust | working | history - blame - commits 674e851 |
zapo-caller | working | | ported to whatsapp-rust from the zapo-caller src/media/rtp.ts framing

Annotation wacrg:REL-05 — a flavor marks its implementation site in source with this comment; a script clones the source, finds it, and attaches the commit blame/permalink.

Contributors

Contributor Role
Rajeh Taher wrote initial spec


Open questions
- Whether payload type 121 is used for a distinct media variant or is only accepted on receive.
- The full set of warp-piggyback extension words and the packet index at which piggybacking begins.

References
- RFC 3550: RTP
- RFC 8285: A General Mechanism for RTP Header Extensions

Changelog

  • 2026-06-21 — Initial spec entry.
