MLow frame and TOC
Encodings - mlow-frame
ENC-03 - status: review - audio
The leading TOC byte of an MLow payload routes the frame (standard Opus vs. the MLow "smpl" path), carries the DTX/VAD flags, the internal sample rate, and the frame duration, and governs the layout of an active MLow frame as three chained 20 ms internal frames.
An MLow RTP payload begins with a single TOC byte b, followed by the
range-coded body. The TOC MUST be parsed first; it selects the decode path and
supplies the output length even when no body is decoded.
Routing. Inspect the top two bits first:
(b & 0xC0) == 0xC0 → standard Opus/CELT TOC (decode with stock Opus)
(b & 0xC0) != 0xC0 → smpl MLow TOC (decode with the MLow path)
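The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the names `Route` and `route_toc` are hypothetical and do not come from any flavor's source.

```python
from enum import Enum


class Route(Enum):
    OPUS = "opus"  # decode with a stock Opus decoder
    MLOW = "mlow"  # decode with the MLow (smpl) path


def route_toc(b: int) -> Route:
    """Classify a TOC byte by its top two bits, as the routing rule describes."""
    if not 0 <= b <= 0xFF:
        raise ValueError("TOC must be a single byte")
    # Both top bits set -> standard Opus/CELT TOC; anything else -> smpl TOC.
    return Route.OPUS if (b & 0xC0) == 0xC0 else Route.MLOW
```

Only the pattern with bits 7 and 6 both set selects the Opus path, so an smpl TOC can carry SID or VAD but never both at once.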
When (b & 0xC0) == 0xC0, the remaining bits MUST be interpreted as a standard
Opus TOC per RFC 6716 §3.1; the frame is NOT MLow. The internal sample rate is
fixed at 16 kHz, and the frame duration comes from the Opus configuration number
config = b >> 3 (RFC 6716 Table 2): config 0–11 is SILK {10, 20, 40, 60} ms,
12–15 is Hybrid {10, 20} ms, and 16–31 is CELT {2.5, 5, 10, 20} ms (round 2.5 ms
up to 3 ms). Because a byte routed here has both top bits set, config is always
in 24–31, so only the CELT durations can occur on this path.
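As a hedged sketch of the duration lookup for a standard-Opus-routed byte, the function below maps the RFC 6716 configuration number to a frame duration in milliseconds, reporting 2.5 ms as 3 ms as this entry requires. The function name `opus_frame_ms` is an illustrative assumption.

```python
def opus_frame_ms(b: int) -> int:
    """Frame duration (ms) from an Opus TOC byte, per RFC 6716 Table 2.

    CELT's 2.5 ms duration is rounded up to 3 ms, following this spec entry.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("TOC must be a single byte")
    config = b >> 3  # top five bits of the TOC byte
    if config < 12:  # SILK-only: four durations per bandwidth
        return (10, 20, 40, 60)[config & 3]
    if config < 16:  # Hybrid: two durations per bandwidth
        return (10, 20)[config & 1]
    return (3, 5, 10, 20)[config & 3]  # CELT-only; 3 stands in for 2.5 ms
```

The SILK and Hybrid branches are kept so the function is a complete reading of Table 2; a byte that actually satisfies (b & 0xC0) == 0xC0 has config in 24 to 31 and always takes the CELT branch.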
smpl TOC bit layout (bit 0 = LSB), used when (b & 0xC0) != 0xC0:
| Bits | Name | Meaning |
|---|---|---|
| 7 | SID | comfort-noise / DTX (silence-insertion descriptor) |
| 6 | VAD | voice-activity flag |
| 5 | rate | internal sample rate: 0 → 16000 Hz, 1 → 32000 Hz |
| 4:3 | size | frame-duration index into {10, 20, 60, 120} ms |
| 2 | flag2 | low-rate / config flag (selects active-frame config) |
| 1 | enable | voiced-enable bit |
| 0 | flag0 | reserved flag |
Derived fields MUST be computed as:
sample_rate = (b & 0x20) ? 32000 : 16000
frame_ms = {10, 20, 60, 120}[(b >> 3) & 3]
sid = (b >> 7) & 1
vad = (b >> 6) & 1
voiced = vad AND ((b >> 1) & 1)
active = vad OR ((b >> 1) & 1)
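The bit layout and derived fields can be combined into one parser. The following is a sketch under the definitions above; the `SmplToc` dataclass and `parse_smpl_toc` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

FRAME_MS = (10, 20, 60, 120)  # indexed by bits 4:3 of the TOC


@dataclass(frozen=True)
class SmplToc:
    sid: bool
    vad: bool
    sample_rate: int
    frame_ms: int
    flag2: int
    enable: int
    flag0: int
    voiced: bool
    active: bool


def parse_smpl_toc(b: int) -> SmplToc:
    """Parse an smpl TOC byte into its raw and derived fields."""
    if not 0 <= b <= 0xFF:
        raise ValueError("TOC must be a single byte")
    if (b & 0xC0) == 0xC0:
        raise ValueError("standard Opus TOC; route to the Opus decoder instead")
    vad = bool((b >> 6) & 1)
    enable = (b >> 1) & 1
    return SmplToc(
        sid=bool((b >> 7) & 1),
        vad=vad,
        sample_rate=32000 if b & 0x20 else 16000,
        frame_ms=FRAME_MS[(b >> 3) & 3],
        flag2=(b >> 2) & 1,
        enable=enable,
        flag0=b & 1,
        voiced=vad and bool(enable),  # vad AND enable
        active=vad or bool(enable),   # vad OR enable
    )
```

For example, `parse_smpl_toc(0x50)` describes a 16 kHz, 60 ms frame with VAD set that is active but not voiced, because the enable bit is clear.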
Output length. The output length in samples MUST be sample_rate / 1000 * frame_ms
for an MLow frame, and 16000 / 1000 * frame_ms for a standard-Opus-routed frame.
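A short worked sketch of the output-length rule follows. Both function names are illustrative assumptions; integer arithmetic is used because both supported rates are whole multiples of 1000.

```python
def output_length(sample_rate: int, frame_ms: int) -> int:
    """Samples per frame: sample_rate / 1000 * frame_ms, in integer arithmetic."""
    return sample_rate // 1000 * frame_ms


def opus_routed_output_length(frame_ms: int) -> int:
    """Standard-Opus-routed frames always use the fixed 16 kHz rate."""
    return output_length(16000, frame_ms)
```

With these definitions a 60 ms MLow frame at 16 kHz is 960 samples, a 120 ms frame at 32 kHz is 3840 samples, and a 2.5 ms CELT frame (rounded to 3 ms) on the Opus path is 48 samples.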
Inactive frames. If sid is set or active is false, the frame carries no
excitation: the decoder MUST emit output_length samples of silence (or comfort
noise) and MUST NOT decode an active body. A standard-Opus TOC MUST be routed
away from the MLow active-frame decoder.
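The inactive-frame rule might be sketched as below. This version emits plain zeros; comfort-noise generation is out of scope here, and the name `silence_if_inactive` is hypothetical.

```python
from typing import List, Optional

FRAME_MS = (10, 20, 60, 120)  # indexed by bits 4:3 of the TOC


def silence_if_inactive(b: int) -> Optional[List[int]]:
    """Return silence for an inactive smpl frame, or None if the caller
    must go on to decode an active body."""
    if (b & 0xC0) == 0xC0:
        # Standard Opus TOC: must never reach the MLow active-frame decoder.
        raise ValueError("standard Opus TOC; route to the Opus decoder instead")
    sid = (b >> 7) & 1
    vad = (b >> 6) & 1
    enable = (b >> 1) & 1
    active = bool(vad or enable)
    if sid or not active:
        rate = 32000 if b & 0x20 else 16000
        n = rate // 1000 * FRAME_MS[(b >> 3) & 3]
        return [0] * n  # zeros stand in for silence or comfort noise
    return None
```

Returning `None` for active frames keeps the silence decision separate from the range-coded body decode, so the body is never touched when the TOC alone determines the output.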
Active-frame layout. An active MLow frame (typically 60 ms) MUST be decoded as
three chained 20 ms internal frames over a single range-coded body beginning at
byte offset 1. For each internal frame, the body MUST be read in order: LSF/LPC
indices, excitation pulses, then either the pitch/LTP block (voiced, i.e. LSF
stage-1 index == 1) or the unvoiced gains block. Each internal frame divides into
4 subframes; the voiced pitch block carries one lag per 40-sample block (8 per
internal frame, 24 per packet). flag2 ((b >> 2) & 1) selects active-frame
config 0 or 1, applied uniformly across all three internal frames. Cross-frame
predictor and synthesis history MUST persist across packets (the stream is
continuous); a decoder MUST reset this state only at a stream discontinuity.
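The arithmetic behind the active-frame layout can be made explicit with a small sketch. It covers only the case seen in capture (16 kHz, 60 ms) and raises for anything else, since the layout at other rates or durations is not stated here. `ActiveLayout` and `active_layout` are hypothetical names.

```python
from dataclasses import dataclass

FRAME_MS = (10, 20, 60, 120)  # indexed by bits 4:3 of the TOC
INTERNAL_MS = 20              # duration of one internal frame
SUBFRAMES = 4                 # subframes per internal frame
LAG_BLOCK = 40                # samples covered by one pitch lag


@dataclass(frozen=True)
class ActiveLayout:
    internal_frames: int
    samples_per_internal: int
    samples_per_subframe: int
    lags_per_internal: int
    lags_per_packet: int
    body_offset: int
    config: int


def active_layout(b: int) -> ActiveLayout:
    """Layout constants for a 16 kHz / 60 ms active frame (assumed scope)."""
    rate = 32000 if b & 0x20 else 16000
    frame_ms = FRAME_MS[(b >> 3) & 3]
    if rate != 16000 or frame_ms != 60:
        raise NotImplementedError("only the 16 kHz / 60 ms case is sketched")
    internal = frame_ms // INTERNAL_MS        # 3 chained internal frames
    samples = rate // 1000 * INTERNAL_MS      # 320 samples per internal frame
    lags = samples // LAG_BLOCK               # 8 lags per internal frame
    return ActiveLayout(
        internal_frames=internal,
        samples_per_internal=samples,
        samples_per_subframe=samples // SUBFRAMES,
        lags_per_internal=lags,
        lags_per_packet=lags * internal,
        body_offset=1,                        # body starts right after the TOC
        config=(b >> 2) & 1,                  # flag2, shared by all three frames
    )
```

For TOC byte 0x50 this gives 3 internal frames of 320 samples, 80 samples per subframe, 8 lags per internal frame, and 24 lags per packet, matching the counts in the text.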
Notes. In captured traffic the internal rate bit is 0 (16 kHz) and flag2 is 0. A
60 ms active frame produces 3 × 320 = 960 samples at 16 kHz before the per-packet
harmonic post-filter; the result is then resized to the TOC-derived output length
if the two differ. A range-coder desync after active-frame decode indicates a
TOC/body mismatch.
Parent: mlow
Requires: mlow, mlow-rangecoder, opus
Breakdown: mlow-decoder, mlow-encoder, mlow-lsf-lpc, mlow-noise, mlow-postfilter, mlow-red-fec, mlow-synthesis, mlow-vad
Implemented by
| Flavor | Status | Source | Notes |
|---|---|---|---|
| whatsapp-rust | working | history - blame - commits 674e851 | - |
| meowcaller | partial | history - blame - commits b0fe93c e362783 7ff050a 2a4cd7c | TOC parse + routing present; active-frame orchestration in progress |
Annotation wacrg:ENC-03 — a flavor marks its implementation site in source with this comment; a script clones the source, finds it, and attaches the commit blame/permalink.
Contributors
| Contributor | Role |
|---|---|
| | wrote initial spec |
Open questions
- Semantics of flag0 (bit 0): present in the TOC but not consumed by the decode path.
- Whether the 32 kHz internal rate (bit 5 = 1) and the 10/20/120 ms frame sizes occur in live calls or are reserved; only 16 kHz / 60 ms active frames are seen in capture.
- Exact role of flag2 beyond selecting config 0 vs 1 for the active-frame decode.
References
- RFC 6716: Opus (TOC, §3.1, Table 2)
Changelog
- 2026-06-21 — Initial spec entry.