Skip to content

MLow CELP excitation

Encodings - mlow-excitation

ENC-10 - status: draft - audio

Decode the range-coded CELP excitation of one MLow internal frame: per-subframe pulse counts and positions/signs, plus the gains block (unvoiced) or pitch/LTP block (voiced).

Decode the excitation of one internal frame (320 samples at 16 kHz, p3 = 4 subframes) from the range coder immediately after the LSF/LPC parameters (mlow-lsf-lpc) and before synthesis (mlow-synthesis). All reads use the Opus/CELT range coder (mlow-rangecoder): CDF symbols from the front, raw uniform bits from the back.

The LSF stage-1 selector s1 selects the mode (mutually exclusive per frame): s1 == 1 (voiced) MUST decode the pitch/LTP block; s1 == 0 (unvoiced) MUST decode the gains block.

Pulses (algebraic codebook)

Establish the frame pulse budget. Let config be the band/config flag (p6; 0 = narrowband) and idx = p4 + s1 (p4 = regular-frame flag). Look up:

SMPL_PULSE_COUNT_BYTE = [80, 160, 160, 16, 32, 32, 0, 0]   ; index = config*3 + idx

Then frame_budget = budget_byte * p2 / 320, subfr_budget = frame_budget / p3.

Total pulse count. For config != 0, read the total with a config-indexed CDF. For config == 0, draw the total from a triangular prior over [0, frame_budget]: with L = frame_budget,

T(k) = ( (k+2)*(L+1) - ((k-1)*(k+131070) >> 1) ) & 0xffff

ft = max(T(L), 1). Read val = range_decode(ft), scan k = 0,1,2,... up to frame_budget, select the first k with T(k-1) <= val < T(k), and finalise the symbol over that interval. Result = total pulse count.

Split. With p3 == 4, split the total into four per-subframe counts by two binary-split levels. Top level: read a bias-corrected CDF (when the constrained interval is non-empty) for the first-half sum. Then split each half with smpl_split: for a half carrying count pulses over granularity positions, min_split = max(count - granularity, 0), lo = min(count, granularity); when min_split < lo read a CDF symbol over lo - min_split + 2 entries and add to min_split, else count = min_split. Four results = subfr[0..4].

Positions and magnitudes. For each subframe with cnt > 0, read run-length pulse positions over the p2/p3 positions. Repeatedly read a position-delta CDF of length pos + 1 (remaining positions). A non-zero delta (or the subframe's first read) starts a new magnitude-1 pulse at the advanced position; a zero delta on a subsequent read increments the current pulse's magnitude (stacked).

Signs. After all positions, read one sign bit per pulse position as raw uniform bits, batched at most 15 bits per read. Sign is +1 when set, -1 otherwise, applied to that position's magnitude. Scatter signed magnitudes into the excitation vector at their sample positions; all others are 0.

Gains (unvoiced frames)

Read a main log-gain (CDF, 85 entries) and a delta log-gain (CDF, 99 entries), then reconstruct per-subframe quantized log-gain:

base = gain_main * gain_scale[cfg] - 0x154000
gain_q[sf] = base + (gain_cb[sf + p3*gain_delta] << 4)

gain_scale and gain_cb are config-selected tables. For each subframe with cnt > 0, read an energy-residual symbol from a bucketed CDF (92 entries): bucket = min(cnt/10, 3) (i.e. 3 for cnt >= 30); the CDF base is shifted by -2 * min(g, 0) where g = clamp((gain_q[sf] + 8192) >> 14, >= -85). Subframes with no pulses produce no read and residual 0.

Pitch / LTP (voiced frames)

Decode in order: LTP gains-and-filters, then the pitch lag.

LTP gains and filters. For each of p3 subframes, read a gain index from a 17-entry CDF whose row is selected by the previous subframe's gain index (decoder state). Using a config-selected gain-weight table, the running accumulator adds w0 + 2*w2 per subframe; avg_gain = gain_accum / p3 is retained for fractional-lag selection. For each subframe with pulses, read a 35-entry filter index: from the base CDF when no previous filter index exists (state == -1), else from a CDF row offset by the previous filter index. Gain and filter indices update decoder state.

Lag. A pitch-configuration block supplies the contour count, lag CDF, contour map, fractional-lag table, and delta CDF. Read the primary lag absolutely (CDF over num_contours + 1) when no previous lag exists, else as a delta: a 10-entry delta CDF selects an interval [lo, hi], and lag = lo + symbol read from the lag CDF over hi - lo + 2 entries. Search the contour map for i where contour_map[i] == lag + 1; if none or out of range, the pitch block ends.

From the selected contour, read a contour base lag. Unless a previous lag exists with -1 <= (base_lag - prev_lag) < 3, read a 64-symbol fine lag and combine as cur_lag = (base_lag << 6) + fine (Q6, 1/64-sample, first contour segment). For each remaining segment, read a 65-entry fractional CDF (segment selected by avg_gain: 0 for < 10007, 1 for < 14085, else 2) and accumulate into the running Q6 lag. Replicate each segment's Q6 lag across the contour-defined number of 40-sample blocks to produce per-block pitch lags (lag = block_lag * 0.5 + SMPL_MIN_PITCH_LAG). Decoder state MUST retain the previous gain index, filter index, integer lag, and fractional lag.

Parent: mlow
Requires: mlow, mlow-rangecoder, mlow-lsf-lpc, mlow-synthesis
Breakdown: mlow-decoder, mlow-noise, mlow-postfilter, mlow-synthesis

Implemented by

Flavor Status Source Notes
whatsapp-rust working history - blame - commits 674e851
meowcaller partial history - blame - commits 45efc4c 011af47 2d36efc 81bebc0 a239706 8e4f5c8 d8dfcac excitation decode KAT-verified against the Go reference; CELP synthesis wiring in progress

Annotation wacrg:ENC-10 — a flavor marks its implementation site in source with this comment; a script clones the source, finds it, and attaches the commit blame/permalink.

Contributors

Contributor Role
Rajeh Taher Rajeh Taher wrote initial spec

protocol history / diff - blame

Open questions - Wideband (config != 0) total-pulse-count CDF table location is referenced but the active captures are all narrowband, so the WB path is not exercised by vectors. - Exact contents and addressing of the config-selected gain-scale and gain-codebook tables beyond the heap window reproduced for the active config. - Whether the gains block is reached on any real (unvoiced) capture frame; current vectors force-run it on voiced decoder state to validate the arithmetic.

References - RFC 6716 — Opus (range coder)

Changelog

  • 2026-06-21 — Initial spec entry.

Back to the full spec