MLow post-filters
Encodings - mlow-postfilter
ENC-13 - status: draft - audio
The three deterministic decoder-side DSP post-filters MLow runs after range decoding: excitation harmonic comb, HP pitch comb, and harmonic post-filter.
Three enhancement filters run during decode. They consume no bits and have no wire
format. A decoder MUST run them in the order and with the constants below. All math is
single-precision (f32) with fast (non-strict-IEEE) arithmetic; a strict-IEEE decoder
reproduces output to within the i16 quantization step (1/32768), not bit-for-bit.
1. Excitation harmonic comb post-filter
Applied per subframe to the low-band excitation BEFORE LPC synthesis; output is ADDED
back into the excitation. Derives a 2nd-order pitch-resonant filter from the
excitation's own 3-lag autocorrelation (NOT the pitch lag). Subframe length N is 80
or 160.
Active-subframe path:
- Compute 3-lag autocorrelation auto[0..2] of input, then auto[0] += 9.999999960041972e-13.
- Smooth into persistent smoothed_c[i] with coef = 0.4 when N == 160 else coef = 0.16: smoothed_c[i] = coef*(auto[i] - smoothed_c[i]) + smoothed_c[i].
- local5 = auto[0] * 0.1224999949336052 / smoothed_c[0]; scaled vector {local5*smoothed_c[0], 2*local5*smoothed_c[1], 2*local5*smoothed_c[2]} (lags 1,2 doubled).
- Project through fixed G_PITCH 3x16 basis: proj[j] = sum_r scaled[r]*G_PITCH[r][j], peak = max_j proj[j], scale = 1.5*peak, refl[i] = scale - proj[i], comb_c[r] = sum_i G_PITCH[r][i]*refl[i].
- Fill N samples of LCG noise (below), seed pitch_gain from env_state on first call, RMS-envelope-smooth input with coef 0.95, multiply noise by envelope.
- local5 /= (sum(noise^2) + 9.999999960041972e-13), then comb_c[i] *= local5.
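The first three steps of the active-subframe path can be sketched as follows. This is a minimal illustration, not the reference decoder: it runs in Python double precision rather than f32, the helper names `autocorr3` and `scaled_autocorr` are hypothetical, and the autocorrelation is assumed to be the plain unnormalised sum over the subframe.

```python
EPS = 9.999999960041972e-13  # regulariser added to auto[0], from the text

def autocorr3(x):
    """Lags 0..2 of the raw autocorrelation of x (ASSUMPTION: unnormalised sum)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(3)]

def scaled_autocorr(x, smoothed_c):
    """Update the persistent smoothed_c in place; return (local5, scaled)."""
    coef = 0.4 if len(x) == 160 else 0.16
    auto = autocorr3(x)
    auto[0] += EPS
    for i in range(3):
        smoothed_c[i] = coef * (auto[i] - smoothed_c[i]) + smoothed_c[i]
    local5 = auto[0] * 0.1224999949336052 / smoothed_c[0]
    # Lags 1 and 2 are doubled, as stated in the list above.
    scaled = [local5 * smoothed_c[0],
              2.0 * local5 * smoothed_c[1],
              2.0 * local5 * smoothed_c[2]]
    return local5, scaled
```

Note that scaled[0] always reduces to 0.1224999949336052 * auto[0] (up to rounding), because local5 divides by the same smoothed_c[0] it is then multiplied by.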
Inactive-subframe path resets smoothed_c and LCG state and derives a single scalar
gain from the band-energy ratio; it builds no comb coefficients.
Resonator: if comb_c[0] >= 0, add 1.0000000031710769e-30, run the 2-iteration
Levinson-style solve (returns r5, r8, denom). On success g = sqrt(comb_c[0]/denom)
and resonator FIR {g, r8*g, r5*g}; on failure {sqrt(comb_c[0]), 0, 0}; if
comb_c[0] < 0 resonator is zeroed. Run the 3-tap resonator FIR over env-shaped noise,
then static de-emphasis FIR {0.25, -0.49599999, 0.25} to produce the additive output.
A trailing AR1/MA1 biquad (corner from band energy, sigmoid trailing-pole
g5 = sigmoid(0.2*(nrgEnv[1]-nrgEnv[0]+1e-30) - 3)) is added UNLESS the subframe is
active with call_count > 1.
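Two pieces of the resonator stage are simple enough to sketch directly: the 3-tap FIR (used for both the resonator and the static de-emphasis) and the sigmoid that sets the trailing pole. The snippet below is a hedged illustration in double precision. The function names are hypothetical, `sigmoid` is assumed to be the standard logistic function, and the FIR history layout is an assumption. The Levinson-style solve is not shown because its details are not given here.

```python
import math

DEEMPH_TAPS = (0.25, -0.49599999, 0.25)  # static de-emphasis FIR from the text

def fir3(x, taps, hist=(0.0, 0.0)):
    """y[n] = t0*x[n] + t1*x[n-1] + t2*x[n-2]; hist = (x[-1], x[-2])."""
    x1, x2 = hist
    y = []
    for v in x:
        y.append(taps[0] * v + taps[1] * x1 + taps[2] * x2)
        x1, x2 = v, x1
    return y, (x1, x2)

def trailing_pole(nrg_env0, nrg_env1):
    """g5 = sigmoid(0.2*(nrgEnv[1]-nrgEnv[0]+1e-30) - 3), logistic sigmoid assumed."""
    z = 0.2 * (nrg_env1 - nrg_env0 + 1e-30) - 3.0
    return 1.0 / (1.0 + math.exp(-z))
```

With equal band energies the pole evaluates to sigmoid(-3), roughly 0.047, so the trailing biquad is only weakly recursive unless the energy envelope rises sharply.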
LCG noise fill: s = 196314165*s + 907633515 (wrapping i32), output
s * 8.100000115085493e-10, emitting byte-shifted views s<<8, s<<16, s<<24 in the
4-wide block; state persists across calls.
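A minimal sketch of the noise fill, under one loudly stated assumption: each LCG step is taken to yield a 4-wide block made of s, s<<8, s<<16 and s<<24 (all wrapped to i32), each scaled by 8.100000115085493e-10. The text does not spell out the block layout, so the lane order here is a guess; the helper names are hypothetical.

```python
SCALE = 8.100000115085493e-10

def wrap_i32(v):
    """Reduce a Python int to a signed 32-bit value (two's-complement wrap)."""
    return ((v + 0x80000000) & 0xFFFFFFFF) - 0x80000000

def lcg_step(s):
    return wrap_i32(196314165 * s + 907633515)

def lcg_noise(state, n):
    """Return (n samples, new state). ASSUMPTION: n is a multiple of 4 and
    one LCG step feeds four lanes with shifts 0, 8, 16, 24."""
    out = []
    s = state
    for _ in range(n // 4):
        s = lcg_step(s)
        for shift in (0, 8, 16, 24):
            out.append(wrap_i32(s << shift) * SCALE)
    return out, s
```

Because the state is returned and passed back in, two consecutive calls produce the same samples as one longer call, which is what persistence across calls requires.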
G_PITCH rows (16 columns each):
row0: 0.25 x16
row1: 0.24879618 0.23923509 0.22048031 0.19325261 0.15859832 0.11784916
0.07257116 0.02450428 -0.02450431 -0.07257118 -0.11784921 -0.15859832
-0.19325262 -0.22048034 -0.23923509 -0.24879618
row2: 0.24519631 0.20786740 0.13889255 0.04877256 -0.04877258 -0.13889259
-0.20786741 -0.24519633 -0.24519631 -0.20786738 -0.13889250 -0.04877260
0.04877260 0.13889261 0.20786740 0.24519633
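The tabulated rows agree, to about 1e-7, with 0.25*cos(r*pi*(j+0.5)/16), which is a DCT-II-style cosine basis. That is an observation about the numbers rather than something this spec states, and a conforming decoder should use the tabulated f32 values. The sketch below regenerates the table from that formula and applies the projection step from the active-subframe path; `comb_coefs` is a hypothetical name.

```python
import math

# ASSUMPTION: closed form inferred from the table, not stated by the spec.
G_PITCH = [[0.25 * math.cos(r * math.pi * (j + 0.5) / 16) for j in range(16)]
           for r in range(3)]

def comb_coefs(scaled):
    """Project a scaled 3-lag vector through G_PITCH and back."""
    proj = [sum(scaled[r] * G_PITCH[r][j] for r in range(3)) for j in range(16)]
    scale = 1.5 * max(proj)
    refl = [scale - p for p in proj]
    return [sum(G_PITCH[r][i] * refl[i] for i in range(16)) for r in range(3)]
```

For example, a vector with only the lag-0 term set, `comb_coefs([1.0, 0.0, 0.0])`, gives a flat projection, so only comb_c[0] is non-zero (rows 1 and 2 sum to zero over their 16 columns).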
2. HP (pitch) post-filter
Applied to one frame (FRAME_LEN = 320) of post-LPC-synthesis output. Chain:
de-emphasis (AR1 leaky integrator, coef {1, -0.995})
-> ARMA2 comb (MA2 numerator, AR2 denominator)
-> companion pre-emphasis (MA1 differentiator, coef {1, -0.995})
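The outer two stages of the chain are exact inverses of each other, which is why the comb in the middle is the only stage that changes the spectrum. A small sketch, assuming the coefficient pair {1, -0.995} denotes the polynomial 1 - 0.995*z^-1 (denominator for the AR1 stage, numerator for the MA1 stage); the function names and state arguments are hypothetical.

```python
def deemph(x, y_prev=0.0, a=0.995):
    """AR1 leaky integrator: y[n] = x[n] + a*y[n-1]."""
    out = []
    for v in x:
        y_prev = v + a * y_prev
        out.append(y_prev)
    return out, y_prev

def preemph(x, x_prev=0.0, a=0.995):
    """MA1 differentiator: y[n] = x[n] - a*x[n-1]."""
    out = []
    for v in x:
        out.append(v - a * x_prev)
        x_prev = v
    return out, x_prev
```

Running `preemph(deemph(x)[0])[0]` returns x up to rounding when both start from zero state.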
Comb keys on frame average pitch lag lag = sum(l^2)/sum(l) over subframe lags
(0 -> unvoiced). The ARMA2 biquad is built by smpl_calc_hp_coefs with f = 1/lag
(voiced) or f = 50/16000 (50 Hz corner) when lag <= 0:
cos_approx(x) = 1 - 0.5*x^2
coef_ma = { 1, -2*cos_approx(2*pi*maf*f), 1 }
far = arf[0]*f + arf[1]*f^2
rar = arr[0]*f + arr[1]*f^2
coef_ar = { 1, -2*cos_approx(2*pi*far)*(1+rar), 1 + (2*rar + rar^2) }
sc = (1 - coef_ar[1] + coef_ar[2]) / (1 - coef_ma[1] + coef_ma[2])
coef_ma *= sc ; unity gain at z = -1 (Nyquist)
AR denominator is a resonance at angle 2*pi*far, radius 1+rar (rar negative for a
stable pole). Voiced curve: maf = 0.1, arf = {0.608057355, 0.070939485},
arr = {-2.187380512, 2.291030664}. Default curve: maf = 0.1,
arf = {0.728508218, 0.476039848}, arr = {-4.363803713, 8.441854006}.
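The coefficient formulas and the two curves can be collected into one function. This is a sketch in double precision, not a port of smpl_calc_hp_coefs; the names `calc_hp_coefs`, `frame_avg_lag`, `VOICED` and `DEFAULT` are illustrative, and returning 0 for an all-zero lag list is an assumption about how the unvoiced case is signalled.

```python
import math

VOICED = dict(maf=0.1, arf=(0.608057355, 0.070939485),
              arr=(-2.187380512, 2.291030664))
DEFAULT = dict(maf=0.1, arf=(0.728508218, 0.476039848),
               arr=(-4.363803713, 8.441854006))

def cos_approx(x):
    return 1.0 - 0.5 * x * x

def frame_avg_lag(lags):
    """lag = sum(l^2)/sum(l); ASSUMPTION: 0 when every subframe lag is 0."""
    s = sum(lags)
    return sum(l * l for l in lags) / s if s > 0 else 0.0

def calc_hp_coefs(lag):
    """Return (coef_ma, coef_ar) for a frame-average lag."""
    if lag > 0:
        f, c = 1.0 / lag, VOICED
    else:
        f, c = 50.0 / 16000.0, DEFAULT
    ma = [1.0, -2.0 * cos_approx(2.0 * math.pi * c["maf"] * f), 1.0]
    far = c["arf"][0] * f + c["arf"][1] * f * f
    rar = c["arr"][0] * f + c["arr"][1] * f * f
    ar = [1.0,
          -2.0 * cos_approx(2.0 * math.pi * far) * (1.0 + rar),
          1.0 + (2.0 * rar + rar * rar)]
    sc = (1.0 - ar[1] + ar[2]) / (1.0 - ma[1] + ma[2])
    return [sc * m for m in ma], ar
```

Two properties follow from the formulas as written: coef_ar[2] equals (1+rar)^2, the squared pole radius, and after scaling the numerator and denominator polynomials take the same value at z = -1.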
When lag > 1.25*lag_old or 1.25*lag < lag_old, the decoder MUST run OLD and NEW
coefficients over the frame and overlap-add with the cos(omega)^2 down-ramp table
(HP_POSTF_TRANSITION_SPEED = 2, d_omega = pi/(2*(FRAME_LEN+1)), omega by repeated
addition) before the companion pre-emphasis. lag_old < 0 marks a fresh/reset filter.
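The transition can be pictured as a cross-fade between two filtered copies of the frame. The sketch below makes several assumptions that the text does not confirm: omega starts at 0 and is incremented before each sample, the NEW output is weighted by 1 - cos(omega)^2, and the role of HP_POSTF_TRANSITION_SPEED is not modelled beyond the squared cosine. The handling of a fresh filter (lag_old < 0) is left to the caller. Function names are hypothetical.

```python
import math

FRAME_LEN = 320

def needs_transition(lag, lag_old):
    """Lag-jump test from the text (fresh-filter case not handled here)."""
    return lag > 1.25 * lag_old or 1.25 * lag < lag_old

def crossfade(y_old, y_new):
    """Overlap-add OLD and NEW comb outputs with a cos^2 down-ramp on OLD."""
    d_omega = math.pi / (2 * (FRAME_LEN + 1))
    omega = 0.0
    out = []
    for a, b in zip(y_old, y_new):
        omega += d_omega            # repeated addition, as in the text
        w = math.cos(omega) ** 2    # weight of the OLD-coefficient output
        out.append(w * a + (1.0 - w) * b)
    return out
```

Under these assumptions the first sample is almost entirely the OLD output and the last is almost entirely the NEW one, with a monotone ramp in between.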
3. Harmonic post-filter
Final per-packet stage; runs on the full low-band output after the HP filter. Mixes
x[-lag] + x[+lag], low-pass filtered by a lag-dependent kernel, and introduces total
group delay SMPL_TOT_POSTFILT_DELAY = 48 (8 feedback + 40 lag-subframe). Constants:
FRAME_LEN = 320 LAG_SUBFR_LEN = 40 FB_DELAY = 8
MIN_PITCH_LAG = 32 MAX_PITCH_LAG = 320 MAXPITCH_LEN = 320
FB_STRENGTH = 0.4734 STRENGTH = 0.6438 CUTOFF_HZ = 4000
NHARM_CUTOFF = 6.3 REDUCTION_FAC = 0.0579 LP_FILT_RES = 2500
PITCH_NUM_SUBFRAMES = 8
Operates per 40-sample lag block. Packet is appended to a persistent StateComb buffer at
offset MAX_PITCH_LAG + HARM_DELAY; reads index back into history. Per-packet feedback
strength fb_strength = 1 - FB_STRENGTH*normalized_bitrate. For a block with lag > 0:
y_harm[i] = comb[x+i-lag] + comb[x+i+lag] ; (lookforward-clamped at packet edge)
xy = dot(comb[x..], y_harm)
if xy > 0:
    xx = nrg(comb[x..], L); yy = 0.25*nrg(y_harm, L)
    strength = 0.5*xy / max(yy, xx)
    high_lag_reduction = 1 - REDUCTION_FAC*((lag-MIN_PITCH_LAG)/(MAX_PITCH_LAG-MIN_PITCH_LAG))
    strength *= high_lag_reduction * STRENGTH
    y_harm *= 0.5*strength
    diff = y_harm - strength*comb[x..]
    lpcoefs = lp_filter(lag) * fb_strength ; 17-tap symmetric kernel
    y_harm = MA17(diff) + comb[x - FB_DELAY ..] ; 48-delayed base, recursive
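The strength computation up to `diff` can be sketched on a plain list. This is an illustration only: `nrg` is assumed to be a sum of squares over L = LAG_SUBFR_LEN samples, the lookforward clamp at the packet edge is ignored, and the 17-tap low-pass and recursive feedback are left out. `harm_block` is a hypothetical name.

```python
MIN_PITCH_LAG, MAX_PITCH_LAG = 32, 320
LAG_SUBFR_LEN = 40
STRENGTH, REDUCTION_FAC = 0.6438, 0.0579

def harm_block(comb, x, lag, L=LAG_SUBFR_LEN):
    """Return (strength, diff) for the block at comb[x:x+L], or None when the
    block is not filtered (lag <= 0 or xy <= 0)."""
    if lag <= 0:
        return None
    cur = comb[x:x + L]
    # ASSUMPTION: comb[x+i+lag] is always in range (no edge clamp shown).
    y_harm = [comb[x + i - lag] + comb[x + i + lag] for i in range(L)]
    xy = sum(c * h for c, h in zip(cur, y_harm))
    if xy <= 0:
        return None
    xx = sum(c * c for c in cur)                # nrg assumed = sum of squares
    yy = 0.25 * sum(h * h for h in y_harm)
    strength = 0.5 * xy / max(yy, xx)
    reduction = 1.0 - REDUCTION_FAC * (
        (lag - MIN_PITCH_LAG) / (MAX_PITCH_LAG - MIN_PITCH_LAG))
    strength *= reduction * STRENGTH
    y_harm = [0.5 * strength * h for h in y_harm]
    diff = [h - strength * c for h, c in zip(y_harm, cur)]
    return strength, diff
```

For a signal that is exactly periodic at the given lag, y_harm is twice the current block, the normalised correlation is 1, and diff vanishes: the filter leaves a perfectly harmonic block untouched and only acts on the part that deviates from periodicity.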
When xy <= 0 (or lag <= 0) the block copies the 48-delayed base; if the previous
block filtered, the first 2*FB_DELAY samples carry the previous kernel's zero-input
response. Per-bucket LP kernel is built by create_lp_filter from a cosine window
filt_win[i] = cos(omega)/(i+1) (omega stepping 0.5*pi/(FB_DELAY+1)), cutoff
omega_c = min(omega0*NHARM_CUTOFF, CUTOFF_HZ/16000*pi) with omega0 = 2*pi/lag, scaled
to unity sum; bucket index LP_FILT_RES/max(lag+30,80) - LP_FILT_RES/MAX_PITCH_LAG. After
processing, StateComb is shifted left by the packet length. Block lag for iteration k is
round(lags[k]); prev_lag carries across packets.
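The cutoff and bucket-index expressions can be checked numerically. In the sketch below the bucket index uses truncating integer division for both terms. That is an assumption, chosen because real-valued division would give a negative index at lag = MAX_PITCH_LAG, whereas integer division maps the lag range 32..320 onto buckets 24..0. The function names are hypothetical.

```python
import math

LP_FILT_RES = 2500
MAX_PITCH_LAG = 320
NHARM_CUTOFF = 6.3
CUTOFF_HZ = 4000

def lp_cutoff(lag):
    """omega_c = min(omega0*NHARM_CUTOFF, CUTOFF_HZ/16000*pi), omega0 = 2*pi/lag."""
    omega0 = 2.0 * math.pi / lag
    return min(omega0 * NHARM_CUTOFF, CUTOFF_HZ / 16000 * math.pi)

def lp_bucket(lag):
    """Bucket index for an integer lag. ASSUMPTION: integer division."""
    return LP_FILT_RES // max(lag + 30, 80) - LP_FILT_RES // MAX_PITCH_LAG
```

Short lags hit the fixed CUTOFF_HZ ceiling (pi/4 at lag 32), while long lags are limited by the harmonic count, so the kernel narrows as the pitch period grows.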
Notes. Recursive accumulation through the near-unit-circle pitch poles can drift up to ~1.5e-5 from strict IEEE (below the i16 LSB). The one larger residual is the first 48 samples of a silence packet following a voiced packet (the comb's zero-input response under the prior frame's coefficients).
Parent: mlow
Requires: mlow, mlow-excitation, mlow-synthesis, mlow-frame
Breakdown: mlow-decoder, mlow-synthesis
Implemented by
| Flavor | Status | Source | Notes |
|---|---|---|---|
| whatsapp-rust | working | history - blame - commits 674e851 | all three filters ported and validated against the live WASM decoder and the C decoder dumps |
| meowcaller | partial | history - blame - commits 4323881 5b8d5a5 | encodings codec modules are partial |
Annotation wacrg:ENC-13 — a flavor marks its implementation site in source with this comment; a script clones the source, finds it, and attaches the commit blame/permalink.
Contributors
| Contributor | Role |
|---|---|
| | wrote initial spec |
Open questions
- The excitation comb carries a single unresolved 8/7 output scalar; its exact value is not yet pinned down.
- Whether the excitation-comb subframe length selection (N in {80,160}) is fully determined by frame size or also depends on bandwidth mode.
References
- MLow: WhatsApp's low-bitrate speech codec (engineering blog)
Changelog
- 2026-06-21 — Initial spec entry.