MLow and the audio media plane
WhatsApp's 1:1 call audio is carried by an in-house codec the binary calls MLow (a CELP speech codec with an optional neural "companion" post-filter), wrapped in a receive pipeline that is, structurally, WebRTC's audio stack renamed. This page reconstructs that pipeline from a single static read of the WhatsApp Web calling engine (Emscripten WASM), using warden to mine the binary's own type information.
Confidence. Everything here is from one technique, static wasm-analysis, so by the corroboration rule it is at most probable, never confirmed. The structure (which component does what) rests on the binary's own C++ RTTI and source-path strings and is graded probable. The inner codec algorithm (bitstream layout, exact DSP) is only partially recovered and is graded speculative; it lives in open questions. No key material or captured media is in this repo.
Provenance. Module: WhatsApp Web calling engine wa.wasm, SHA-1 3638a506b4055c2fc6bec75edff18512ca79fe64 (9,819,554 bytes). Technique: wasm-analysis · tool: warden · contributor: purpshell · sources: commits aa0996c, 365daa6 and the machine-readable identity map in the warden knowledge base (rendered in the function map). Method and exact queries: methodology.
Why this is recoverable at all
The mobile apps strip symbols, but the Web WASM still carries two kinds of ground-truth strings that survived the build:
- C++ RTTI type names (Itanium-mangled, e.g. N8facebook3rtc9MLowFrameE → facebook::rtc::MLowFrame). A function that references a class's typeinfo is typically constructing, destroying, or dynamic_cast-ing that class, so it pins the function to a concrete type.
- __FILE__ paths baked into asserts/logs (e.g. xplat/wa-voip/wacall/media/src/codec/wa_opus.cc). These map a function to its exact source compilation unit.
Neither depends on a model guess. They let us correct the many auto-generated names that were wrong (see function map) and name the subsystems with confidence.
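To make the RTTI example concrete, here is a minimal sketch of how the simplest Itanium nested-name form decodes: N, then length-prefixed identifiers, then E. DemangleNested is a hypothetical helper, not part of warden; it deliberately rejects templates, substitutions (S_), std:: abbreviations and qualifiers, which a real pipeline would hand to a complete demangler such as c++filt or the C++ runtime's abi::__cxa_demangle.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// DemangleNested decodes the simplest Itanium ABI type-name forms found in
// RTTI strings: a nested name "N<len><id>...E" or a single "<len><id>".
// Templates, substitutions (S_), std:: abbreviations and CV-qualifiers are
// deliberately unsupported and return an error.
func DemangleNested(mangled string) (string, error) {
	s := mangled
	nested := len(s) >= 2 && s[0] == 'N' && s[len(s)-1] == 'E'
	if nested {
		s = s[1 : len(s)-1]
	}
	var parts []string
	for len(s) > 0 {
		// Each component is a decimal length followed by that many characters.
		i := 0
		for i < len(s) && s[i] >= '0' && s[i] <= '9' {
			i++
		}
		n, err := strconv.Atoi(s[:i])
		if err != nil || n <= 0 || i+n > len(s) {
			return "", fmt.Errorf("unsupported mangling: %q", mangled)
		}
		parts = append(parts, s[i:i+n])
		s = s[i+n:]
	}
	if len(parts) == 0 || (!nested && len(parts) > 1) {
		return "", fmt.Errorf("unsupported mangling: %q", mangled)
	}
	return strings.Join(parts, "::"), nil
}

func main() {
	for _, m := range []string{"N8facebook3rtc9MLowFrameE", "N8concerto9NetEqImplE"} {
		name, err := DemangleNested(m)
		fmt.Println(m, "->", name, err)
	}
}
```

The second input is an illustrative mangling of concerto::NetEqImpl built by hand, not a string quoted from the binary.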
The receive (decode) pipeline
flowchart LR
RTP[SRTP/RTP audio packet] --> RED[MlowRedPayloadSplitter\nRED redundancy split]
RED --> RS[ReedSolomonCode\nFEC recovery]
RS --> NETEQ[concerto::NetEq\njitter buffer + PLC + resample]
NETEQ -->|registered decoder| DEC{DecoderDatabase}
DEC -->|primary| MLOW[AudioDecoderMLowImpl\nMLow CELP decode]
DEC -->|alternate| OPUS[wa_opus.cc\nOpus decode]
MLOW --> COMP[mlowcompanion\nneural post-filter optional]
MLOW --> PCM[PCM out]
COMP --> PCM
OPUS --> PCM
Each box is a real type or source file recovered from the binary; the function indices backing each are in the function map.
- concerto::MlowRedPayloadSplitter: splits an RTP payload that carries RED-style redundancy (a primary frame plus one or more older frames for loss resilience). concerto is the binary's name for its WebRTC-derived media core. (probable)
- facebook::rtc::ReedSolomonCode / ReedSolomonFactoryImpl / RSEncoderDecoder: a Reed-Solomon erasure code over the redundancy group, alongside WebRTC's own modules/rtp_rtcp/source/multistream_forward_error_correction.cc. The RED layer is FEC-protected, not bare duplication. (probable that it is Reed-Solomon, from the class names; the exact RS parameters are speculative.)
- concerto::NetEq*: the receive buffer and concealment engine. The class set is verbatim WebRTC NetEq: NetEqImpl, NetEqController, DecoderDatabase, DelayManager, DelayPeakDetector, PacketBuffer, BufferLevelFilter, TimestampScaler, Expand/ExpandFactory and Merge (the packet-loss-concealment generators), and StatisticsCalculator. This is WhatsApp's fork of WebRTC's audio NetEq, renamed to concerto. (probable; near-certain from the identical internal class names.)
- concerto::DecoderDatabase: the pluggable decoder registry. MLow is the primary registered decoder; Opus (wa_opus.cc) is present as an alternate. (probable)
- facebook::rtc::AudioDecoderMLowImpl: the MLow decoder itself, a CELP speech decoder (LPC synthesis with long-term/pitch prediction and an entropy-coded excitation). Its inner bitstream is the part still being recovered. The configuration it reads includes mlow_dec_cutoff_hz and the WebRTC-MLowDecoder-lowPassCutoffFrequencyHz field trial. (decoder identity probable; algorithm speculative.)
- mlowcompanion_*: a small neural post-filter ("companion") whose weights (mlowcompanion_af1_kernel_bias, mlowcompanion_fnet_tconv_bias, and so on) live in the data section and run on ExecuTorch/XNNPACK (the binary embeds XNNCompiler.cpp and ExecuTorch op_* kernels). It is out of scope for the reference implementation; intelligible audio is reachable without it. (existence probable; everything else out of scope.)
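For orientation, the standard RED framing is RFC 2198: each redundant block gets a 4-byte header (F bit, 7-bit payload type, 14-bit timestamp offset, 10-bit block length) and the primary gets a final 1-byte header with F=0. Whether MlowRedPayloadSplitter uses exactly this layout has not been recovered, so the sketch below simply assumes stock RFC 2198 headers; SplitRED and RedBlock are hypothetical names, and the Reed-Solomon layer is not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// RedBlock is one encoding carried in an RFC 2198 RED payload.
type RedBlock struct {
	PayloadType     uint8
	TimestampOffset uint16 // subtract from the packet's RTP timestamp (0 for the primary)
	Data            []byte
}

var errTruncated = errors.New("red: truncated payload")

// SplitRED parses RFC 2198 headers and returns the redundant blocks in header
// order, followed by the primary block. Hypothetical helper: it assumes stock
// RFC 2198 framing, not a recovered MLow layout.
func SplitRED(p []byte) ([]RedBlock, error) {
	var blocks []RedBlock
	var lens []int
	i := 0
	for {
		if i >= len(p) {
			return nil, errTruncated
		}
		if p[i]&0x80 == 0 { // F=0: final 1-byte header, describes the primary block
			blocks = append(blocks, RedBlock{PayloadType: p[i] & 0x7f})
			i++
			break
		}
		if i+4 > len(p) {
			return nil, errTruncated
		}
		// 4-byte header: F(1) PT(7) | timestamp offset(14) | block length(10)
		blocks = append(blocks, RedBlock{
			PayloadType:     p[i] & 0x7f,
			TimestampOffset: uint16(p[i+1])<<6 | uint16(p[i+2])>>2,
		})
		lens = append(lens, int(p[i+2]&0x03)<<8|int(p[i+3]))
		i += 4
	}
	for k, n := range lens {
		if i+n > len(p) {
			return nil, errTruncated
		}
		blocks[k].Data = p[i : i+n]
		i += n
	}
	blocks[len(blocks)-1].Data = p[i:] // the primary block takes whatever remains
	return blocks, nil
}

func main() {
	// One redundant block (PT 111, offset 960, 3 bytes) and a 2-byte primary.
	p := []byte{0xEF, 0x0F, 0x00, 0x03, 0x6F, 0xAA, 0xBB, 0xCC, 0x11, 0x22}
	blocks, err := SplitRED(p)
	if err != nil {
		panic(err)
	}
	for _, b := range blocks {
		fmt.Printf("pt=%d offset=%d data=% x\n", b.PayloadType, b.TimestampOffset, b.Data)
	}
}
```

The payload type and timestamp offset in the example are illustrative values, not ones observed on the wire.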
The send (encode) side
The encoder is present (the binary both sends and receives MLow), but it exposes
less RTTI than the decoder, so its internals are less firmly pinned down today.
The codec-selection and framing source files are identified (hybrid_codec.cc,
wa_opus.cc, codec_utils.cc, wa_audio_transformation.cc), and the RED/RS path is
shared with receive. Encoder internals are tracked in open questions.
Adjacent layers (named, not in scope here)
The same mining surfaced the layers around the codec, documented so the codec boundaries are unambiguous:
- E2EE media: facebook::rtc::e2ee::FrameDataHandlerGeneric/H264, KeyAccessTracker, FrameStatsCollector: per-frame end-to-end media encryption (SFrame-style), distinct from the hop-by-hop SRTP in media-srtp.
- Congestion control: concerto::PacketPairBweV3, AimdRateControl*, DelayBasedBwe: bandwidth estimation that drives MLow's target bitrate.
- Platform glue: whatsapp::wasm::WasmAudioDriver and the capture/playback drivers: the JS-boundary audio I/O, not the codec.
What this changes for wacrg
media-srtp listed "confirm the audio codec" as the
top open question and marked the whole plane speculative. Static
wasm-analysis now answers the structural part of it at probable: the codec
is MLow (CELP + optional companion NN) with Opus as an alternate, carried over an
RED + Reed-Solomon FEC layer into a WebRTC-NetEq receive engine. Promoting any of
this to confirmed needs a second, independent technique: a Frida hook or a
live media capture that observes the same frames, per the
corroboration rule.
Open questions
These are the gaps between "we can name the component" and "we can re-implement
it bit-exactly." They are speculative until recovered.
- MLow bitstream layout. Frame header/TOC and how subframes are packed.
  Recovered by the reconstructions: MLow is a split-band CELP codec
  (internally "SMPL"), not the MDCT hybrid an earlier note guessed. The MDCT
  DSP cluster in wa.wasm is the standard-Opus/CELT fallback path, which the
  decoder routes to separately (a TOC with top bits 11 is stock Opus; an MLow
  "smpl" TOC goes to the CELP decoder). See decode-pipeline.
- Sample rate and frame size. The binary references 8000 and 16000
  internally, while configuration language elsewhere suggests a super-wideband
  (32 kHz) path with 20 ms frames. Whether MLow's core runs at 16 kHz with SWB
  handled by resampling is unresolved. (speculative; tracked as a discrepancy.)
- ~~Is the entropy coder the Opus/CELT range coder?~~ Answered (verified).
  Yes: libopus' CELT range coder (ec_dec), present unmodified, matched
  constant-for-constant in #8855-8861 and reproduced + round-trip tested in Go.
  See decode-pipeline.
- Reed-Solomon parameters. Symbol size, block length, and how the RED group maps onto RS shards.
- Encoder internals. Bitrate control (driven by the BWE above), DTX/VAD, and
  how mlow_red_secondary_complexity selects redundancy strength.
- Companion NN. Architecture and where it sits in the signal chain (post-filter
  vs. excitation enhancement). Documented as out of scope, but its presence
  affects how close a companion-free decode can get to the sound of the full
  decoder.
See also
- Decode pipeline: the verified CELT range coder, the SMPL CELP synthesis path, and the standard-Opus-vs-MLow TOC routing.
- Function map: the class/source/function identity table and the corrected names.
- Methodology: every query used to derive this, reproducibly.
- media-srtp: the SRTP/RTP transport this rides on.
- wasm-analysis and warden: the technique and tool.