What
Would it be possible to publish a quantized release artifact (INT8 dynamic, or Q4_0/Q8_0-style) of the DeepFilterNet3 model that drops directly into the existing WASM df_create(modelBytes, atten_lim) path?
Currently the released deepfilter3_model.tar.gz is FP32, ~7.7 MB. On 3G connections the cold load (8.6 MB df_bg.wasm + 7.7 MB model) takes ~95 s before processing starts. A quantized model (~2-4 MB) would cut the model download by half or more and the total cold-load transfer by roughly a quarter to a third, at the cost of what we expect to be a small DNSMOS drop.
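A back-of-envelope check of these numbers, as a minimal sketch. It assumes download time scales linearly with bytes and that the whole ~95 s is transfer time; both are simplifications, and the function and constant names are ours, not part of any DFN3 API.

```typescript
// Asset sizes and the measured cold-load time quoted above.
const WASM_MB = 8.6;              // df_bg.wasm
const FP32_MODEL_MB = 7.7;        // deepfilter3_model.tar.gz
const MEASURED_COLD_LOAD_S = 95;  // observed on 3G

// Effective throughput implied by the measurement, in MB per second.
// Assumption: the 95 s is pure transfer time, with no parse or compile cost.
const throughputMBps = (WASM_MB + FP32_MODEL_MB) / MEASURED_COLD_LOAD_S;

// Estimated cold load for a hypothetical model size, same connection.
function estimateColdLoadSeconds(modelMB: number): number {
  return (WASM_MB + modelMB) / throughputMBps;
}

for (const modelMB of [FP32_MODEL_MB, 4, 2]) {
  const seconds = estimateColdLoadSeconds(modelMB);
  console.log(`${modelMB} MB model: ~${seconds.toFixed(0)} s cold load`);
}
```

Under these assumptions a 2-4 MB model brings the estimate to roughly 62-73 s. The 8.6 MB df_bg.wasm is unchanged, so it bounds how much a smaller model alone can save.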
Why this matters
We're a free, browser-only audio toolset (https://timbrica.com/en/denoise) that adopted DFN3 via the WASM path you ship. Telemetry across ~1700 unique sessions in the 3 days after launch shows that the largest perceived cost is the initial download, not the inference itself, which is already fast. A quantized variant for the existing WASM runtime would preserve everything (single asset, df_create() API, no DSP rewrite) while substantially shortening the wait for the cold-start cohort.
What we considered
- Self-quantize and repackage the existing tar.gz: the format includes runtime constants beyond just weights, and we couldn't confirm that the WASM runtime accepts custom-quantized payloads.
- Migrate to onnxruntime-web with our own quantized ONNX: works in principle, but requires reimplementing the full STFT(960, Vorbis) + ERB filter bank + iSTFT + mask combination logic in JS (the DSP currently lives inside df_bg.wasm). That's ~3-5 days of focused work, plus the risk of a quality regression vs. your reference.
- Use the existing FP32: this is what we ship today; the cold-start cost is the open issue.
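To give a sense of what the onnxruntime-web route would have us reimplement, here is a minimal sketch of just one piece, the analysis/synthesis window. It uses the standard Vorbis power-complementary window formula; the window length of 960 comes from the STFT description above, while the 50% overlap (hop of 480) and all names are our assumptions, not something we verified against df_bg.wasm.

```typescript
// Vorbis power-complementary window:
//   w[n] = sin(pi/2 * sin^2(pi * (n + 0.5) / N))
function vorbisWindow(size: number): Float32Array {
  const w = new Float32Array(size);
  for (let n = 0; n < size; n++) {
    const s = Math.sin((Math.PI * (n + 0.5)) / size);
    w[n] = Math.sin((Math.PI / 2) * s * s);
  }
  return w;
}

const FFT_SIZE = 960;      // from the STFT(960, Vorbis) description
const HOP = FFT_SIZE / 2;  // assumption: 50% overlap
const win = vorbisWindow(FFT_SIZE);

// When the same window is applied at analysis and synthesis with 50% overlap,
// w[n]^2 + w[n + HOP]^2 should equal 1 for every n, so overlap-add
// reconstructs the signal without amplitude modulation.
let maxDeviation = 0;
for (let n = 0; n < HOP; n++) {
  const sum = win[n] * win[n] + win[n + HOP] * win[n + HOP];
  maxDeviation = Math.max(maxDeviation, Math.abs(sum - 1));
}
console.log(`max deviation from 1: ${maxDeviation}`);
```

The ERB filter bank, the iSTFT, and the mask combination would each need the same kind of bit-for-bit check against the reference, which is where most of the 3-5 day estimate goes.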
What would be ideal
A deepfilter3_model_q8.tar.gz (or similar) loadable by the same df_create() entry point. Even a 4-5 MB variant would help substantially.
If quantizing while keeping WASM runtime compatibility is impractical, would you accept a PR that exports a quantized ONNX bundle suitable for onnxruntime-web (matching the inference logic of df_bg.wasm)?
Happy to share telemetry, contribute a PR, or help test. Thanks for DFN3 — it's the best browser-deployable speech denoiser by a margin.
— Farid (Timbrica)