libDF: enable wasm_simd on rustfft and realfft #687

Open: czoli1976 wants to merge 1 commit into Rikorose:main from czoli1976:feature/realfft-wasm-simd

Summary

Both rustfft and realfft ship comprehensive WASM SIMD paths gated behind a non-default wasm_simd feature flag. Without it, FFTs fall back to scalar on wasm32 even when the build has +simd128 enabled. For libDF's STFT path (which goes through realfft, with rustfft as a transitive dep with default-features = false), both crates' flags must be enabled for the SIMD code to actually flow through to runtime.

Changes

libDF/Cargo.toml:

  • rustfft: keep ^6.1.0 constraint (the lockfile-resolved 6.2.0 already has the wasm_simd feature), add features = ["wasm_simd"].
  • realfft: bump constraint from ^3.1.0 to ^3.5 (the wasm_simd feature was first added in 3.5.0), add features = ["wasm_simd"].

Cargo.lock: realfft 3.3.0 → 3.5.0, with workspace-wide cascade. The lockfile diff is large (~80 crate version changes) because libDF's resolution hasn't been refreshed in a while; touching realfft triggers re-resolution of dependents. If the noise is unwanted, reverting just the lockfile and leaving downstream cargo update to handle it is safe — the libDF/Cargo.toml change is what matters for behavior.

Impact

On wasm32, the WASM SIMD path produces roughly 2.3 to 2.4× faster FFTs at DFN3-relevant sizes. Isolated benchmark (wasmtime / Cranelift, M-class Mac) on FFT round-trips:

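| FFT size            | scalar   | with wasm_simd |  Δ   |
|---------------------|---------:|---------------:|-----:|
| n=480 (DFN3 STFT)   |  3042 ns |        1280 ns | -58% |
| n=512 (pow2)        |  2601 ns |        1146 ns | -56% |
| n=1024 (pow2)       |  5906 ns |        2482 ns | -58% |
| n=2048              | 12051 ns |        5324 ns | -56% |
| n=4096              | 26149 ns |       11270 ns | -57% |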
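For readers who want to re-derive the Δ column, here is a minimal sketch of the arithmetic, assuming Δ is the relative change in runtime and that speedup is scalar time divided by SIMD time. The helper names `speedup` and `delta_percent` are illustrative and do not come from the benchmark harness.

```rust
/// Speedup factor of the SIMD build over the scalar build (scalar / simd).
fn speedup(scalar_ns: f64, simd_ns: f64) -> f64 {
    scalar_ns / simd_ns
}

/// Relative change in runtime, in percent (negative means faster).
fn delta_percent(scalar_ns: f64, simd_ns: f64) -> f64 {
    (simd_ns - scalar_ns) / scalar_ns * 100.0
}

fn main() {
    // (FFT size, scalar ns, wasm_simd ns), copied from the table above.
    let rows = [
        (480, 3042.0, 1280.0),
        (512, 2601.0, 1146.0),
        (1024, 5906.0, 2482.0),
        (2048, 12051.0, 5324.0),
        (4096, 26149.0, 11270.0),
    ];
    for (n, scalar, simd) in rows {
        println!(
            "n={}: {:.2}x faster, {:.0}% runtime change",
            n,
            speedup(scalar, simd),
            delta_percent(scalar, simd)
        );
    }
}
```

Under these definitions every row lands between about 2.26× and 2.38×, which is where the headline speedup figure comes from.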

End-to-end DFN3 RTF impact is small (within noise on wasmtime, ~1.9% on Node 22 / V8) because FFT is roughly 1% of DFN3's per-hop budget at 48 kHz with hop=480. The gain is much larger for FFT-bound workloads (Whisper, vocoders, music processing, pure DSP code).
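The gap between the isolated and end-to-end numbers can be sanity-checked with Amdahl's law. The sketch below is a back-of-the-envelope estimate, not a measurement: it takes the "roughly 1%" FFT share and the isolated FFT speedup at face value, and `overall_speedup` is an illustrative helper name.

```rust
/// Amdahl's law: overall speedup when a fraction `f` of the runtime
/// is accelerated by a factor `s` and the rest is unchanged.
fn overall_speedup(f: f64, s: f64) -> f64 {
    1.0 / ((1.0 - f) + f / s)
}

fn main() {
    let fft_speedup = 2.4; // isolated FFT speedup quoted in the text

    // DFN3 case: FFT is "roughly 1%" of the per-hop budget (an approximation).
    let dfn3 = overall_speedup(0.01, fft_speedup);
    println!(
        "FFT share 1%:  {:.4}x overall, RTF change {:.2}%",
        dfn3,
        (1.0 / dfn3 - 1.0) * 100.0
    );

    // Hypothetical FFT-bound workload where FFT is 90% of the runtime.
    let fft_bound = overall_speedup(0.90, fft_speedup);
    println!("FFT share 90%: {:.2}x overall", fft_bound);
}
```

With a 1% FFT share the predicted end-to-end gain stays below 1%, the same order of magnitude as the measured RTF deltas. A workload dominated by FFT time would see most of the isolated speedup.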

The flag is a no-op on non-wasm targets — native ARM (NEON) and x86 (AVX/SSE) paths are unchanged.
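As a general illustration of why a wasm-only code path costs nothing on native targets, the sketch below uses Rust's built-in `cfg!` macro to check the target at compile time. It shows the mechanism only and is not taken from the rustfft or realfft sources; the function names are hypothetical.

```rust
/// True only when compiled for wasm32 with `+simd128` enabled.
/// On native targets this evaluates to a compile-time `false`.
fn wasm_simd_compiled_in() -> bool {
    cfg!(all(target_arch = "wasm32", target_feature = "simd128"))
}

/// Hypothetical label for the code path a given build would select.
fn fft_path() -> &'static str {
    if wasm_simd_compiled_in() {
        "wasm simd128"
    } else {
        "scalar or native SIMD"
    }
}

fn main() {
    println!("wasm SIMD compiled in: {}", wasm_simd_compiled_in());
    println!("FFT path: {}", fft_path());
}
```

Because the condition is resolved at compile time, code gated this way is simply absent from ARM and x86 builds.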

Testing

RUSTFLAGS='-C target-feature=+simd128' cargo build --release --target=wasm32-wasip1

succeeds with the new constraint. Standalone DFN3 RTF bench against vendored libDF (30s synthetic test signal, 3 timed passes per build, min-of-3, hop_size=480, sr=48000):

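| Runtime              | Baseline RTF | With wasm_simd |  Δ                    |
|----------------------|-------------:|---------------:|----------------------:|
| wasmtime / Cranelift |       0.0533 |         0.0529 | -0.7% (within noise)  |
| Node 22 / V8         |       0.0520 |         0.0510 | -1.9%                 |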
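To make the RTF figures concrete, here is a minimal sketch assuming the usual definition of real-time factor (processing time divided by audio duration) and the min-of-3 selection described above. The pass timings are hypothetical placeholders chosen to illustrate the arithmetic, not values from the benchmark.

```rust
/// Real-time factor: processing time divided by audio duration.
/// Values below 1.0 mean faster than real time.
fn rtf(processing_s: f64, audio_s: f64) -> f64 {
    processing_s / audio_s
}

/// Smallest value among the timed passes (min-of-N selection).
fn min_of(passes: &[f64]) -> f64 {
    passes.iter().copied().fold(f64::INFINITY, f64::min)
}

/// Real-time budget per hop in milliseconds.
fn hop_budget_ms(hop_size: u32, sample_rate: u32) -> f64 {
    f64::from(hop_size) / f64::from(sample_rate) * 1000.0
}

fn main() {
    let audio_s = 30.0; // 30 s test signal, as in the text
    // Hypothetical wall-clock timings for three passes, in seconds.
    let passes = [1.620, 1.599, 1.605];

    let best = min_of(&passes);
    let factor = rtf(best, audio_s);
    let budget = hop_budget_ms(480, 48_000);
    println!("RTF = {:.4}", factor);
    println!("per-hop budget = {:.1} ms, used = {:.3} ms", budget, budget * factor);
}
```

At hop_size=480 and sr=48000 each hop has a 10 ms real-time budget, so an RTF near 0.05 corresponds to about half a millisecond of processing per hop.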

Companion PR

Tract has the same gap at the workspace level: sonos/tract#2205. The two PRs are independent — both should land for the full SIMD path to flow on wasm32 builds that use libDF.

🤖 Generated with Claude Code

Both `rustfft` and `realfft` ship comprehensive WASM SIMD paths gated
behind a non-default `wasm_simd` feature flag. Without it, FFTs fall
back to scalar on wasm32 even when the build has `+simd128` enabled.
For libDF's STFT path (which goes through realfft), both crates' flags
must be enabled for the SIMD code to actually execute.

## Changes

`libDF/Cargo.toml`:
- `rustfft`: keep `^6.1.0` constraint (lockfile-resolved 6.2.0 already has the
  feature), add `features = ["wasm_simd"]`.
- `realfft`: bump constraint from `^3.1.0` to `^3.5` (the `wasm_simd` feature
  was first added in 3.5.0), add `features = ["wasm_simd"]`.

`Cargo.lock`: realfft 3.3.0 → 3.5.0, with workspace-wide cascade. The
lockfile churn is large because libDF's resolution hasn't been refreshed
in a while; touching realfft triggers re-resolution of dependents. If
the noise is unwanted, reverting the lockfile and leaving downstream
`cargo update` to handle it is safe — the Cargo.toml change is what
matters.

## Impact

On wasm32, the WASM SIMD path produces roughly 2.3 to 2.4× faster FFTs at
DFN3-relevant sizes. Isolated bench (wasmtime/Cranelift, M-class Mac) on
FFT round-trips:

| FFT size            | scalar  | with wasm_simd |  Δ   |
|---------------------|--------:|---------------:|-----:|
| n=480 (DFN3 STFT)   | 3042 ns |        1280 ns | -58% |
| n=1024 (pow2)       | 5906 ns |        2482 ns | -58% |
| n=2048              | 12051 ns|        5324 ns | -56% |

End-to-end DFN3 RTF impact is small (~1-2% on V8, within noise on
Cranelift) because FFT is roughly 1% of DFN3's per-hop budget. The
gain is much larger for FFT-bound workloads (Whisper, vocoders, music
processing, pure DSP code).

The flag is a no-op on non-wasm targets — native ARM (NEON) and x86
(AVX/SSE) paths are unchanged.

## Testing

`cargo build --release --target=wasm32-wasip1 --features=wasm` succeeds
with the new constraint and produces a binary that emits WASM SIMD
ops in the FFT codepath (verified via `wasm-objdump -d`).

## Companion PR

Tract has the same gap. A companion PR at sonos/tract enables
rustfft's `wasm_simd` feature at the workspace level, for any tract
WASM consumer that uses FFT through `tract-onnx-opl` directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>