Add streaming Silero VAD runner for real-time speech detection
Add a new `silero_vad_stream_runner` CLI that reads 16 kHz mono
float32 PCM from stdin and outputs per-frame speech probabilities
via a simple line protocol (`PROB <time> <probability>`). This
enables real-time VAD as a subprocess for apps like the Voxtral
Realtime macOS dictation app.
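For a consumer of this protocol, a minimal parsing sketch might look like the following. It is not taken from `stream_main.cpp`: the `ProbLine` struct and `parse_prob_line` helper are hypothetical names, and the sketch assumes the fields are whitespace-separated decimal numbers, that `<time>` is in seconds, and that the probability lies in [0, 1].

```cpp
#include <optional>
#include <sstream>
#include <string>

// One parsed `PROB <time> <probability>` line (hypothetical helper type).
struct ProbLine {
    double time;         // frame timestamp; seconds is an assumption
    double probability;  // speech probability, assumed to lie in [0, 1]
};

// Returns the parsed fields, or std::nullopt if the line is not a
// well-formed PROB line under the assumptions stated above.
std::optional<ProbLine> parse_prob_line(const std::string& line) {
    std::istringstream in(line);
    std::string tag;
    ProbLine out{};
    if (!(in >> tag) || tag != "PROB") return std::nullopt;
    if (!(in >> out.time >> out.probability)) return std::nullopt;
    std::string extra;
    if (in >> extra) return std::nullopt;  // reject trailing tokens
    if (out.probability < 0.0 || out.probability > 1.0) return std::nullopt;
    return out;
}
```

A host app could call this on each line read from the subprocess and compare `probability` against its own threshold; the threshold itself is left to the caller.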
Changes:
- Add `reset_stream()` and `process_frame()` to `SileroVadRunner`
  for stateful frame-by-frame inference with persistent LSTM state
- Add `stream_main.cpp` as the streaming CLI entry point
- Update CMakeLists.txt to build both `silero_vad_runner` (offline)
  and `silero_vad_stream_runner` (streaming) targets
- Remove unnecessary `extension_llm_runner` dependency that caused
  build conflicts with sentencepiece headers
- Update Makefile `silero-vad-cpu` target to build both runners
  with `-DEXECUTORCH_BUILD_EXTENSION_LLM_RUNNER=OFF`
- Update README with streaming usage and architecture docs
Authored with assistance from Claude.
Made-with: Cursor
The model processes audio in 512-sample chunks (32 ms at 16 kHz). Each chunk is preceded by 64 samples of context from the previous chunk, forming a 576-sample input (64 + 512). The model carries an LSTM hidden state across chunks and outputs a single speech probability per chunk.
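The chunk-plus-context layout can be sketched as below. This is an illustration rather than the runner's actual code: `ContextFramer` is a hypothetical name, the context is assumed to be the last 64 samples of the previous chunk, and the context is assumed to start as zeros after a reset. LSTM state handling and the model call itself are left out.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t kChunk = 512;                // 512 / 16000 Hz = 32 ms
constexpr std::size_t kContext = 64;               // samples carried over
constexpr std::size_t kInput = kContext + kChunk;  // 576-sample model input

// Builds 576-sample inputs from consecutive 512-sample chunks
// (hypothetical helper; not part of SileroVadRunner).
class ContextFramer {
public:
    // Clear the carried context (assumed to be zeros at stream start).
    void reset() { context_.fill(0.0f); }

    // Prepend the stored context to `chunk`, then remember the tail of
    // `chunk` as the context for the next call.
    std::array<float, kInput> make_input(const std::array<float, kChunk>& chunk) {
        std::array<float, kInput> input{};
        std::copy(context_.begin(), context_.end(), input.begin());
        std::copy(chunk.begin(), chunk.end(), input.begin() + kContext);
        std::copy(chunk.end() - kContext, chunk.end(), context_.begin());
        return input;
    }

private:
    std::array<float, kContext> context_{};  // zero-initialized
};
```

Keeping the context in a small fixed-size buffer means each call copies only 576 floats and needs no allocation, which suits a real-time loop.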