feat(stt): in-process Parakeet STT backend (Apple Silicon / MLX) by nickmeinhold · Pull Request #462 · mbailey/voicemode

nickmeinhold · 2026-06-06T08:48:36Z

What

Adds an in-process Parakeet STT backend for Apple Silicon. Add parakeet://local to VOICEMODE_STT_BASE_URLS and VoiceMode transcribes with NVIDIA's Parakeet ASR (mlx-community/parakeet-tdt-0.6b-v3) running directly via MLX, with no HTTP hop.

Unlike the whisper.cpp / mlx-audio / OpenAI providers (reached over an OpenAI-compatible endpoint), this resolves in-process. It still slots into the existing STT failover chain as just another ordered endpoint, so failover, metrics, and empty-speech detection all keep working unchanged.

This follows the precedent of the kokoro-onnx TTS backend (#261), one provider added behind the existing abstraction.

Why

On Apple Silicon, Parakeet is dramatically faster than the whisper.cpp server for English at equal-or-better accuracy. Measured on an M-series Mac, same 8.78s clip, model resident:

| Backend | Median latency | Speed (RTFx) | Transcript |
| --- | --- | --- | --- |
| Parakeet (this PR, in-process) | 0.19s | ~46x | perfect |
| whisper.cpp via mlx-audio server (large-v3-turbo) | 1.05s | ~8.4x | near-perfect |

About 3.7x faster when run through a server and ~5.5x faster in-process (1.05s vs 0.19s median). In real-world testing from live mic input it also correctly transcribed coined proper nouns ("Imagineering") and "Parakeet".
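
For readers checking the numbers, here is a small sketch of the arithmetic, assuming RTFx means clip duration divided by median latency (which matches the table). The dictionary keys are illustrative labels, not identifiers from the PR.

```python
# Clip length and median latencies are taken from the table above.
CLIP_SECONDS = 8.78
MEDIAN_LATENCY = {
    "parakeet_in_process": 0.19,
    "whisper_via_server": 1.05,
}


def rtfx(clip_seconds: float, latency_seconds: float) -> float:
    """Real-time factor: seconds of audio transcribed per second of compute."""
    return clip_seconds / latency_seconds


for name, latency in MEDIAN_LATENCY.items():
    print(f"{name}: {rtfx(CLIP_SECONDS, latency):.1f}x real time")

# Relative speedup of the in-process backend over the server path.
speedup = MEDIAN_LATENCY["whisper_via_server"] / MEDIAN_LATENCY["parakeet_in_process"]
print(f"speedup: {speedup:.1f}x")
# Prints:
# parakeet_in_process: 46.2x real time
# whisper_via_server: 8.4x real time
# speedup: 5.5x
```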

Design

provider_discovery.py: detect parakeet:// and treat it as a local provider.
stt_backends/parakeet.py (new): lazy cached model load (weights load once, not per call) + transcription offloaded to a thread via asyncio.to_thread; preserves the audio file's read position; deferred import so non-Mac / missing-dep installs cleanly fall through to the next endpoint.
simple_failover.py: dispatch the parakeet provider to the in-process backend, inside the existing try/except so the failover contract holds.
config.py: VOICEMODE_PARAKEET_MODEL (default mlx-community/parakeet-tdt-0.6b-v3).
pyproject.toml: new parakeet optional extra, arm64-gated so it has no effect on Intel Macs / Linux / Windows.
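
To make the backend's shape concrete, here is a minimal sketch of the pattern described for stt_backends/parakeet.py: a cached model load, a deferred import, transcription on a worker thread, and a restored read position. It is not the PR's code. The function names, the injectable `loader` parameter, the temp-file handoff, and the `.wav` suffix are assumptions made for illustration; the only library calls assumed from `parakeet-mlx` are `from_pretrained(...)` and `model.transcribe(path).text`.

```python
import asyncio
import tempfile
import threading
from typing import Any, BinaryIO, Callable

DEFAULT_MODEL = "mlx-community/parakeet-tdt-0.6b-v3"

_models: dict[str, Any] = {}
_lock = threading.Lock()


def _default_loader(model_id: str) -> Any:
    # Deferred import: on non-Mac or missing-dep installs this raises
    # ImportError when called, not when this module is imported, so the
    # caller's try/except can fall through to the next endpoint.
    from parakeet_mlx import from_pretrained

    return from_pretrained(model_id)


def get_model(model_id: str = DEFAULT_MODEL,
              loader: Callable[[str], Any] = _default_loader) -> Any:
    """Load the model once per id and reuse it on later calls."""
    with _lock:
        if model_id not in _models:
            _models[model_id] = loader(model_id)
        return _models[model_id]


def _transcribe_sync(audio_file: BinaryIO, model: Any) -> str:
    start = audio_file.tell()
    try:
        audio_file.seek(0)
        data = audio_file.read()
    finally:
        # Restore the read position so a later endpoint can reuse the file.
        audio_file.seek(start)
    # Assumption: the model takes a file path, so hand it a temp file.
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(data)
        tmp.flush()
        return model.transcribe(tmp.name).text


async def transcribe(audio_file: BinaryIO,
                     model_id: str = DEFAULT_MODEL,
                     loader: Callable[[str], Any] = _default_loader) -> str:
    # Both the (possibly slow) first load and the inference run off the
    # event loop so other coroutines keep making progress.
    model = await asyncio.to_thread(get_model, model_id, loader)
    return await asyncio.to_thread(_transcribe_sync, audio_file, model)
```

The `loader` argument exists only so the sketch can be exercised with a fake model on machines without MLX; a real backend could just as well hard-code the import.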

Config

Recommended (keeps whisper/OpenAI as fallback for the language tail; Parakeet v3 covers English plus 25 European languages, not e.g. Thai):

VOICEMODE_STT_BASE_URLS=parakeet://local,http://127.0.0.1:2022/v1,https://api.openai.com/v1
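
As a rough illustration of how an ordered list like this can be walked, the sketch below parses the comma-separated value, classifies each entry by URL scheme, and falls through to the next endpoint when one raises. It is a simplified model of the behaviour described in this PR, not the code in simple_failover.py or provider_discovery.py; `parse_base_urls`, `detect_provider`, `transcribe_with_failover`, and the two callables passed in are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable
from urllib.parse import urlparse


def parse_base_urls(raw: str) -> list[str]:
    """Split the comma-separated setting, preserving order."""
    return [u.strip() for u in raw.split(",") if u.strip()]


def detect_provider(url: str) -> str:
    """Classify an endpoint by scheme (hypothetical two-way split)."""
    return "parakeet" if urlparse(url).scheme == "parakeet" else "openai-compatible"


async def transcribe_with_failover(
    audio: bytes,
    base_urls: list[str],
    in_process: Callable[[bytes], Awaitable[str]],
    over_http: Callable[[str, bytes], Awaitable[str]],
) -> tuple[str, str]:
    """Try each endpoint in order; return (text, url) from the first success."""
    errors = []
    for url in base_urls:
        try:
            if detect_provider(url) == "parakeet":
                text = await in_process(audio)
            else:
                text = await over_http(url, audio)
            return text, url
        except Exception as exc:  # e.g. ImportError when the extra is missing
            errors.append((url, exc))
    raise RuntimeError(f"all STT endpoints failed: {errors}")
```

With the recommended value, a machine without the parakeet extra would fail on the first entry and continue to `http://127.0.0.1:2022/v1`, which is the fall-through behaviour the PR's tests are described as covering.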

Apple Silicon only, opt-in. Install with the extra:

uv tool install voice-mode[parakeet]

Tests

New tests/test_parakeet_backend.py (5 tests): provider detection, transcription (text + position preservation + load-once caching), failover dispatch, and graceful fall-through when the backend raises (e.g. dep missing). All pass.
No regressions in the failover / provider suites (the one pre-existing failure in test_provider_selection needs OPENAI_API_KEY and fails identically on master).

Heads-up (separate from this PR): mlx-audio's STT server is currently broken for Parakeet

While testing, I tried routing Parakeet through mlx_audio.server's /v1/audio/transcriptions (the "unified server" path) and hit two bugs in the current release, which is why this PR runs Parakeet in-process instead:

Parakeet model id is truncated. get_model_category splits the model name on ., so parakeet-tdt-0.6b-v3 becomes module parakeet_tdt_0 (truncated at the .6) and it looks under tts.models -> ModuleNotFoundError: No module named 'mlx_audio.tts.models.parakeet_tdt_0'.
Whisper via the server needs the exact -asr-fp16 variant. mlx-community/whisper-large-v3-turbo loads but throws "Processor not found" at decode; mlx-community/whisper-large-v3-turbo-asr-fp16 works.
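
The first bug is easy to reproduce in isolation. The function below is a hypothetical reconstruction of the failure mode, not mlx-audio's actual `get_model_category` code: it only shows how splitting a model name on `.` yields the module name seen in the traceback.

```python
def module_name_split_on_dot(model_id: str) -> str:
    """Hypothetical reconstruction: derive a module name by splitting on '.'."""
    name = model_id.split("/")[-1]        # drop the "mlx-community/" prefix
    return name.split(".")[0].replace("-", "_")


# The version number "0.6b" contains a dot, so everything after it is lost.
print(module_name_split_on_dot("mlx-community/parakeet-tdt-0.6b-v3"))
# Prints: parakeet_tdt_0

# A model id without a dot passes through unchanged apart from the dashes.
print(module_name_split_on_dot("mlx-community/whisper-large-v3-turbo"))
# Prints: whisper_large_v3_turbo
```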

Flagging in case it affects the mlx-audio install path (VM-1330). Happy to file upstream on mlx-audio if useful.

Built with Claude Code. Glad to adjust naming, the parakeet:// scheme, or the config surface to match how you'd want it to land.

Add a `parakeet://` STT provider that transcribes in-process via `parakeet-mlx` (NVIDIA Parakeet on Apple's MLX), with no HTTP hop. Unlike the whisper.cpp / mlx-audio / OpenAI providers, which are reached over an OpenAI-compatible endpoint, this runs inference directly in the process.

Select it by adding `parakeet://local` to VOICEMODE_STT_BASE_URLS. It slots into the existing STT failover chain as just another ordered endpoint, so the recommended config keeps whisper/OpenAI as fallback for the language tail (Parakeet v3 covers English + 25 European languages, not e.g. Thai):

VOICEMODE_STT_BASE_URLS=parakeet://local,http://127.0.0.1:2022/v1,https://api.openai.com/v1

On an M-series Mac this measured ~5.5x faster than the whisper.cpp server (~0.19s vs ~1.05s on a ~9s clip, in-process) at equal-or-better English accuracy.

- provider_discovery: detect `parakeet://` and treat it as a local provider
- stt_backends/parakeet.py: lazy cached model load + threaded transcribe, preserving the audio file's read position; deferred import so non-Mac / missing-dep installs cleanly fall through to the next endpoint
- simple_failover: dispatch `parakeet` provider to the in-process backend, inside the existing try/except so failover + empty-detection + metrics hold
- config: VOICEMODE_PARAKEET_MODEL (default mlx-community/parakeet-tdt-0.6b-v3)
- pyproject: `parakeet` optional extra, arm64-gated (no effect off Apple Silicon)
- tests: detection, transcribe (text + position + load-once cache), failover dispatch, and graceful fall-through on backend failure

Apple Silicon only, opt-in. Install with: uv tool install voice-mode[parakeet]

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

nickmeinhold wants to merge 1 commit into mbailey:master from nickmeinhold:feat/parakeet-stt-backend
