feat(stt): in-process Parakeet STT backend (Apple Silicon / MLX) #462

Open
nickmeinhold wants to merge 1 commit into mbailey:master from nickmeinhold:feat/parakeet-stt-backend

@nickmeinhold

What

Adds an in-process Parakeet STT backend for Apple Silicon. Add parakeet://local to VOICEMODE_STT_BASE_URLS and VoiceMode transcribes with NVIDIA's Parakeet ASR (mlx-community/parakeet-tdt-0.6b-v3) running directly via MLX, with no HTTP hop.

Unlike the whisper.cpp / mlx-audio / OpenAI providers (reached over an OpenAI-compatible endpoint), this resolves in-process. It still slots into the existing STT failover chain as just another ordered endpoint, so failover, metrics, and empty-speech detection all keep working unchanged.

This follows the precedent of the kokoro-onnx TTS backend (#261), one provider added behind the existing abstraction.
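To make the "just another ordered endpoint" point concrete, here is a minimal sketch of a failover loop that treats a parakeet:// entry like any other endpoint. It is not the code in simple_failover.py; the function name and both callables are hypothetical, and the only behaviour taken from this PR is that endpoints are tried in order and a raising backend falls through to the next one.

```python
import asyncio
from typing import Awaitable, Callable, Optional
from urllib.parse import urlsplit


async def transcribe_with_failover(
    audio: bytes,
    endpoints: list[str],
    in_process: Callable[[bytes], Awaitable[str]],      # hypothetical in-process backend
    over_http: Callable[[str, bytes], Awaitable[str]],  # hypothetical HTTP client call
) -> Optional[tuple[str, str]]:
    """Try each endpoint in order; return (endpoint, text) from the first success."""
    for url in endpoints:
        try:
            if urlsplit(url).scheme == "parakeet":
                # In-process path: no HTTP request is made.
                text = await in_process(audio)
            else:
                text = await over_http(url, audio)
        except Exception:
            # Any failure (missing dependency, network error, ...) moves on
            # to the next endpoint in the list.
            continue
        return url, text
    # Every endpoint failed.
    return None
```

Because the in-process call sits inside the same try/except as the HTTP call, a missing MLX dependency looks the same to the chain as an unreachable server.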

Why

On Apple Silicon, Parakeet is dramatically faster than the whisper.cpp server for English at equal-or-better accuracy. Measured on an M-series Mac, same 8.78s clip, model resident:

| Backend | Median latency | Speed (RTFx) | Transcript |
| --- | --- | --- | --- |
| Parakeet (this PR, in-process) | 0.19s | ~46x | perfect |
| whisper.cpp via mlx-audio server (large-v3-turbo) | 1.05s | ~8.4x | near-perfect |

That is about 3.7x faster when run through a server and ~5.5x faster in-process (1.05s / 0.19s). In real-world testing with live mic input, it also correctly transcribed coined proper nouns ("Imagineering") and "Parakeet".
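For anyone checking the figures, the RTFx and in-process speedup numbers follow directly from the clip length and the two median latencies above. A quick sketch of the arithmetic (the variable names are mine; only the three measured values come from the benchmark):

```python
clip_seconds = 8.78
latency = {"parakeet_in_process": 0.19, "whisper_server": 1.05}

# RTFx = audio duration / processing time (higher is faster).
rtfx = {name: clip_seconds / secs for name, secs in latency.items()}

# Speedup of in-process Parakeet over the whisper server on the same clip.
speedup = latency["whisper_server"] / latency["parakeet_in_process"]

print({name: round(value, 1) for name, value in rtfx.items()})
# {'parakeet_in_process': 46.2, 'whisper_server': 8.4}
print(round(speedup, 1))
# 5.5
```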

Design

  • provider_discovery.py: detect parakeet:// and treat it as a local provider.
  • stt_backends/parakeet.py (new): lazy cached model load (weights load once, not per call) + transcription offloaded to a thread via asyncio.to_thread; preserves the audio file's read position; deferred import so non-Mac / missing-dep installs cleanly fall through to the next endpoint.
  • simple_failover.py: dispatch the parakeet provider to the in-process backend, inside the existing try/except so the failover contract holds.
  • config.py: VOICEMODE_PARAKEET_MODEL (default mlx-community/parakeet-tdt-0.6b-v3).
  • pyproject.toml: new parakeet optional extra, arm64-gated so it has no effect on Intel Macs / Linux / Windows.
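The backend pattern described for stt_backends/parakeet.py can be sketched roughly as follows. This is an illustration under assumptions, not the file in this PR: the loader is injected so the sketch runs without MLX installed, and the model is assumed to be any object with a `transcribe(bytes) -> str` method, which is a hypothetical interface rather than the parakeet-mlx API.

```python
import asyncio
import io
import threading
from typing import Any, BinaryIO, Callable

_model: Any = None
_model_lock = threading.Lock()


def get_model(loader: Callable[[], Any]) -> Any:
    """Load the model on first use and reuse the cached instance afterwards."""
    global _model
    with _model_lock:
        if _model is None:
            # A real loader would import the MLX dependency here (deferred
            # import), so a missing package raises at call time, not at startup.
            _model = loader()
        return _model


def _transcribe_sync(audio_file: BinaryIO, loader: Callable[[], Any]) -> str:
    start = audio_file.tell()
    try:
        data = audio_file.read()
        return get_model(loader).transcribe(data)  # hypothetical model interface
    finally:
        # Restore the read position so a later endpoint can re-read the file.
        audio_file.seek(start)


async def transcribe(audio_file: BinaryIO, loader: Callable[[], Any]) -> str:
    """Run the blocking transcription in a worker thread."""
    return await asyncio.to_thread(_transcribe_sync, audio_file, loader)
```

Restoring the position in a `finally` block matters for failover: if the in-process backend raises, the next endpoint in the chain still sees the audio file where it was before the attempt.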

Config

Recommended (keeps whisper/OpenAI as fallback for the language tail; Parakeet v3 covers English plus 25 European languages, not e.g. Thai):

VOICEMODE_STT_BASE_URLS=parakeet://local,http://127.0.0.1:2022/v1,https://api.openai.com/v1
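As a rough illustration of how a value like this can be split into an ordered endpoint list with the parakeet:// entry recognised by scheme, here is a small sketch. `parse_stt_endpoints` and the "parakeet" / "http" labels are hypothetical and not taken from provider_discovery.py.

```python
from urllib.parse import urlsplit


def parse_stt_endpoints(raw: str) -> list[tuple[str, str]]:
    """Split a comma-separated URL list into ordered (kind, url) pairs."""
    endpoints = []
    for item in raw.split(","):
        url = item.strip()
        if not url:
            continue  # tolerate stray commas and whitespace
        kind = "parakeet" if urlsplit(url).scheme == "parakeet" else "http"
        endpoints.append((kind, url))
    return endpoints
```

Order is preserved, so the first entry is tried first and the later entries act as fallbacks.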

Apple Silicon only, opt-in. Install with the extra:

uv tool install "voice-mode[parakeet]"

Tests

  • New tests/test_parakeet_backend.py (5 tests): provider detection, transcription (text + position preservation + load-once caching), failover dispatch, and graceful fall-through when the backend raises (e.g. dep missing). All pass.
  • No regressions in the failover / provider suites (the one pre-existing failure in test_provider_selection needs OPENAI_API_KEY and fails identically on master).

Heads-up (separate from this PR): mlx-audio's STT server is currently broken for Parakeet

While testing, I tried routing Parakeet through mlx_audio.server's /v1/audio/transcriptions (the "unified server" path) and hit two bugs in the current release, which is why this PR runs Parakeet in-process instead:

  1. Parakeet model id is truncated. get_model_category splits the model name on ".", so parakeet-tdt-0.6b-v3 becomes module parakeet_tdt_0 (cut off at the ".6"). The server then looks for that module under tts.models and raises ModuleNotFoundError: No module named 'mlx_audio.tts.models.parakeet_tdt_0'.
  2. Whisper via the server needs the exact -asr-fp16 variant. mlx-community/whisper-large-v3-turbo loads but throws "Processor not found" at decode; mlx-community/whisper-large-v3-turbo-asr-fp16 works.
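The truncation in bug 1 is easy to reproduce in isolation. The function below is not mlx-audio's code; it is a hypothetical stand-in that applies the same kind of "split on the first dot" logic to show why the version number in the model id breaks the module lookup.

```python
def naive_module_name(model_id: str) -> str:
    """Derive a module name the way the bug report describes (illustrative only)."""
    name = model_id.split("/")[-1]   # drop the "mlx-community/" prefix
    stem = name.split(".")[0]        # splitting on "." cuts "0.6b" down to "0"
    return stem.replace("-", "_")


print(naive_module_name("mlx-community/parakeet-tdt-0.6b-v3"))
# parakeet_tdt_0
```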

Flagging in case it affects the mlx-audio install path (VM-1330). Happy to file upstream on mlx-audio if useful.


Built with Claude Code. Glad to adjust naming, the parakeet:// scheme, or the config surface to match how you'd want it to land.

Add a `parakeet://` STT provider that transcribes in-process via
`parakeet-mlx` (NVIDIA Parakeet on Apple's MLX), with no HTTP hop. Unlike
the whisper.cpp / mlx-audio / OpenAI providers, which are reached over an
OpenAI-compatible endpoint, this runs inference directly in the process.

Select it by adding `parakeet://local` to VOICEMODE_STT_BASE_URLS. It slots
into the existing STT failover chain as just another ordered endpoint, so the
recommended config keeps whisper/OpenAI as fallback for the language tail
(Parakeet v3 covers English + 25 European languages, not e.g. Thai):

  VOICEMODE_STT_BASE_URLS=parakeet://local,http://127.0.0.1:2022/v1,https://api.openai.com/v1

On an M-series Mac this measured ~5.5x faster than the whisper.cpp server
(~0.19s vs ~1.05s on a ~9s clip, in-process) at equal-or-better English
accuracy.

- provider_discovery: detect `parakeet://` and treat it as a local provider
- stt_backends/parakeet.py: lazy cached model load + threaded transcribe,
  preserving the audio file's read position; deferred import so non-Mac /
  missing-dep installs cleanly fall through to the next endpoint
- simple_failover: dispatch `parakeet` provider to the in-process backend,
  inside the existing try/except so failover + empty-detection + metrics hold
- config: VOICEMODE_PARAKEET_MODEL (default mlx-community/parakeet-tdt-0.6b-v3)
- pyproject: `parakeet` optional extra, arm64-gated (no effect off Apple Silicon)
- tests: detection, transcribe (text + position + load-once cache), failover
  dispatch, and graceful fall-through on backend failure

Apple Silicon only, opt-in. Install with: uv tool install "voice-mode[parakeet]"

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>