Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -8,6 +8,7 @@ dist/
*.bin
out/
*.wav
!docs/demo.wav
.DS_Store
.env
.coverage
19 changes: 19 additions & 0 deletions README.md
@@ -7,6 +7,8 @@

Offline TTS using [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) via [kokoro-onnx](https://github.com/thewh1teagle/kokoro-onnx). Apache 2.0 model, ~340MB, CPU real-time, plays straight to system audio. Designed to be importable as a Python library, drivable as a CLI, or poked via a unix socket for ~13ms speech requests from shell scripts.

🔊 **Hear it:** [docs/demo.wav](./docs/demo.wav) — five seconds of two voices speaking the tagline (`af_heart` then `bf_emma`).

## Install

From PyPI — recommended for most users:
@@ -178,6 +180,23 @@ Model weights (`kokoro-v1.0.onnx`, ~340 MB) and voices are downloaded from the [

Security issues themselves should not be filed as public GitHub issues — see [`SECURITY.md`](./SECURITY.md) for the disclosure process.

## How does this compare to other TTS?

stackvox covers a narrow, opinionated slice of the TTS space. Here is how it compares with its closest neighbours:

| Tool | Offline? | Quality | Latency (typical) | License | Best for |
|---|---|---|---|---|---|
| **stackvox** (Kokoro-82M) | ✅ | High (24 kHz, 50+ voices, 9 languages) | ~300ms in-process · ~13ms via daemon helper | Apache 2.0 | Local apps, shell hooks, anything that wants natural voice without the cloud |
| macOS `say` | ✅ | OK | ~50ms | Proprietary (ships with macOS) | macOS-only scripts, "good enough" voice |
| `espeak-ng` | ✅ | Robotic | ~10ms | GPL-3.0 | Accessibility, screen readers, embedded |
| [Piper](https://github.com/rhasspy/piper) | ✅ | High | ~100ms | MIT | Similar use-case to stackvox; ONNX-based, more voices in some languages |
| [Coqui TTS](https://github.com/coqui-ai/TTS) | ✅ | Very high (research models) | seconds | MPL-2.0 | Research, fine-tuning, voice cloning |
| OpenAI / ElevenLabs / etc. | ❌ | Highest | network-bound | Proprietary | Production apps that can pay per-call and accept network dependency |

Where stackvox differs from Piper specifically: a **resident daemon + bash helper** path that delivers sub-15ms speech requests from shell scripts (CI hooks, terminal notifications, status announcements) without paying Python's startup cost on every call. That is the main reason to choose it: voice quality alone would not justify switching from Piper, but the IPC path matters for shell-driven workflows.

Pick stackvox if you want **good voices, fully offline, with a fast shell-friendly API**.
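
The resident-daemon idea described above can be illustrated with a minimal sketch. This is not stackvox's actual protocol or API: the socket path, the newline-delimited request format, the `ok <n>` reply, and the names `send_request`, `_AckHandler`, and `demo` are all assumptions made up for illustration. The point is only that a client connecting to an already-running process over a Unix socket pays for one `connect` and one small write, not for loading a model.

```python
import os
import socket
import socketserver
import tempfile
import threading


def send_request(sock_path: str, text: str, timeout: float = 2.0) -> str:
    """Send one newline-terminated request and return the one-line reply.

    The wire format here is hypothetical, not stackvox's real protocol.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        client.connect(sock_path)
        client.sendall(text.encode("utf-8") + b"\n")
        with client.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()


class _AckHandler(socketserver.StreamRequestHandler):
    """Stand-in for a resident daemon: acknowledges instead of synthesising."""

    def handle(self) -> None:
        line = self.rfile.readline().decode("utf-8").strip()
        # A real daemon would queue `line` for synthesis and playback here.
        self.wfile.write(f"ok {len(line)}\n".encode("utf-8"))


def demo() -> str:
    """Start the toy daemon on a temporary socket and send it one request."""
    sock_path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    with socketserver.UnixStreamServer(sock_path, _AckHandler) as server:
        worker = threading.Thread(target=server.serve_forever, daemon=True)
        worker.start()
        try:
            return send_request(sock_path, "build finished")
        finally:
            server.shutdown()


if __name__ == "__main__":
    print(demo())
```

In a real setup the expensive work (importing the runtime, loading the model weights) would happen once when the daemon starts, which is why per-request latency can be far lower than launching a fresh Python process for each call.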

## License & attributions

stackvox itself is licensed under the **Apache License, Version 2.0** — see [`LICENSE`](./LICENSE). Third-party attributions are collected in [`NOTICE`](./NOTICE); the summary below is informational.
Binary file added docs/demo.wav
Binary file not shown.