GitHub - b-smyers/voice-agent-framework: A flexible, modular Python framework to build voice-activated assistants by mixing and matching any STT (Speech-to-Text), TTS (Text-to-Speech), and LLM (Large Language Model) backends, whether it be open source or proprietary.

Voice Agent Framework

A flexible, modular Python framework to build voice-activated assistants by mixing and matching any STT (Speech-to-Text), TTS (Text-to-Speech), and LLM (Large Language Model) backends, whether it be open source or proprietary.

Features

✔️ Modular backend swapping — Easily swap STT, TTS, and LLM providers without changing core logic.
✔️ Supports local and cloud providers — Mix offline open-source providers with cloud APIs in a single pipeline.
✔️ Wake word activation — Trigger recording hands-free using configurable wake words via Porcupine.
✔️ Start/stop recording tones — Audible tones signal when recording starts and stops, with automatic silence detection.
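
To illustrate what backend swapping can look like, here is a minimal sketch. It assumes each provider type exposes a single method; `SpeechToText`, `LanguageModel`, `TextToSpeech`, `PipelineAgent`, and the stand-in backends are hypothetical names for this example, not the framework's actual classes.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical interfaces: one method per provider type.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, prompt: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class PipelineAgent:
    """Core logic that depends only on the three interfaces above."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)
        answer = self.llm.reply(text)
        return self.tts.synthesize(answer)


# Stand-in backends so the sketch runs without any model or API key.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UppercaseLLM:
    def reply(self, prompt: str) -> str:
        return prompt.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


agent = PipelineAgent(stt=EchoSTT(), llm=UppercaseLLM(), tts=BytesTTS())
print(agent.respond(b"hello"))
```

Because `PipelineAgent` only calls the interface methods, replacing `UppercaseLLM()` with another object that has a `reply` method requires no change to `respond`.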

Providers

A variety of STT, TTS, and LLM providers are supported out of the box, allowing you to experiment with both local and cloud-based models to match different use cases.

Supported provider types:

🗣️ Speech-to-Text (STT): Whisper, Silero
💬 Large Language Model (LLM): Gemini, ChatGPT
🔊 Text-to-Speech (TTS): ElevenLabs, Silero, Piper, Gemini

For detailed information on each provider — including features, usage notes, and recommendations — check the full provider reference.

Usage

Clone the repo

git clone git@github.com:b-smyers/voice-agent-framework.git
cd voice-agent-framework

Create Python 3.10 environment

python3.10 -m venv venv/
source venv/bin/activate

Install dependencies

pip install -r requirements.txt

Configure a custom Agent in main.py by swapping out different providers (optional)
- Check out the Provider Reference!
Set environment variables in .env
- cp .env-sample .env
- The default Agent configuration requires a Picovoice API key and a Gemini API key. Edit .env to add your keys. This guide assumes you have access to both.
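
Before running the application, it can help to confirm that both keys are set. The sketch below uses only the standard library. The variable names `PICOVOICE_API_KEY` and `GEMINI_API_KEY` are assumptions for illustration, so check .env-sample for the names the project actually uses.

```python
import os
from pathlib import Path

# Hypothetical variable names; check .env-sample for the real ones.
REQUIRED_KEYS = ("PICOVOICE_API_KEY", "GEMINI_API_KEY")


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    file = Path(path)
    if not file.exists():
        return values
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    # Values from the process environment take precedence over the file.
    env = {**parse_env_file(".env"), **os.environ}
    missing = missing_keys(env)
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All required keys are set.")
```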
Run the application

python main.py

Presto! After successful setup, you can now use your Assistant:
- Say the wake word: "ok agent".
- Wait for the start tone, which means the microphone is listening.
- Ask your question naturally.
- When you stop speaking, the system will detect silence and play a stop tone.
- After processing your request, the Assistant will reply using text-to-speech.
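
The steps above can be sketched as a single function that runs one wake-to-reply cycle. This is an illustration of the order of events only, not the code in agent.py. `run_turn` and its parameters are hypothetical, and the stubs stand in for the real wake word, recording, and provider calls.

```python
from typing import Callable, List


def run_turn(
    wait_for_wake_word: Callable[[], None],
    play_tone: Callable[[str], None],
    record_until_silence: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one wake-to-reply cycle and return the reply text."""
    wait_for_wake_word()            # blocks until the wake word is heard
    play_tone("start")              # signals that the microphone is listening
    audio = record_until_silence()  # returns once silence is detected
    play_tone("stop")               # signals that recording has ended
    question = transcribe(audio)
    reply = generate_reply(question)
    speak(reply)
    return reply


# Stubs record the order of events so the sketch runs without audio hardware.
events: List[str] = []


def _record() -> bytes:
    events.append("record")
    return b"what time is it"


reply = run_turn(
    wait_for_wake_word=lambda: events.append("wake"),
    play_tone=lambda name: events.append(f"tone:{name}"),
    record_until_silence=_record,
    transcribe=lambda audio: audio.decode("utf-8"),
    generate_reply=lambda question: f"You asked: {question}",
    speak=lambda text: events.append("speak"),
)
print(events)
```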

Note

The recording stays open while you're speaking. If there's background noise or you don't pause clearly, the recording may stay active longer than expected.
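
To see why background noise can keep the recording open, consider a simple energy-based silence detector. This is a sketch of one common approach, not necessarily what record.py does. The threshold and frame count are assumed values chosen for the example.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def record_until_silence(
    frames: Iterable[Sequence[int]],
    threshold: float = 500.0,    # assumed amplitude cutoff for 16-bit audio
    max_silent_frames: int = 3,  # assumed run of quiet frames that ends recording
) -> List[Sequence[int]]:
    """Collect frames until max_silent_frames consecutive frames are quiet."""
    kept = []
    silent_run = 0
    for frame in frames:
        kept.append(frame)
        if rms(frame) < threshold:
            silent_run += 1
            if silent_run >= max_silent_frames:
                break
        else:
            silent_run = 0  # any loud frame resets the countdown
    return kept


speech = [2000] * 4
quiet = [50] * 4
noise = [800] * 4  # background noise that stays above the threshold

clean = record_until_silence([speech, speech] + [quiet] * 10)
noisy = record_until_silence([speech, speech] + [noise] * 5 + [quiet] * 10)
print(len(clean), len(noisy))  # prints "5 10"
```

In the noisy case the five noise frames never count as silence, so the recording runs twice as long before the stop condition is met.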
