A voice-powered AI assistant that answers questions from a knowledge base. Speak your question, get a spoken answer.
  🎤 You speak
       │
       ▼
┌─────────────┐
│   Whisper   │ ← Speech-to-Text (local)
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  ChromaDB   │ ← Vector search for relevant docs
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Claude    │ ← Generate conversational answer
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ ElevenLabs  │ ← Text-to-Speech
└──────┬──────┘
       │
       ▼
  🔊 Assistant speaks
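
The flow in the diagram can be read as four functions chained together. The sketch below is only an illustration of that shape, not the repository's actual code: `answer_spoken_question` and the four stand-in stages are hypothetical names, and the stand-ins return canned, made-up values so the example runs without any models or API keys.

```python
from typing import Callable, List


def answer_spoken_question(
    audio_path: str,
    transcribe: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one question through the four stages, in diagram order."""
    question = transcribe(audio_path)       # Whisper: audio -> text
    passages = retrieve(question)           # ChromaDB: text -> relevant docs
    answer = generate(question, passages)   # Claude: question + docs -> answer
    return synthesize(answer)               # ElevenLabs: answer -> audio


# Hypothetical stand-in stages with made-up sample data.
def fake_transcribe(audio_path: str) -> str:
    return "What time is check-in?"


def fake_retrieve(question: str) -> List[str]:
    return ["Sample passage about check-in times."]


def fake_generate(question: str, passages: List[str]) -> str:
    return f"Based on {len(passages)} passage(s): {passages[0]}"


def fake_synthesize(answer: str) -> bytes:
    # A real TTS stage would return audio; here the text is just encoded.
    return answer.encode("utf-8")


audio = answer_spoken_question(
    "question.wav", fake_transcribe, fake_retrieve, fake_generate, fake_synthesize
)
print(audio.decode("utf-8"))
```

Passing the stages in as callables keeps each one swappable, which matches the idea that any single component could be replaced without touching the others.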
The assistant comes pre-loaded with sample hotel information:
- Check-in/check-out times
- Pool location and hours
- Room service
- WiFi information
- Fitness center
Ask questions like:
- "What time is check-in?"
- "Where is the swimming pool?"
- "How do I order room service?"
| Component | Technology | Notes |
|---|---|---|
| Speech-to-Text | Whisper (local) | Runs on CPU, no API needed |
| Vector Database | ChromaDB | Local, persistent storage |
| Embeddings | sentence-transformers | all-MiniLM-L6-v2 |
| LLM | Claude | Conversational answers |
| Text-to-Speech | ElevenLabs | Natural-sounding voices |
| UI | Gradio | Web interface with audio |
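
To show what the vector-search step does, here is a toy sketch that ranks documents by cosine similarity. It is a simplification under stated assumptions: word counts stand in for the `all-MiniLM-L6-v2` embeddings, a plain list stands in for ChromaDB, and `embed`, `cosine`, and `search` are hypothetical helper names. The sample documents are made up.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Made-up sample documents, for illustration only.
docs = [
    "The swimming pool is on the rooftop.",
    "Room service can be ordered by phone.",
    "WiFi is available in all rooms.",
]
top = search("Where is the swimming pool?", docs)
print(top[0])
```

A real embedding model also matches paraphrases that share no words with the query, which word counts cannot do; the ranking logic is the same idea.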
# Clone
git clone https://github.com/woodstocksoftware/voice-rag-assistant.git
cd voice-rag-assistant
# Create environment
python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Install ffmpeg (required for Whisper)
brew install ffmpeg # macOS
# sudo apt install ffmpeg # Ubuntu
# Set API keys
export ANTHROPIC_API_KEY="your-key"
export ELEVENLABS_API_KEY="your-key"
# Run
python app.py

Open http://localhost:7860 and start talking!
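
If the app fails at startup, the usual suspects are a missing API key or a missing `ffmpeg`. The following is a small pre-flight check, sketched as a standalone script; `missing_prerequisites` is a hypothetical helper and is not part of this repository.

```python
import os
import shutil
from typing import List, Mapping, Optional

# The two environment variables named in the setup steps.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def missing_prerequisites(env: Mapping[str, str], ffmpeg_path: Optional[str]) -> List[str]:
    """Return a human-readable list of prerequisites that are not satisfied."""
    problems = [
        f"environment variable {key} is not set"
        for key in REQUIRED_KEYS
        if not env.get(key)
    ]
    if ffmpeg_path is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    # shutil.which returns None when the executable cannot be found.
    issues = missing_prerequisites(os.environ, shutil.which("ffmpeg"))
    for issue in issues:
        print("Missing:", issue)
    if not issues:
        print("All prerequisites found.")
```

The check only confirms that the keys are set and that `ffmpeg` is discoverable; it does not verify that the keys are valid.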
voice-rag-assistant/
├── app.py # Gradio UI
├── src/
│ ├── voice/
│ │ └── speech_to_text.py # Whisper transcription
│ ├── rag/
│ │ └── knowledge_base.py # ChromaDB + Claude
│ └── tts/
│ └── text_to_speech.py # ElevenLabs TTS
├── data/
│ └── chroma/ # Vector database storage
└── requirements.txt
Use the "Add Documents" tab in the UI, or programmatically:
from src.rag.knowledge_base import KnowledgeBase
kb = KnowledgeBase()
kb.add_documents([
"Your custom information here.",
"More information about your domain.",
])

ElevenLabs offers many voices. Change the voice in the Settings tab, or in code:
from src.tts.text_to_speech import TextToSpeech
tts = TextToSpeech(voice="Drew") # Male voice
tts = TextToSpeech(voice="Sarah") # Different female voice

- Hotel concierge answering guest questions
- Restaurant information and reservations
- Voice tutoring assistant
- Interactive learning companion
- Accessibility for visually impaired students
- Internal knowledge base with voice interface
- Hands-free documentation lookup
| Service | Cost |
|---|---|
| Whisper | Free (runs locally) |
| ChromaDB | Free (runs locally) |
| Claude | ~$0.01-0.03 per query |
| ElevenLabs | Free tier: 10,000 chars/month |
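
As a rough back-of-the-envelope check on the table above, the sketch below estimates how many spoken answers fit in the ElevenLabs free tier and what the same number of Claude queries would cost. The 200-character average answer length is an assumption chosen for illustration, and the sketch assumes the character quota is consumed by the text sent for synthesis; the helper names are hypothetical.

```python
from typing import Tuple

ELEVENLABS_FREE_CHARS = 10_000     # free tier, characters per month (from the table)
CLAUDE_COST_RANGE = (0.01, 0.03)   # approximate USD per query (from the table)


def free_tier_answers(avg_answer_chars: int) -> int:
    """How many answers of the given length fit in the monthly free tier."""
    return ELEVENLABS_FREE_CHARS // avg_answer_chars


def claude_cost_range(queries: int) -> Tuple[float, float]:
    """Approximate low and high Claude cost in USD for a number of queries."""
    low, high = CLAUDE_COST_RANGE
    return round(queries * low, 2), round(queries * high, 2)


answers = free_tier_answers(200)   # 200 chars per answer is an assumed average
low, high = claude_cost_range(answers)
print(f"{answers} answers per month, Claude cost about ${low:.2f} to ${high:.2f}")
```

Shorter answers stretch the free tier further, so keeping spoken responses brief helps on both cost and latency.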
MIT
Built by Jim Williams | GitHub