A real-time voice chat application that combines speech-to-text transcription, large language model conversations, and text-to-speech synthesis. Built with a Rust backend and a WebAssembly frontend for high performance and browser compatibility.
## Features

- **Privacy-First** - Local transcription and text generation, no cloud dependencies
- **Voice Recording** - Browser-based audio recording with WebRTC
- **Speech-to-Text** - Local Whisper model transcription using Candle ML
- **AI Conversations** - Integration with Ollama for various LLM models
- **Text-to-Speech** - eSpeak synthesis with audio playback
- **Web Interface** - Modern responsive UI with real-time interactions
- **High Performance** - Rust backend with WebAssembly frontend
## Prerequisites

- Rust (latest stable) - install from [rustup.rs](https://rustup.rs)
- FFmpeg - required for audio format conversion
- eSpeak/eSpeak-NG - for text-to-speech synthesis
- Ollama - for LLM inference
- wasm-pack - for building the WebAssembly frontend
## Installation

### Ubuntu/Debian

```bash
sudo apt update
sudo apt install ffmpeg espeak-ng ollama
cargo install wasm-pack
```

### Arch Linux

```bash
sudo pacman -S ffmpeg espeak ollama
cargo install wasm-pack
```

### macOS

```bash
brew install ffmpeg espeak ollama
cargo install wasm-pack
```

### Set Up Ollama

```bash
# Start Ollama service
ollama serve

# Pull default model (in another terminal)
ollama pull gemma3:12b
```

### Clone and Build

```bash
git clone https://github.com/danielclough/voice-chat-whisper-ollama-espeak-wasm
cd chatter
```
```bash
# Build frontend WebAssembly module
./build_frontend.sh

# Build and run backend
cargo check && cargo build
cargo run
```

Navigate to http://127.0.0.1:3000 in your browser.
## Usage

### Voice Chat

1. **Enable Microphone** - Toggle the mic switch to enable voice recording
2. **Record Audio** - Hold the red microphone button to record your voice
3. **Release to Send** - The audio is automatically transcribed and sent to the AI
4. **Listen to the Response** - The AI response is synthesized to speech and played back

### Text Chat

1. **Type a Message** - Enter text in the input field
2. **Select a Model** - Choose from the available Ollama models in the dropdown
3. **Send** - Click Send or press Enter to get the AI response
4. **Audio Playback** - The response is automatically converted to speech
## Architecture

### Backend (Rust)

- **Web Server**: Axum with CORS support and static file serving
- **Audio Processing**: Hound for WAV handling, FFmpeg integration
- **ML Inference**: Candle for local Whisper transcription
- **LLM Integration**: HTTP client for Ollama API communication
- **TTS**: eSpeak system integration with fallback support

### Frontend (WebAssembly)

- **WASM Module**: Rust compiled to WebAssembly for browser execution
- **Audio APIs**: MediaRecorder for voice capture, AudioContext for playback
- **UI**: Modern responsive interface with real-time status updates
- **Networking**: Fetch API for backend communication
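To make the shape of the backend concrete, here is a minimal sketch of how the three HTTP endpoints could be wired with Axum. The handler names are hypothetical stand-ins, and the snippet assumes axum 0.7 with tokio; it is an illustration, not the project's actual code:

```rust
use axum::{
    routing::{get, post},
    Router,
};

// Hypothetical handler stubs; the real handlers perform transcription,
// Ollama calls, and TTS as described in the lists above.
async fn chat_handler() -> &'static str { "chat response + audio_url" }
async fn transcribe_handler() -> &'static str { "transcribed text" }
async fn audio_handler() -> &'static str { "WAV bytes" }

#[tokio::main]
async fn main() {
    // The three endpoints documented in the API section below.
    let app = Router::new()
        .route("/api/chat", post(chat_handler))
        .route("/api/transcribe", post(transcribe_handler))
        .route("/api/audio/:filename", get(audio_handler));

    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000").await.unwrap();
    axum::serve(app, listener).await.unwrap();
}
```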
### Data Flow

```
Browser → Record Audio (WebM) → Upload to /api/transcribe →
FFmpeg Conversion → Whisper Transcription → /api/chat →
Ollama LLM → Response → eSpeak TTS → Audio File → Browser Playback
```
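The FFmpeg step in this pipeline amounts to a subprocess call that resamples the browser's WebM recording into the 16 kHz mono WAV that Whisper expects. A sketch, assuming `ffmpeg` is on `PATH` (the function name is illustrative):

```rust
use std::process::Command;

// Convert a browser-recorded WebM file into 16 kHz mono WAV for Whisper.
fn convert_to_wav(input: &str, output: &str) -> std::io::Result<()> {
    let status = Command::new("ffmpeg")
        // -y: overwrite output, -ar: sample rate, -ac: channel count
        .args(["-y", "-i", input, "-ar", "16000", "-ac", "1", output])
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            "ffmpeg conversion failed",
        ))
    }
}
```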
## Configuration

Edit `config.yaml` to customize application behavior:

```yaml
server:
  host: "127.0.0.1"
  port: 3000
  cors_enabled: true
  message_max_length: 10000

chat:
  default_model: "gemma3:12b"             # Default Ollama model
  api_endpoint: "http://localhost:11434"  # Ollama server URL
  timeout_seconds: 30

transcription:
  model_repo: "openai/whisper-tiny"  # Whisper model size
  device: "auto"                     # auto, cpu, cuda
  language: "auto"                   # auto or language code
  quantized: true                    # Use quantized model for faster inference

audio:
  max_file_size_mb: 100
  supported_formats: ["wav", "mp3", "m4a", "flac"]
  sample_rate: 16000
  pitch: "60"       # eSpeak pitch setting
  amplitude: "100"  # eSpeak volume setting
```
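As a sketch of how the backend might load this file, the structs below mirror two of the sections using `serde` and `serde_yaml` (the type names are assumptions, not the project's actual config module):

```rust
use serde::Deserialize;

// Mirrors the `server` and `chat` sections of config.yaml; the other
// sections are omitted here and ignored by serde during deserialization.
#[derive(Debug, Deserialize)]
struct Config {
    server: ServerConfig,
    chat: ChatConfig,
}

#[derive(Debug, Deserialize)]
struct ServerConfig {
    host: String,
    port: u16,
    cors_enabled: bool,
    message_max_length: usize,
}

#[derive(Debug, Deserialize)]
struct ChatConfig {
    default_model: String,
    api_endpoint: String,
    timeout_seconds: u64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string("config.yaml")?;
    let config: Config = serde_yaml::from_str(&raw)?;
    println!("serving on {}:{}", config.server.host, config.server.port);
    Ok(())
}
```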
## API

### POST /api/chat

Send a text message to the LLM and get a response with audio. The `model` field is optional.

**Request:**

```json
{
  "message": "Hello, how are you?",
  "model": "gemma3:12b"
}
```

**Response:**

```json
{
  "response": "I'm doing well, thank you for asking!",
  "audio_url": "/api/audio/uuid.wav"
}
```

### POST /api/transcribe

Upload an audio file for speech-to-text transcription.
**Request:** Multipart form with an `audio` field

**Response:**

```json
{
  "text": "Transcribed speech content"
}
```

### GET /api/audio/{filename}

Serve generated audio files.

**Response:** WAV audio file with appropriate headers
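As a usage example, the chat endpoint can be exercised from Rust with `reqwest` (a sketch; it assumes the `reqwest` crate with its `json` feature, plus `tokio` and `serde_json`, and a server running on the default port):

```rust
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Post a message to /api/chat and print the JSON reply, which
    // includes the URL of the synthesized audio file.
    let reply: serde_json::Value = reqwest::Client::new()
        .post("http://127.0.0.1:3000/api/chat")
        .json(&json!({ "message": "Hello, how are you?" }))
        .send()
        .await?
        .json()
        .await?;
    println!("{reply}");
    Ok(())
}
```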
## Development

### Frontend Builds

```bash
# Development build
cd frontend
wasm-pack build --target web --out-dir ../static/pkg --dev

# Or use the build script
./build_frontend.sh
```

### Hot Reload

```bash
# Backend with hot reload (cargo-watch)
cargo install cargo-watch
cargo watch -x "check && run"

# Frontend rebuild on changes (wasm-pack has no watch mode, so drive it with cargo-watch)
cd frontend
cargo watch -s "wasm-pack build --target web --out-dir ../static/pkg --dev"
```

## Models

Common models you can install:
- `gemma3:12b` (default) - Google's Gemma 3 model
- `llama3.2` - Meta's Llama 3.2
- `llama3.1` - Meta's Llama 3.1
- `codellama` - Code-specialized Llama
- `mistral` - Mistral AI model
- `phi3` - Microsoft's Phi-3
Install any of them with:

```bash
ollama pull <model-name>
```

## Troubleshooting

### FFmpeg Not Available

**Error:** FFmpeg is not available
```bash
# Install FFmpeg
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
# Verify: ffmpeg -version
```

### Microphone Access

**Error:** Browser blocks microphone access
- Ensure HTTPS or localhost access
- Check browser permissions in settings
- Try a different browser if issues persist
### Ollama Connection

**Error:** Failed to connect to Ollama
```bash
# Check that Ollama is running
ollama serve

# Verify the model is installed
ollama list

# Test the API manually
curl http://localhost:11434/api/generate -d '{"model":"gemma3:12b","prompt":"test"}'
```

### wasm-pack Not Found

**Error:** wasm-pack not found
```bash
cargo install wasm-pack
```

### eSpeak Not Found

**Error:** eSpeak not found
- Install eSpeak: `brew install espeak` or `sudo apt install espeak-ng`
- The application will fall back to system TTS if eSpeak is unavailable
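The synthesis step itself is just a subprocess call, and a fallback chain is easy to picture. A sketch using standard eSpeak flags and the pitch/amplitude values from `config.yaml` (the helper is illustrative, not the app's actual TTS code):

```rust
use std::process::Command;

// Synthesize `text` into a WAV file, preferring espeak-ng and falling
// back to classic espeak if the first binary is missing or fails.
// -p sets pitch (0-99), -a sets amplitude (0-200), -w writes a WAV file.
fn synthesize(text: &str, out_path: &str) -> std::io::Result<()> {
    for binary in ["espeak-ng", "espeak"] {
        match Command::new(binary)
            .args(["-p", "60", "-a", "100", "-w", out_path, text])
            .status()
        {
            Ok(status) if status.success() => return Ok(()),
            _ => continue, // binary not found or failed; try the next one
        }
    }
    Err(std::io::Error::new(
        std::io::ErrorKind::NotFound,
        "no working eSpeak binary found",
    ))
}
```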
## Performance

For GPU acceleration:

```bash
# Ensure CUDA is installed and compatible
# Rebuild with CUDA features enabled
cargo build --features cuda
```

## License

This project is dual-licensed under:
