A voice AI demo showcasing real-time conversation with visual generation. Talk to an AI business consultant that generates visuals while speaking.
Built on Google's Gemini.
Live demo: velo.corepackai.com
| Purpose | Model | Can Be Replaced With |
|---|---|---|
| Voice (bidirectional audio) | gemini-3.1-flash-live-preview | Any Gemini Live model |
| Visual generation (JSON cards) | gemini-3.1-flash-lite-preview | Any Gemini model that supports JSON output |
Both use the same Gemini API key. One key powers the entire experience.
| Mode | Business | What It Does |
|---|---|---|
| 🏨 Hotel | The Meridian | Boutique hotel concierge — rooms, packages, amenities |
| 💻 SaaS | Flowdesk | Project management sales — plans, features, comparisons |
| 🍣 Restaurant | Nori Kitchen | Japanese restaurant host — menu, reservations, events |
| 🛠 Dev | Under the Hood | Explains the architecture and how to build your own |
```bash
# Backend
cd backend && uv sync && uv run uvicorn src.app:app --reload --port 8000

# Frontend (separate terminal)
cd frontend && npm install && npm run dev
```

Open http://localhost:5173, enter your Gemini API key, and tap the orb.
Key needed:
- Gemini API key (free tier available)
Deep dive: see `docs/ARCHITECTURE.md` for the full technical reference.
```
Browser Mic (48kHz float32)
  → AudioWorklet downsample to 16kHz int16 PCM
  → WebSocket
  → FastAPI backend
  → Gemini Live API (bidirectional audio stream)
        ↓
  Gemini calls show_visual(description)
        ↓
  Backend → Gemini 3.1 Flash-Lite (JSON generation, ~1-2s)
        ↓
  Returns JSON: {"layout": "badges", "title": "Pricing", "items": [...]}
        ↓
  WebSocket → Frontend
        ↓
  VisualRenderer picks layout → Framer Motion animates card in
```
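The first hop of this pipeline (48 kHz float32 to 16 kHz int16 PCM) can be illustrated in Python. This is a minimal sketch, not the project's AudioWorklet code: it assumes an exact 3:1 rate ratio and uses simple block averaging as a crude low-pass filter before decimation. The function name `downsample_to_pcm16` is hypothetical.

```python
import sys
from array import array


def downsample_to_pcm16(samples: list[float], in_rate: int = 48_000, out_rate: int = 16_000) -> bytes:
    """Downsample float samples in [-1.0, 1.0] and encode them as little-endian int16 PCM.

    Assumes in_rate is an integer multiple of out_rate (48 kHz -> 16 kHz is 3:1).
    Averaging each block of `factor` samples is a crude low-pass filter; a
    production resampler would use a proper anti-aliasing FIR filter.
    """
    if in_rate % out_rate != 0:
        raise ValueError("in_rate must be an integer multiple of out_rate")
    factor = in_rate // out_rate
    out = array("h")  # signed 16-bit integers
    # Drop any trailing partial block so every output sample averages `factor` inputs.
    usable = len(samples) - len(samples) % factor
    for i in range(0, usable, factor):
        avg = sum(samples[i:i + factor]) / factor
        clamped = max(-1.0, min(1.0, avg))
        # Map [-1, 1] onto the int16 range: -1.0 -> -32768, 1.0 -> 32767.
        out.append(int(clamped * 32768) if clamped < 0 else int(clamped * 32767))
    if sys.byteorder == "big":
        out.byteswap()  # PCM on the wire is little-endian
    return out.tobytes()
```

One second of 48 kHz input yields 16,000 int16 samples (32,000 bytes). A real AudioWorklet receives 128-frame render quanta, which are not a multiple of 3, so leftover samples would need to be carried into the next call rather than dropped as this sketch does.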
```python
import asyncio

from google.genai.types import FunctionResponse

# Gemini calls show_visual → we respond immediately so voice continues
function_responses.append(FunctionResponse(name="show_visual", response={"ok": True}))
# Visual generates in background while Gemini talks
asyncio.ensure_future(_gen_visual(description, websocket, gemini_key))

# Inside _gen_visual: ~1-2 seconds later, the card appears on screen via WebSocket
await websocket.send_json({"type": "dynamic_visual", "data": card_json})
```

```python
# Each session runs three loops concurrently
await asyncio.gather(
    receive_from_browser(),  # Forward mic audio to Gemini
    receive_from_gemini(),   # Forward audio/events to browser
    keepalive(),             # Ping every 15 seconds
)
```

| Layer | Technology |
|---|---|
| Voice | Gemini Live API (gemini-3.1-flash-live-preview) |
| Visual Generation | Gemini (gemini-3.1-flash-lite-preview) |
| Backend | FastAPI, Python 3.12, uv |
| Frontend | React 18, TypeScript, Framer Motion |
| Audio | AudioWorklet (48→16kHz), Web Audio API (24kHz playback) |
| Recording | Client-side MediaRecorder → MP4/WebM |
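The key design choice in the backend snippets above is that the `show_visual` tool call is acknowledged immediately while the slower JSON generation runs as a background task, so Gemini's audio never waits on the visual. The self-contained sketch below reproduces that pattern with stdlib `asyncio` only. `fake_generate_visual`, the `asyncio.Queue` standing in for the WebSocket, and the short sleep are hypothetical stand-ins, not the project's code.

```python
import asyncio
import json


async def fake_generate_visual(description: str) -> dict:
    """Hypothetical stand-in for the ~1-2 s JSON generation call."""
    await asyncio.sleep(0.05)
    return {"layout": "badges", "title": description, "items": []}


async def gen_visual(description: str, outbox: asyncio.Queue) -> None:
    """Generate a card and push it to the client (the Queue stands in for the WebSocket)."""
    card = await fake_generate_visual(description)
    await outbox.put(json.dumps({"type": "dynamic_visual", "data": card}))


# Keep strong references so background tasks are not garbage-collected mid-run.
background_tasks: set[asyncio.Task] = set()


def handle_show_visual(description: str, outbox: asyncio.Queue) -> dict:
    """Acknowledge the tool call at once and schedule generation in the background."""
    task = asyncio.create_task(gen_visual(description, outbox))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return {"ok": True}  # goes back to the model immediately, so speech continues


async def main() -> list[str]:
    outbox: asyncio.Queue = asyncio.Queue()
    events: list[str] = []
    ack = handle_show_visual("Pricing", outbox)
    events.append(f"ack:{ack['ok']}")  # recorded before the card exists
    events.append(await outbox.get())  # the card arrives later
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The acknowledgement is always recorded before the card. In the real backend, such a background task simply interleaves with the three loops running under `asyncio.gather`.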
11 layout types, picked by the LLM based on content:
| Layout | Best For | Animation |
|---|---|---|
| `badges` | Pricing tiers | Spring pop |
| `grid` | Feature overview | Stagger fade |
| `timeline` | Step-by-step process | Progressive draw |
| `info` | Key-value data | Slide in |
| `checks` | Included features | Checkmark pop |
| `comparison` | Side-by-side | Slide from sides |
| `stat` | Big numbers | Scale up |
| `contact` | Action items | Stagger fade |
| `funnel` | Conversion steps | Width animate |
| `split` | Two-panel contrast | Slide from sides |
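Because the LLM picks the layout, the renderer has to cope with names outside the known set. A minimal Python sketch of that normalization step follows. It is not the project's actual code: the alias map is hypothetical (the README mentions aliases but does not list them), falling back to `info` is an assumption, and only the ten layouts listed in the table above are included.

```python
import json

# The ten layouts listed in the table above.
KNOWN_LAYOUTS = {
    "badges", "grid", "timeline", "info", "checks",
    "comparison", "stat", "contact", "funnel", "split",
}

# Hypothetical aliases; the real alias list is not given in this README.
LAYOUT_ALIASES = {"pricing": "badges", "steps": "timeline", "vs": "comparison"}

FALLBACK_LAYOUT = "info"  # assumption: a generic key-value card is the safest default


def normalize_card(raw: str) -> dict:
    """Parse model output into a card dict whose layout is always renderable.

    Raises ValueError (json.JSONDecodeError is a subclass) on malformed input.
    """
    card = json.loads(raw)
    if not isinstance(card, dict):
        raise ValueError("card must be a JSON object")
    layout = str(card.get("layout", "")).strip().lower()
    layout = LAYOUT_ALIASES.get(layout, layout)
    card["layout"] = layout if layout in KNOWN_LAYOUTS else FALLBACK_LAYOUT
    card.setdefault("items", [])  # renderers can iterate without a None check
    return card
```

Normalizing on one side of the WebSocket means the renderer only ever has to switch over known layout names.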
- Single API key — one Gemini key powers voice + visuals
- BYOK — your key stays in the browser and is relayed to Google through the backend WebSocket
- 4 demo modes — hotel, SaaS, restaurant, dev
- Per-business themes — accent colors change per mode
- Dark mode — toggle with persistence
- Screen recording — client-side MP4, zero backend
- MeshBlob orb — 3-layer gradient, 6 phase animations
- Streaming transcript — live text with blinking cursor
- Security — keys via WebSocket message (not URL), rate limiting, input validation
| Metric | Value |
|---|---|
| Voice latency | ~300ms (Gemini Live WebSocket) |
| Visual generation | ~1-2s |
| Card layouts | 11 types + aliases |
| Audio format | 16kHz int16 PCM in, 24kHz int16 PCM out |
| Knowledge bases | 3 businesses + dev mode |
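The audio formats in this table translate directly into byte rates, which helps when sizing WebSocket chunks. int16 PCM uses 2 bytes per sample, so, assuming mono audio, 16 kHz input is 32,000 bytes/s and 24 kHz output is 48,000 bytes/s. The sketch below encodes that arithmetic and decodes int16 output into the float range that Web Audio playback expects. The helper names and the 20 ms chunk size in the example are illustrative assumptions.

```python
import struct

BYTES_PER_SAMPLE = 2  # int16
CHANNELS = 1          # assumption: mono audio in both directions


def bytes_per_second(sample_rate: int) -> int:
    """Byte rate of int16 PCM at the given sample rate."""
    return sample_rate * BYTES_PER_SAMPLE * CHANNELS


def chunk_bytes(sample_rate: int, chunk_ms: int) -> int:
    """Size in bytes of `chunk_ms` milliseconds of int16 PCM audio."""
    return bytes_per_second(sample_rate) * chunk_ms // 1000


def pcm16_to_float(data: bytes) -> list[float]:
    """Decode little-endian int16 PCM (e.g. 24 kHz model output) to floats in [-1, 1)."""
    count = len(data) // BYTES_PER_SAMPLE
    samples = struct.unpack(f"<{count}h", data[: count * BYTES_PER_SAMPLE])
    return [s / 32768 for s in samples]


if __name__ == "__main__":
    print(bytes_per_second(16_000))  # 32000
    print(bytes_per_second(24_000))  # 48000
    print(chunk_bytes(16_000, 20))   # 640
```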
To create a voice demo for your own business:
- Add your knowledge in `backend/src/voice/knowledge.py`
- Update the system prompt in `backend/src/voice/session.py`
- Customize the visual prompt in `backend/src/voice/visual_agent.py`
- Change the theme via CSS custom properties in `frontend/src/index.css`

To swap the visual generation model, change `GEMINI_VISUAL_MODEL` in `visual_agent.py`.
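This README does not document the structure of `knowledge.py`, so the following is only a hypothetical sketch of how a per-business knowledge entry might be folded into a voice system prompt. `BusinessKnowledge`, `build_system_prompt`, the field names, and the "Example Bakery" placeholder are illustrative. Only `GEMINI_VISUAL_MODEL`, its model ID, and the `show_visual` tool name come from this README.

```python
from dataclasses import dataclass, field

# From this README: change this to swap the visual generation model.
GEMINI_VISUAL_MODEL = "gemini-3.1-flash-lite-preview"


@dataclass
class BusinessKnowledge:
    """Hypothetical structure for one business's knowledge base."""
    name: str
    role: str
    facts: list[str] = field(default_factory=list)


def build_system_prompt(kb: BusinessKnowledge) -> str:
    """Hypothetical: turn a knowledge entry into a voice system prompt."""
    fact_lines = "\n".join(f"- {fact}" for fact in kb.facts)
    return (
        f"You are the {kb.role} for {kb.name}.\n"
        "Answer only from the facts below. When a visual would help, "
        "call show_visual with a short description.\n"
        f"Facts:\n{fact_lines}"
    )


if __name__ == "__main__":
    kb = BusinessKnowledge(
        name="Example Bakery",  # placeholder business
        role="front-desk assistant",
        facts=["Open 7am to 3pm daily."],  # placeholder fact
    )
    print(build_system_prompt(kb))
```

Keeping facts as data, separate from the prompt template, is one way to let several businesses share a single prompt builder.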
Nikhil Rao — AI engineer specializing in voice agents, real-time systems, and LLM-powered applications.
License: MIT