
# Velo — AI Voice Consultant Demo

A voice AI demo showcasing real-time conversation with visual generation. Talk to an AI business consultant that generates visuals while speaking.

Built on Google's Gemini.

Live demo: velo.corepackai.com


## Models Used

| Purpose | Model | Can Be Replaced With |
| --- | --- | --- |
| Voice (bidirectional audio) | `gemini-3.1-flash-live-preview` | Any Gemini Live model |
| Visual generation (JSON cards) | `gemini-3.1-flash-lite-preview` | Any Gemini model that supports JSON output |

Both use the same Gemini API key. One key powers the entire experience.


## Demo Modes

| Mode | Business | What It Does |
| --- | --- | --- |
| 🏨 Hotel | The Meridian | Boutique hotel concierge — rooms, packages, amenities |
| 💻 SaaS | Flowdesk | Project management sales — plans, features, comparisons |
| 🍣 Restaurant | Nori Kitchen | Japanese restaurant host — menu, reservations, events |
| 🛠 Dev | Under the Hood | Explains the architecture and how to build your own |

## Quick Start

```bash
# Backend
cd backend && uv sync && uv run uvicorn src.app:app --reload --port 8000

# Frontend (separate terminal)
cd frontend && npm install && npm run dev
```

Open http://localhost:5173, enter your Gemini API key, tap the orb.

Key needed: a Gemini API key; the same key powers both voice and visual generation.

**Deep dive:** See `docs/ARCHITECTURE.md` for the full technical reference.

## Architecture

```text
Browser Mic (48kHz float32)
  → AudioWorklet downsample to 16kHz int16 PCM
  → WebSocket
  → FastAPI backend
  → Gemini Live API (bidirectional audio stream)
      ↓
  Gemini calls show_visual(description)
      ↓
  Backend → Gemini 3.1 Flash Lite (JSON generation, ~1-2s)
      ↓
  Returns JSON: {layout: "badges", title: "Pricing", items: [...]}
      ↓
  WebSocket → Frontend
      ↓
  VisualRenderer picks layout → Framer Motion animates card in
```
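The first hop above (48 kHz float32 to 16 kHz int16 PCM) happens in an AudioWorklet in the browser, but the core transform can be sketched in Python. This is a naive decimate-by-3 sketch; the function name is illustrative, not from the repo, and a real worklet would low-pass filter before decimating to avoid aliasing.

```python
def downsample_48k_to_16k(samples):
    """Naive 48 kHz float32 -> 16 kHz int16 PCM: keep every 3rd sample.

    A production AudioWorklet would low-pass filter before decimating
    to avoid aliasing; this only shows the rate and format conversion.
    """
    out = []
    for i in range(0, len(samples), 3):
        s = max(-1.0, min(1.0, samples[i]))  # clamp float sample to [-1, 1]
        out.append(int(s * 32767))           # scale into int16 range
    return out
```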

### Key Pattern: Async Visual Generation

```python
# Gemini calls show_visual → we respond immediately so voice continues
function_responses.append(FunctionResponse(name="show_visual", response={"ok": True}))

# Visual generates in background while Gemini talks
asyncio.ensure_future(_gen_visual(description, websocket, gemini_key))

# ~1-2 seconds later, card appears on screen via WebSocket
await websocket.send_json({"type": "dynamic_visual", "data": card_json})
```
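A minimal, runnable sketch of this pattern (names such as `gen_visual`, `handle_show_visual`, and the `send` callback are stand-ins, not the repo's actual functions): the tool call is acknowledged immediately, and the card arrives later through the background task.

```python
import asyncio

async def gen_visual(description, send):
    # Stand-in for the slow card-generation call (~1-2 s in the real demo).
    await asyncio.sleep(0.01)
    await send({"type": "dynamic_visual", "data": {"title": description}})

async def handle_show_visual(description, send):
    # Acknowledge the tool call right away so the voice stream never blocks,
    # then let the visual generate in the background.
    asyncio.ensure_future(gen_visual(description, send))
    return {"ok": True}

async def main():
    sent = []
    async def send(msg):
        sent.append(msg)
    ack = await handle_show_visual("Pricing", send)
    assert ack == {"ok": True} and sent == []  # voice continues; no card yet
    await asyncio.sleep(0.05)                  # background task completes
    return sent

cards = asyncio.run(main())
```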

### Key Pattern: 3-Task WebSocket Bridge

```python
await asyncio.gather(
    receive_from_browser(),   # Forward mic audio to Gemini
    receive_from_gemini(),    # Forward audio/events to browser
    keepalive(),              # Ping every 15 seconds
)
```

## Tech Stack

| Layer | Technology |
| --- | --- |
| Voice | Gemini Live API (`gemini-3.1-flash-live-preview`) |
| Visual Generation | Gemini (`gemini-3.1-flash-lite-preview`) |
| Backend | FastAPI, Python 3.12, uv |
| Frontend | React 18, TypeScript, Framer Motion |
| Audio | AudioWorklet (48→16kHz), Web Audio API (24kHz playback) |
| Recording | Client-side MediaRecorder → MP4/WebM |

## Visual Card Layouts

11 layout types, picked by the LLM based on content:

| Layout | Best For | Animation |
| --- | --- | --- |
| `badges` | Pricing tiers | Spring pop |
| `grid` | Feature overview | Stagger fade |
| `timeline` | Step-by-step process | Progressive draw |
| `info` | Key-value data | Slide in |
| `checks` | Included features | Checkmark pop |
| `comparison` | Side-by-side | Slide from sides |
| `stat` | Big numbers | Scale up |
| `contact` | Action items | Stagger fade |
| `funnel` | Conversion steps | Width animate |
| `split` | Two-panel contrast | Slide from sides |
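Cards arrive in the JSON shape shown in the Architecture section (`{layout, title, items}`). A hypothetical `badges` card and a minimal layout check might look like this; the item field names are illustrative, since the real schema is defined by the visual agent:

```python
KNOWN_LAYOUTS = {"badges", "grid", "timeline", "info", "checks",
                 "comparison", "stat", "contact", "funnel", "split"}

def is_renderable(card):
    """Minimal sanity check before handing a card to the renderer."""
    return (card.get("layout") in KNOWN_LAYOUTS
            and isinstance(card.get("title"), str)
            and isinstance(card.get("items"), list))

# Hypothetical card; "label"/"value" fields are illustrative, not the repo's schema.
card = {
    "layout": "badges",
    "title": "Pricing",
    "items": [
        {"label": "Standard Room", "value": "$180/night"},
        {"label": "Deluxe Suite", "value": "$320/night"},
    ],
}
```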

## Features

- **Single API key** — one Gemini key powers voice + visuals
- **BYOK** — keys stay in browser, sent directly to Google
- **4 demo modes** — hotel, SaaS, restaurant, dev
- **Per-business themes** — accent colors change per mode
- **Dark mode** — toggle with persistence
- **Screen recording** — client-side MP4, zero backend
- **MeshBlob orb** — 3-layer gradient, 6 phase animations
- **Streaming transcript** — live text with blinking cursor
- **Security** — keys via WebSocket message (not URL), rate limiting, input validation

## Stats

| Metric | Value |
| --- | --- |
| Voice latency | ~300ms (Gemini Live WebSocket) |
| Visual generation | ~1-2s |
| Card layouts | 11 types + aliases |
| Audio format | 16kHz int16 PCM in, 24kHz int16 PCM out |
| Knowledge bases | 3 businesses + dev mode |
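The PCM formats above imply modest raw bandwidth; a quick sketch of the arithmetic (mono int16, so 2 bytes per sample):

```python
def pcm_bytes_per_second(sample_rate_hz, bytes_per_sample=2, channels=1):
    """Raw bandwidth of an uncompressed mono int16 PCM stream."""
    return sample_rate_hz * bytes_per_sample * channels

uplink = pcm_bytes_per_second(16_000)    # mic -> backend
downlink = pcm_bytes_per_second(24_000)  # Gemini audio -> browser
```

That works out to about 32 kB/s up and 48 kB/s down, before WebSocket framing overhead.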

## Making Your Own

To create a voice demo for your own business:

1. Add your knowledge in `backend/src/voice/knowledge.py`
2. Update the system prompt in `backend/src/voice/session.py`
3. Customize the visual prompt in `backend/src/voice/visual_agent.py`
4. Change the theme via CSS custom properties in `frontend/src/index.css`
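Step 1's knowledge entry might look something like this. The shape is purely hypothetical (a made-up bakery with made-up field names); the real schema is whatever `backend/src/voice/knowledge.py` already uses for its three businesses, so match that.

```python
# Hypothetical entry for backend/src/voice/knowledge.py; field names are
# illustrative -- mirror the schema the existing businesses use.
BAKERY_KNOWLEDGE = {
    "name": "Crumb & Co.",
    "persona": "Friendly neighborhood bakery host",
    "facts": [
        "Open 7am-3pm, closed Mondays",
        "Best sellers: sourdough loaf, cardamom buns",
    ],
}
```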

To swap the visual generation model, change `GEMINI_VISUAL_MODEL` in `visual_agent.py`.

## Built By

Nikhil Rao — AI engineer specializing in voice agents, real-time systems, and LLM-powered applications.

## License

MIT
