
# Velo — AI Voice Consultant Demo

A voice AI demo showcasing real-time conversation with visual generation. Talk to an AI business consultant that generates visuals while speaking.

Built on Google's Gemini.

Live demo: velo.corepackai.com


## Models Used

| Purpose | Model | Can Be Replaced With |
| --- | --- | --- |
| Voice (bidirectional audio) | `gemini-3.1-flash-live-preview` | Any Gemini Live model |
| Visual generation (JSON cards) | `gemini-3.1-flash-lite-preview` | Any Gemini model that supports JSON output |

Both use the same Gemini API key. One key powers the entire experience.


## Demo Modes

| Mode | Business | What It Does |
| --- | --- | --- |
| 🏨 Hotel | The Meridian | Boutique hotel concierge — rooms, packages, amenities |
| 💻 SaaS | Flowdesk | Project management sales — plans, features, comparisons |
| 🍣 Restaurant | Nori Kitchen | Japanese restaurant host — menu, reservations, events |
| 🛠 Dev | Under the Hood | Explains the architecture and how to build your own |

## Quick Start

```bash
# Backend
cd backend && uv sync && uv run uvicorn src.app:app --reload --port 8000

# Frontend (separate terminal)
cd frontend && npm install && npm run dev
```

Open http://localhost:5173, enter your Gemini API key, tap the orb.

Key needed: a Gemini API key; the same key powers both voice and visual generation.

**Deep dive:** See `docs/ARCHITECTURE.md` for the full technical reference.

## Architecture

```text
Browser Mic (48kHz float32)
  → AudioWorklet downsample to 16kHz int16 PCM
  → WebSocket
  → FastAPI backend
  → Gemini Live API (bidirectional audio stream)
      ↓
  Gemini calls show_visual(description)
      ↓
  Backend → Gemini 3.1 Flash Lite (JSON generation, ~1-2s)
      ↓
  Returns JSON: {layout: "badges", title: "Pricing", items: [...]}
      ↓
  WebSocket → Frontend
      ↓
  VisualRenderer picks layout → Framer Motion animates card in
```
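The first hop above (48 kHz float32 to 16 kHz int16 PCM) happens in an AudioWorklet in the browser, but the core transform can be sketched in Python. This is a naive decimate-by-3 sketch; the function name is illustrative, not from the repo, and a real worklet would low-pass filter before decimating to avoid aliasing.

```python
def downsample_48k_to_16k(samples):
    """Naive 48 kHz float32 -> 16 kHz int16 PCM: keep every 3rd sample.

    A production AudioWorklet would low-pass filter before decimating
    to avoid aliasing; this only shows the rate and format conversion.
    """
    out = []
    for i in range(0, len(samples), 3):
        s = max(-1.0, min(1.0, samples[i]))  # clamp float sample to [-1, 1]
        out.append(int(s * 32767))           # scale into int16 range
    return out
```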

### Key Pattern: Async Visual Generation

```python
# Gemini calls show_visual → we respond immediately so voice continues
function_responses.append(FunctionResponse(name="show_visual", response={"ok": True}))

# Visual generates in background while Gemini talks
asyncio.ensure_future(_gen_visual(description, websocket, gemini_key))

# ~1-2 seconds later, card appears on screen via WebSocket
await websocket.send_json({"type": "dynamic_visual", "data": card_json})
```
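A minimal, runnable sketch of this pattern (names such as `gen_visual`, `handle_show_visual`, and the `send` callback are stand-ins, not the repo's actual functions): the tool call is acknowledged immediately, and the card arrives later through the background task.

```python
import asyncio

async def gen_visual(description, send):
    # Stand-in for the slow card-generation call (~1-2 s in the real demo).
    await asyncio.sleep(0.01)
    await send({"type": "dynamic_visual", "data": {"title": description}})

async def handle_show_visual(description, send):
    # Acknowledge the tool call right away so the voice stream never blocks,
    # then let the visual generate in the background.
    asyncio.ensure_future(gen_visual(description, send))
    return {"ok": True}

async def main():
    sent = []
    async def send(msg):
        sent.append(msg)
    ack = await handle_show_visual("Pricing", send)
    assert ack == {"ok": True} and sent == []  # voice continues; no card yet
    await asyncio.sleep(0.05)                  # background task completes
    return sent

cards = asyncio.run(main())
```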

### Key Pattern: 3-Task WebSocket Bridge

```python
await asyncio.gather(
    receive_from_browser(),   # Forward mic audio to Gemini
    receive_from_gemini(),    # Forward audio/events to browser
    keepalive(),              # Ping every 15 seconds
)
```

## Tech Stack

| Layer | Technology |
| --- | --- |
| Voice | Gemini Live API (`gemini-3.1-flash-live-preview`) |
| Visual Generation | Gemini (`gemini-3.1-flash-lite-preview`) |
| Backend | FastAPI, Python 3.12, uv |
| Frontend | React 18, TypeScript, Framer Motion |
| Audio | AudioWorklet (48→16kHz), Web Audio API (24kHz playback) |
| Recording | Client-side MediaRecorder → MP4/WebM |

## Visual Card Layouts

11 layout types, picked by the LLM based on content:

| Layout | Best For | Animation |
| --- | --- | --- |
| `badges` | Pricing tiers | Spring pop |
| `grid` | Feature overview | Stagger fade |
| `timeline` | Step-by-step process | Progressive draw |
| `info` | Key-value data | Slide in |
| `checks` | Included features | Checkmark pop |
| `comparison` | Side-by-side | Slide from sides |
| `stat` | Big numbers | Scale up |
| `contact` | Action items | Stagger fade |
| `funnel` | Conversion steps | Width animate |
| `split` | Two-panel contrast | Slide from sides |
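Cards arrive in the JSON shape shown in the Architecture section (`{layout, title, items}`). A hypothetical `badges` card and a minimal layout check might look like this; the item field names are illustrative, since the real schema is defined by the visual agent:

```python
KNOWN_LAYOUTS = {"badges", "grid", "timeline", "info", "checks",
                 "comparison", "stat", "contact", "funnel", "split"}

def is_renderable(card):
    """Minimal sanity check before handing a card to the renderer."""
    return (card.get("layout") in KNOWN_LAYOUTS
            and isinstance(card.get("title"), str)
            and isinstance(card.get("items"), list))

# Hypothetical card; "label"/"value" fields are illustrative, not the repo's schema.
card = {
    "layout": "badges",
    "title": "Pricing",
    "items": [
        {"label": "Standard Room", "value": "$180/night"},
        {"label": "Deluxe Suite", "value": "$320/night"},
    ],
}
```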

## Features

- **Single API key** — one Gemini key powers voice + visuals
- **BYOK** — keys stay in browser, sent directly to Google
- **4 demo modes** — hotel, SaaS, restaurant, dev
- **Per-business themes** — accent colors change per mode
- **Dark mode** — toggle with persistence
- **Screen recording** — client-side MP4, zero backend
- **MeshBlob orb** — 3-layer gradient, 6 phase animations
- **Streaming transcript** — live text with blinking cursor
- **Security** — keys via WebSocket message (not URL), rate limiting, input validation

## Stats

| Metric | Value |
| --- | --- |
| Voice latency | ~300ms (Gemini Live WebSocket) |
| Visual generation | ~1-2s |
| Card layouts | 11 types + aliases |
| Audio format | 16kHz int16 PCM in, 24kHz int16 PCM out |
| Knowledge bases | 3 businesses + dev mode |
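The PCM formats above imply modest raw bandwidth; a quick sketch of the arithmetic (mono int16, so 2 bytes per sample):

```python
def pcm_bytes_per_second(sample_rate_hz, bytes_per_sample=2, channels=1):
    """Raw bandwidth of an uncompressed mono int16 PCM stream."""
    return sample_rate_hz * bytes_per_sample * channels

uplink = pcm_bytes_per_second(16_000)    # mic -> backend
downlink = pcm_bytes_per_second(24_000)  # Gemini audio -> browser
```

That works out to about 32 kB/s up and 48 kB/s down, before WebSocket framing overhead.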

## Making Your Own

To create a voice demo for your own business:

1. Add your knowledge in `backend/src/voice/knowledge.py`
2. Update the system prompt in `backend/src/voice/session.py`
3. Customize the visual prompt in `backend/src/voice/visual_agent.py`
4. Change the theme via CSS custom properties in `frontend/src/index.css`
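Step 1's knowledge entry might look something like this. The shape is purely hypothetical (a made-up bakery with made-up field names); the real schema is whatever `backend/src/voice/knowledge.py` already uses for its three businesses, so match that.

```python
# Hypothetical entry for backend/src/voice/knowledge.py; field names are
# illustrative -- mirror the schema the existing businesses use.
BAKERY_KNOWLEDGE = {
    "name": "Crumb & Co.",
    "persona": "Friendly neighborhood bakery host",
    "facts": [
        "Open 7am-3pm, closed Mondays",
        "Best sellers: sourdough loaf, cardamom buns",
    ],
}
```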

To swap the visual generation model, change `GEMINI_VISUAL_MODEL` in `visual_agent.py`.

## Built By

Nikhil Rao — AI engineer specializing in voice agents, real-time systems, and LLM-powered applications.

## License

MIT
