Skip to content

DevanshMistry890/hotel-voice-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hotel Voice RAG Agent 🎙️

Real-Time AI Orchestrator with Local RAG, Tool Use, and CRM Sync

License: MIT Stack Model Latency

🔴 Watch Demo Video (YouTube) > See the agent handle booking logic, context switching, and CRM logging in real-time.

✨ Experience the High-Fidelity Version > Demo utilizing ElevenLabs Agents for ultra-realistic voice synthesis.


📸 UI Preview

Application Dashboard The "Living Glass" UI reacts to voice states in real-time: Listening (Green), Processing (Purple), and Saving (Amber).


📖 Overview

Hotel Voice RAG Agent is an enterprise-grade voice orchestrator(conversational AI) designed to reduce operational overhead by delivering 24/7 Tier-1 support, seamlessly blending generative conversation with deterministic business logic for secure room reservations.

Unlike standard chatbots, this system acts as an Intelligent Orchestrator. It dynamically routes user intent between deterministic Tools (Availability Calendar), probabilistic Generative AI (Gemini 2.5 Flash), and a Local Knowledge Base (RAG via ChromaDB).

The architecture prioritizes speed and reliability, achieving sub-second voice-to-voice latency by leveraging gemini-2.5-flash-lite for inference and EdgeTTS for synthesis. It features an Event-Driven CRM Pipeline that asynchronously logs call outcomes to Google Sheets without blocking the voice thread.

🚀 Key Capabilities

  • Hybrid Orchestration: Seamlessly switches context between general chat, specific policy retrieval (RAG), and logic-based booking tools.
  • Local RAG Implementation: Uses ChromaDB (persistent on-disk) to retrieve "long-tail" knowledge (e.g., accessibility policies) only when triggered by semantic intent.
  • Latency Optimized: Achieves <500ms TTFB (Time to First Byte) using lightweight models and optimized FastAPI async handlers.
  • Event-Driven CRM: Call summaries are generated and pushed to Google Sheets via background tasks, ensuring the UI remains non-blocking.
  • Responsible AI Guardrails: Implements graceful degradation logic to intercept safety violations (toxicity, PII) and reroute users to safe conversational paths.

🛠️ Tech Stack & Architecture

This project demonstrates a modern AI Engineering stack, moving beyond simple API wrappers to a robust stateful system.

Component Technology Role
Frontend Next.js 14 + Tailwind Real-time UI state management & Audio Context handling.
Backend API FastAPI (Python) Async orchestration layer & WebSocket management.
LLM Engine Gemini 2.5 Flash Lite High-throughput, low-latency reasoning engine.
Vector DB ChromaDB Local, persistent storage for Retrieval Augmented Generation.
Voice Ops EdgeTTS / WebSpeech API Hybrid voice stack for zero-cost latency optimization.
Data Ops Google Sheets API Synchronous CRM logging & structured data extraction.

System Architecture

graph TD
    A[User Voice] -->|WebSpeech API| B(Next.js Frontend)
    B -->|Async POST| C{FastAPI Orchestrator}
    
    C -->|Intent Detection| D{Router}
    
    D -->|'Accessibility'| E[(ChromaDB RAG)]
    D -->|'Date Check'| F[Python Tools]
    D -->|General Chat| G[Gemini 2.5 Flash Lite]
    
    E --> G
    F --> G
    
    G -->|Response Text| H[EdgeTTS Engine]
    H -->|Audio Stream| B
    
    B -->|End Call Signal| I[CRM Worker]
    I -->|Summary Generation| G
    G -->|JSON| J[(Google Sheets)]

Loading

⚡ Performance & Engineering Decisions

  1. Ingestion vs. Inference: The Knowledge Base is built via an offline ETL pipeline (ingest.py), separating heavy embedding operations from the runtime (main.py). This ensures zero cold-start latency for the agent.
  2. Blocking vs. Non-Blocking:
  • Voice synthesis is streaming (non-blocking) for perceived speed.
  • CRM Logging is blocking (synchronous) at the end of the call to guarantee data integrity before the session closes.
  1. Prompt Engineering: System prompts utilize Chain-of-Thought (CoT) instructions to force the model to summarize RAG context into "speakable" 2-sentence answers, avoiding the "Robot Reading a PDF" problem.
  2. Adaptive Silence Detection (Debouncing) The client utilizes a Debounced Silence Strategy to distinguish natural thinking pauses from completion, ensuring the agent prioritizes conversational completeness and never cuts the user off mid-sentence.

📦 Installation & Setup

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • Google Cloud Service Account (for Sheets)
  • Gemini API Key

1. Backend Setup

cd backend
pip install -r requirements.txt

# Create .env file
echo "GEMINI_API_KEY=your_key" > .env
echo "GOOGLE_CREDS_FILE=credentials.json" >> .env

# Build the Vector Database (Run once)
python ingest.py

# Start the API
python main.py

2. Frontend Setup

cd hotel-agent-ui
npm install

# Configure API Endpoint
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local

# Run Development Server
npm run dev

🛡️ License

This project is open-source and available under the MIT License.


Built by Devansh Mistry — AI Engineer.

About

A real-time (<500ms) voice AI concierge built with Next.js, FastAPI, and Gemini 2.5 Flash Lite. Features local RAG (ChromaDB) for policy retrieval, Tool Calling for live booking, and event-driven CRM logging to Google Sheets.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Contributors