🔴 Watch Demo Video (YouTube) > See the agent handle booking logic, context switching, and CRM logging in real-time.
✨ Experience the High-Fidelity Version > Demo utilizing ElevenLabs Agents for ultra-realistic voice synthesis.
The "Living Glass" UI reacts to voice states in real-time: Listening (Green), Processing (Purple), and Saving (Amber).
Hotel Voice RAG Agent is an enterprise-grade conversational voice orchestrator designed to reduce operational overhead by delivering 24/7 Tier-1 support, seamlessly blending generative conversation with deterministic business logic for secure room reservations.
Unlike standard chatbots, this system acts as an Intelligent Orchestrator. It dynamically routes user intent between deterministic Tools (Availability Calendar), probabilistic Generative AI (Gemini 2.5 Flash Lite), and a Local Knowledge Base (RAG via ChromaDB).
The architecture prioritizes speed and reliability, achieving sub-second voice-to-voice latency by leveraging gemini-2.5-flash-lite for inference and EdgeTTS for synthesis. It features an Event-Driven CRM Pipeline that asynchronously logs call outcomes to Google Sheets without blocking the voice thread.
- Hybrid Orchestration: Seamlessly switches context between general chat, specific policy retrieval (RAG), and logic-based booking tools.
- Local RAG Implementation: Uses ChromaDB (persistent on-disk) to retrieve "long-tail" knowledge (e.g., accessibility policies) only when triggered by semantic intent.
- Latency Optimized: Achieves <500 ms TTFB (Time to First Byte) using lightweight models and optimized FastAPI async handlers.
- Event-Driven CRM: Call summaries are generated and pushed to Google Sheets via background tasks, ensuring the UI remains non-blocking.
- Responsible AI Guardrails: Implements graceful degradation logic to intercept safety violations (toxicity, PII) and reroute users to safe conversational paths.
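The hybrid-orchestration idea above can be sketched as a thin routing layer in front of the model. The real system detects intent semantically; the keyword matching and handler labels below are a hypothetical stand-in that only illustrates the control flow:

```python
import re

def route(utterance: str) -> str:
    """Pick a handler for one user turn.

    Returns 'rag' (knowledge base), 'tools' (availability calendar),
    or 'llm' (general chat). Keyword matching here is an illustrative
    stand-in for the semantic intent detection the agent actually uses.
    """
    text = utterance.lower()
    if re.search(r"accessib|wheelchair|policy|pet", text):
        return "rag"    # long-tail knowledge -> ChromaDB retrieval
    if re.search(r"\b(available|availability|book|reserve|date)\b", text):
        return "tools"  # deterministic logic -> availability calendar
    return "llm"        # everything else -> general generative chat
```

The point of the layer is that deterministic questions (dates, availability) never reach the probabilistic model unmediated, while long-tail policy questions are grounded in retrieved text before generation.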
This project demonstrates a modern AI Engineering stack, moving beyond simple API wrappers to a robust stateful system.
| Component | Technology | Role |
|---|---|---|
| Frontend | Next.js 14 + Tailwind | Real-time UI state management & Audio Context handling. |
| Backend API | FastAPI (Python) | Async orchestration layer & WebSocket management. |
| LLM Engine | Gemini 2.5 Flash Lite | High-throughput, low-latency reasoning engine. |
| Vector DB | ChromaDB | Local, persistent storage for Retrieval Augmented Generation. |
| Voice Ops | EdgeTTS / WebSpeech API | Hybrid voice stack for zero-cost latency optimization. |
| Data Ops | Google Sheets API | Asynchronous CRM logging & structured data extraction. |
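The Vector DB row above is where "long-tail" questions get grounded. As a toy stand-in for the embedding search ChromaDB performs (the real store holds embedded policy chunks; the documents and scoring below are illustrative), a bag-of-words retriever shows the retrieve-then-summarize shape:

```python
from collections import Counter
from math import sqrt

# Toy knowledge base; the real system persists embedded chunks in ChromaDB
DOCS = [
    "Accessibility policy: wheelchair accessible rooms are on the ground floor",
    "Pet policy: small dogs are allowed in garden view rooms for a nightly fee",
    "Parking policy: the garage is free for guests staying two nights or more",
]

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top-k
    qv = _vec(query)
    ranked = sorted(DOCS, key=lambda d: _cosine(qv, _vec(d)), reverse=True)
    return ranked[:k]
```

In the real pipeline the retrieved chunk is handed to the LLM as context rather than read back verbatim, which is what keeps answers "speakable".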
```mermaid
graph TD
    A[User Voice] -->|WebSpeech API| B(Next.js Frontend)
    B -->|Async POST| C{FastAPI Orchestrator}
    C -->|Intent Detection| D{Router}
    D -->|'Accessibility'| E[(ChromaDB RAG)]
    D -->|'Date Check'| F[Python Tools]
    D -->|General Chat| G[Gemini 2.5 Flash Lite]
    E --> G
    F --> G
    G -->|Response Text| H[EdgeTTS Engine]
    H -->|Audio Stream| B
    B -->|End Call Signal| I[CRM Worker]
    I -->|Summary Generation| G
    G -->|JSON| J[(Google Sheets)]
```
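The end-of-call branch of the diagram (CRM Worker → Google Sheets) can be sketched with asyncio. The Sheets append is simulated with a sleep and all names are illustrative; the shape to note is that the write is scheduled concurrently but awaited before the session closes:

```python
import asyncio

CRM_ROWS: list[str] = []  # stands in for the Google Sheets append call

async def push_summary(summary: str) -> None:
    await asyncio.sleep(0.05)  # simulated Sheets API round trip
    CRM_ROWS.append(summary)

async def handle_end_call(transcript: str) -> str:
    # Schedule the CRM write in the background; the voice pipeline
    # is free to finish its teardown while the network call runs.
    task = asyncio.create_task(push_summary(f"Outcome: {transcript[:60]}"))
    await asyncio.sleep(0)  # yield: the write proceeds concurrently
    # Before the session fully closes, make sure the row landed, so
    # data integrity is guaranteed without blocking the voice thread.
    await task
    return "session closed"
```

This is why the README can claim both "non-blocking" and "guaranteed" logging: the write never stalls an in-call response, but the session does not tear down until the row exists.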
- Ingestion vs. Inference: The knowledge base is built via an offline ETL pipeline (`ingest.py`), separating heavy embedding operations from the runtime (`main.py`). This ensures zero cold-start latency for the agent.
- Blocking vs. Non-Blocking:
  - Voice synthesis is streaming (non-blocking) for perceived speed.
  - CRM logging runs as a background task so it never blocks the voice thread, but it is awaited at the end of the call to guarantee data integrity before the session closes.
- Prompt Engineering: System prompts utilize Chain-of-Thought (CoT) instructions to force the model to summarize RAG context into "speakable" 2-sentence answers, avoiding the "Robot Reading a PDF" problem.
- Adaptive Silence Detection (Debouncing): The client uses a debounced silence strategy to distinguish natural thinking pauses from end-of-utterance, so the agent prioritizes conversational completeness and never cuts the user off mid-sentence.
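The debounced-silence strategy lives in the Next.js client, but the timer logic is language-agnostic. A minimal asyncio sketch (class name and threshold are illustrative, not the actual client code) shows the reset-on-speech behavior:

```python
import asyncio

class SilenceDebouncer:
    """Treat an utterance as finished only after `quiet_ms` of
    uninterrupted silence; any new speech resets the timer, so a
    thinking pause never triggers a premature cut-off."""

    def __init__(self, quiet_ms: int, on_utterance_end) -> None:
        self._quiet_s = quiet_ms / 1000
        self._on_end = on_utterance_end
        self._timer: asyncio.Task | None = None

    def speech_detected(self) -> None:
        # New audio cancels any pending end-of-utterance countdown
        if self._timer and not self._timer.done():
            self._timer.cancel()
        self._timer = asyncio.create_task(self._countdown())

    async def _countdown(self) -> None:
        await asyncio.sleep(self._quiet_s)
        self._on_end()  # fires only if the silence went uninterrupted
```

Each `speech_detected()` call restarts the countdown, so the end-of-utterance callback fires only once the user has been quiet for the full window.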
- Python 3.10+
- Node.js 18+
- Google Cloud Service Account (for Sheets)
- Gemini API Key
```bash
cd backend
pip install -r requirements.txt

# Create .env file
echo "GEMINI_API_KEY=your_key" > .env
echo "GOOGLE_CREDS_FILE=credentials.json" >> .env

# Build the vector database (run once)
python ingest.py

# Start the API
python main.py
```
```bash
cd hotel-agent-ui
npm install

# Configure API endpoint
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local

# Run development server
npm run dev
```
This project is open-source and available under the MIT License.
Built by Devansh Mistry — AI Engineer.