## RAG Chatbot

A full-stack Retrieval-Augmented Generation (RAG) application that enables intelligent, document-based question answering.
The system integrates a FastAPI backend powered by LangChain, FAISS, and AI models, alongside a modern React + Vite + Tailwind CSS frontend for an intuitive chat experience.

## Table of Contents

- [Project Overview](#project-overview)
- [Features](#features)
- [Architecture](#architecture)
- [Prerequisites](#prerequisites)
- [Quick Start Deployment](#quick-start-deployment)
- [User Interface](#user-interface)
- [Troubleshooting](#troubleshooting)
- [Additional Info](#additional-info)

---

## Project Overview

The **RAG Chatbot** demonstrates how retrieval-augmented generation can be used to build intelligent, document-grounded conversational systems. It retrieves relevant information from a knowledge base, passes it to a large language model, and generates a concise and reliable answer to the user’s query. This project integrates seamlessly with cloud-hosted APIs or local model endpoints, offering flexibility for research, enterprise, or educational use.

---

## Features

**Backend**

- Clean PDF upload with validation
- LangChain-powered document processing
- FAISS-CPU vector store for efficient similarity search
- Enterprise inference endpoints for embeddings and LLM
- Token-based authentication for inference API
- Comprehensive error handling and logging
- File validation and size limits
- CORS enabled for web integration
- Health check endpoints
- Modular architecture (routes + services)

**Frontend**

- PDF file upload with drag-and-drop support
- Real-time chat interface
- Modern, responsive design with Tailwind CSS
- Built with Vite for fast development
- Live status updates
- Mobile-friendly

---

## Architecture

The architecture consists of a server that embeds and indexes uploaded documents into a vector database. Once documents have been uploaded, the server waits for user queries; each query triggers a similarity search in the vector database before the LLM service is called to summarize the findings.

**Service Components:**

1. **React Web UI (Port 3000)** - Provides an intuitive chat interface with drag-and-drop PDF upload, real-time messaging, and document-grounded Q&A interaction

2. **FastAPI Backend (Port 5001)** - Handles document processing, FAISS vector storage, LangChain integration, and orchestrates retrieval-augmented generation for accurate responses

**Typical Flow** (a minimal backend sketch follows the list):

1. User uploads a document through the web UI.
2. The backend splits the document into chunks, transforms the chunks into embeddings, and stores them in the vector database.
3. User sends a question through the web UI.
4. The backend retrieves relevant content from stored documents.
5. The model generates a response based on the retrieved context.
6. The answer is displayed to the user via the UI.
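
The flow above maps onto a handful of LangChain calls. The sketch below is illustrative only, not the backend's actual code: the endpoint, token, input file, and chunking parameters are hypothetical placeholders, and it assumes the `langchain-openai`, `langchain-community`, `langchain-text-splitters`, and `faiss-cpu` packages are installed.

```python
# Minimal RAG sketch against an OpenAI-compatible inference endpoint.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

ENDPOINT = "https://api.example.com/v1"   # INFERENCE_API_ENDPOINT + /v1
TOKEN = "your-pre-generated-token-here"   # INFERENCE_API_TOKEN

# Step 2: split the document into chunks and index them in FAISS.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(open("document.txt").read())
embeddings = OpenAIEmbeddings(model="BAAI/bge-base-en-v1.5",
                              base_url=ENDPOINT, api_key=TOKEN)
store = FAISS.from_texts(chunks, embeddings)

# Steps 3-5: retrieve the most relevant chunks and ask the LLM.
question = "What is this document about?"
hits = store.similarity_search(question, k=3)
context = "\n\n".join(doc.page_content for doc in hits)
llm = ChatOpenAI(model="meta-llama/Llama-3.1-8B-Instruct",
                 base_url=ENDPOINT, api_key=TOKEN)
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)  # Step 6: display the answer
```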

---

## Prerequisites

### System Requirements

Before you begin, ensure you have the following installed:

- **Docker and Docker Compose**
- **Enterprise inference endpoint access** (token-based authentication)

### Required API Configuration

**For Inference Service (RAG Chatbot):**

This application supports multiple inference deployment patterns (a quick connectivity check follows the list):

- **GenAI Gateway**: Provide your GenAI Gateway URL and API key
  - To generate the GenAI Gateway API key, use the [generate-vault-secrets.sh](https://github.com/opea-project/Enterprise-Inference/blob/main/core/scripts/generate-vault-secrets.sh) script
  - The API key is the `litellm_master_key` value from the generated `vault.yml` file

- **APISIX Gateway**: Provide your APISIX Gateway URL and authentication token
  - To generate the APISIX authentication token, use the [generate-token.sh](https://github.com/opea-project/Enterprise-Inference/blob/main/core/scripts/generate-token.sh) script
  - The token is generated using Keycloak client credentials
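
Whichever gateway you use, you can sanity-check the endpoint and token before deploying with an OpenAI-compatible models call (the same check appears again in the configuration notes below); substitute your real endpoint and token:

```bash
# A JSON list of models confirms the endpoint and token are valid
curl https://your-api-endpoint.com/v1/models \
  -H "Authorization: Bearer your-token"
```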

### Local Development Configuration

**For Local Testing Only (Optional)**

If you're testing with a local inference endpoint using a custom domain (e.g., `api.example.com` mapped to localhost in your hosts file):

1. Edit `api/.env` and set:
   ```bash
   LOCAL_URL_ENDPOINT=api.example.com
   ```
   (Use the domain name from your `INFERENCE_API_ENDPOINT` without `https://`)

2. This allows Docker containers to resolve your local domain correctly.

**Note:** For public domains or cloud-hosted endpoints, leave the default value `not-needed`.
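
For reference, the hosts-file mapping mentioned above looks like the following (the domain is a placeholder; edit `/etc/hosts` on Linux/macOS or `C:\Windows\System32\drivers\etc\hosts` on Windows):

```bash
# Map the custom inference domain to the local machine
127.0.0.1   api.example.com
```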

### Verify Docker Installation

```bash
# Check Docker version
docker --version

# Check Docker Compose version
docker compose version

# Verify Docker is running
docker ps
```

---

## Quick Start Deployment

### Clone the Repository

```bash
git clone https://github.com/opea-project/Enterprise-Inference.git
cd Enterprise-Inference/sample_solutions/RAGChatbot
```

### Set up the Environment

This application requires **two `.env` files** for proper configuration:

1. **Root `.env` file** (for Docker Compose variables)
2. **`api/.env` file** (for backend application configuration)

#### Step 1: Create Root `.env` File

```bash
# From the RAGChatbot directory
cat > .env << EOF
# Docker Compose Configuration
LOCAL_URL_ENDPOINT=not-needed
EOF
```

**Note:** If using a local domain (e.g., `api.example.com` mapped to localhost), replace `not-needed` with your domain name (without `https://`).

#### Step 2: Create `api/.env` File

Copy from the example file and edit with your actual credentials:

```bash
cp api/.env.example api/.env
```

Then edit `api/.env` to set your `INFERENCE_API_ENDPOINT` and `INFERENCE_API_TOKEN`.
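
For example, both values can be set from the shell (a convenience sketch assuming GNU sed; on macOS use `sed -i ''`):

```bash
# Replace the placeholder endpoint and token in api/.env
sed -i 's|^INFERENCE_API_ENDPOINT=.*|INFERENCE_API_ENDPOINT=https://api.example.com|' api/.env
sed -i 's|^INFERENCE_API_TOKEN=.*|INFERENCE_API_TOKEN=your-pre-generated-token-here|' api/.env
```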

Or manually create `api/.env` with:

```bash
# Inference API Configuration
# INFERENCE_API_ENDPOINT: URL to your inference service (without /v1 suffix)
#
# GenAI Gateway: Provide your GenAI Gateway URL and API key
# - URL format: https://genai-gateway.example.com
# - To generate the GenAI Gateway API key, use the generate-vault-secrets.sh script
# - The API key is the litellm_master_key value from the generated vault.yml file
#
# APISIX Gateway: Provide your APISIX Gateway URL and authentication token
# - For APISIX, include the model name in the INFERENCE_API_ENDPOINT path
# - Example: https://apisix-gateway.example.com/Llama-3.1-8B-Instruct
# - Set EMBEDDING_API_ENDPOINT separately for the embedding model
# - Example: https://apisix-gateway.example.com/bge-base-en-v1.5
# - To generate the APISIX authentication token, use the generate-token.sh script
# - The token is generated using Keycloak client credentials
#
# INFERENCE_API_TOKEN: Authentication token/API key for the inference service
INFERENCE_API_ENDPOINT=https://api.example.com
INFERENCE_API_TOKEN=your-pre-generated-token-here

# Model Configuration
EMBEDDING_MODEL_NAME=BAAI/bge-base-en-v1.5
INFERENCE_MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct

# APISIX Gateway Endpoints
# Uncomment and set these when using APISIX Gateway:
# IMPORTANT: Use exact APISIX route paths:
# Example routes: /bge-base-en-v1.5/* and /Llama-3.1-8B-Instruct/*
# INFERENCE_API_ENDPOINT=https://api.example.com/Llama-3.1-8B-Instruct
# EMBEDDING_API_ENDPOINT=https://api.example.com/bge-base-en-v1.5

# Local URL Endpoint (only needed for non-public domains)
# If using a local domain like api.example.com mapped to localhost:
# Set this to: api.example.com (domain without https://)
# If using a public domain, set any placeholder value like: not-needed
LOCAL_URL_ENDPOINT=not-needed

# SSL Verification Settings
# Set to false only for dev with self-signed certs
VERIFY_SSL=true
```

**Important Configuration Notes:**

- **INFERENCE_API_ENDPOINT**: Your actual inference service URL (replace the `https://api.example.com` placeholder)
  - For APISIX/Keycloak deployments, the model name must be included in the endpoint URL (e.g., `https://apisix-gateway.example.com/Llama-3.1-8B-Instruct`)
- **INFERENCE_API_TOKEN**: Your actual pre-generated authentication token
- **EMBEDDING_MODEL_NAME** and **INFERENCE_MODEL_NAME**: Use the exact model names from your inference service
  - To check available models: `curl https://your-api-endpoint.com/v1/models -H "Authorization: Bearer your-token"`
  - **Important for APISIX/Keycloak**: You need a separate endpoint for the embedding model. Configure `EMBEDDING_API_ENDPOINT` with the embedding model in the URL path (e.g., `https://apisix-gateway.example.com/bge-base-en-v1.5`)
- **LOCAL_URL_ENDPOINT**: Only needed if using local domain mapping (see [Local Development Configuration](#local-development-configuration))

**Note**: The docker-compose.yml file automatically loads environment variables from both `.env` (root) and `./api/.env` (backend) files.

### Running the Application

Start both API and UI services together with Docker Compose:

```bash
# From the RAGChatbot directory
docker compose up --build

# Or run in detached mode (background)
docker compose up -d --build
```

The API will be available at: `http://localhost:5001`
The UI will be available at: `http://localhost:3000`

**View logs**:

```bash
# All services
docker compose logs -f

# Backend only
docker compose logs -f backend

# Frontend only
docker compose logs -f frontend
```

**Verify the services are running**:

```bash
# Check API health
curl http://localhost:5001/health

# Check if containers are running
docker compose ps
```

## User Interface

**Using the Application**

Open `http://localhost:3000` in your browser.

You will land on the main page, which exposes every feature described below.

Upload a PDF:

- Drag and drop a PDF file, or
- Click "Browse Files" to select a file
- Wait for processing to complete

Start chatting (the same flow can also be driven against the API directly, as sketched below):

- Type your question in the input field
- Press Enter or click Send
- Get AI-powered answers based on your document
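
The route names in the sketch below are hypothetical placeholders, not confirmed paths; because the backend is FastAPI, the actual routes are listed in its interactive docs at `http://localhost:5001/docs` (available unless disabled):

```bash
# Hypothetical endpoints shown for illustration; check /docs for the real paths.
# Upload a PDF to the backend
curl -F "file=@mydoc.pdf" http://localhost:5001/upload

# Ask a question about the uploaded document
curl -X POST http://localhost:5001/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What is this document about?"}'
```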

**UI Configuration**

When running with Docker Compose, the UI automatically connects to the backend API. The frontend is available at `http://localhost:3000` and the API at `http://localhost:5001`.

For production deployments, you may want to configure a reverse proxy or update the API URL in the frontend configuration.

### Stopping the Application

```bash
docker compose down
```
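
If you also want to remove any named volumes the stack created (assuming persistent state such as the vector index is volume-backed, which may not apply here), add the `-v` flag:

```bash
# Stop containers and remove named volumes declared in docker-compose.yml
docker compose down -v
```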

## Troubleshooting

For comprehensive troubleshooting guidance, common issues, and solutions, refer to:

[Troubleshooting Guide - TROUBLESHOOTING.md](./TROUBLESHOOTING.md)

---

## Additional Info

The following models have been validated with RAGChatbot:

| Model | Hardware |
|-------|----------|
| **meta-llama/Llama-3.1-8B-Instruct** | Gaudi |
| **BAAI/bge-base-en-v1.5** (embeddings) | Gaudi |
| **Qwen/Qwen3-4B-Instruct** | Xeon |