Commit f27ea8a
Merge pull request opea-project#53 from cld2labs/cld2labs/RAGChatbot
cld2labs/RAGChatbot
2 parents 9798817 + b9ce202 commit f27ea8a

34 files changed

Lines changed: 2797 additions & 0 deletions
.gitignore
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
# Environment files
**/.env

# Test files
**/test.txt

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
dist/
build/

# Virtual environments
venv/
env/
ENV/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Application specific
dmv_index/
*.log

# Node.js
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*
package-lock.json
README.md
Lines changed: 310 additions & 0 deletions
@@ -0,0 +1,310 @@
## RAG Chatbot

A full-stack Retrieval-Augmented Generation (RAG) application that enables intelligent, document-based question answering.
The system integrates a FastAPI backend powered by LangChain, FAISS, and AI models, alongside a modern React + Vite + Tailwind CSS frontend for an intuitive chat experience.

## Table of Contents

- [Project Overview](#project-overview)
- [Features](#features)
- [Architecture](#architecture)
- [Prerequisites](#prerequisites)
- [Quick Start Deployment](#quick-start-deployment)
- [User Interface](#user-interface)
- [Troubleshooting](#troubleshooting)
- [Additional Info](#additional-info)

---

## Project Overview

The **RAG Chatbot** demonstrates how retrieval-augmented generation can be used to build intelligent, document-grounded conversational systems. It retrieves relevant information from a knowledge base, passes it to a large language model, and generates a concise and reliable answer to the user’s query. This project integrates seamlessly with cloud-hosted APIs or local model endpoints, offering flexibility for research, enterprise, or educational use.

---

## Features

**Backend**

- Clean PDF upload with validation
- LangChain-powered document processing
- FAISS-CPU vector store for efficient similarity search
- Enterprise inference endpoints for embeddings and LLM
- Token-based authentication for inference API
- Comprehensive error handling and logging
- File validation and size limits
- CORS enabled for web integration
- Health check endpoints
- Modular architecture (routes + services)

**Frontend**

- PDF file upload with drag-and-drop support
- Real-time chat interface
- Modern, responsive design with Tailwind CSS
- Built with Vite for fast development
- Live status updates
- Mobile-friendly

---

## Architecture

The architecture, shown below, consists of a server that waits for documents to embed and index into a vector database. Once documents have been uploaded, the server waits for user queries; each query triggers a similarity search in the vector database before the LLM service is called to summarize the findings.

![Architecture Diagram](./images/RAG%20Model%20System%20Design.png)

**Service Components:**

1. **React Web UI (Port 3000)** - Provides an intuitive chat interface with drag-and-drop PDF upload, real-time messaging, and document-grounded Q&A interaction

2. **FastAPI Backend (Port 5001)** - Handles document processing, FAISS vector storage, and LangChain integration, and orchestrates retrieval-augmented generation for accurate responses

**Typical Flow:**

1. User uploads a document through the web UI.
2. The backend processes the document by splitting it and transforming it into embeddings before storing it in the vector database.
3. User sends a question through the web UI.
4. The backend retrieves relevant content from stored documents.
5. The model generates a response based on retrieved context.
6. The answer is displayed to the user via the UI.
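For reference, the same flow can be exercised from the command line once the stack is running. The sketch below is illustrative only: the route names `/upload` and `/ask` are assumptions, not confirmed API paths; check FastAPI's auto-generated docs at `http://localhost:5001/docs` for the actual routes.

```bash
# Steps 1-2: upload a PDF so the backend can split, embed, and index it.
# NOTE: "/upload" is a placeholder route name, not a confirmed path.
curl -s -X POST http://localhost:5001/upload \
  -F "file=@./document.pdf"

# Steps 3-6: ask a question; the backend runs a similarity search over the
# vector store and the LLM answers from the retrieved context.
# NOTE: "/ask" is a placeholder route name, not a confirmed path.
curl -s -X POST http://localhost:5001/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is this document about?"}'
```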
---

## Prerequisites

### System Requirements

Before you begin, ensure you have the following installed:

- **Docker and Docker Compose**
- **Enterprise inference endpoint access** (token-based authentication)

### Required API Configuration

**For Inference Service (RAG Chatbot):**

This application supports multiple inference deployment patterns:

- **GenAI Gateway**: Provide your GenAI Gateway URL and API key
  - To generate the GenAI Gateway API key, use the [generate-vault-secrets.sh](https://github.com/opea-project/Enterprise-Inference/blob/main/core/scripts/generate-vault-secrets.sh) script
  - The API key is the `litellm_master_key` value from the generated `vault.yml` file
- **APISIX Gateway**: Provide your APISIX Gateway URL and authentication token
  - To generate the APISIX authentication token, use the [generate-token.sh](https://github.com/opea-project/Enterprise-Inference/blob/main/core/scripts/generate-token.sh) script
  - The token is generated using Keycloak client credentials

### Local Development Configuration

**For Local Testing Only (Optional)**

If you're testing with a local inference endpoint using a custom domain (e.g., `api.example.com` mapped to localhost in your hosts file):

1. Edit `api/.env` and set:

   ```bash
   LOCAL_URL_ENDPOINT=api.example.com
   ```

   (Use the domain name from your `INFERENCE_API_ENDPOINT` without `https://`.)

2. This allows Docker containers to resolve your local domain correctly.

**Note:** For public domains or cloud-hosted endpoints, leave the default value `not-needed`.
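If you have not yet created the hosts-file mapping this setup assumes, a minimal example on Linux/macOS looks like the following (on Windows, edit `C:\Windows\System32\drivers\etc\hosts` instead); `api.example.com` is a placeholder for your actual domain:

```bash
# Map the custom domain to localhost (requires sudo).
# "api.example.com" is a placeholder; use the domain from your
# INFERENCE_API_ENDPOINT without the https:// prefix.
echo "127.0.0.1 api.example.com" | sudo tee -a /etc/hosts
```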
### Verify Docker Installation
```bash
# Check Docker version
docker --version

# Check Docker Compose version
docker compose version

# Verify Docker is running
docker ps
```

---
## Quick Start Deployment

### Clone the Repository

```bash
git clone https://github.com/opea-project/Enterprise-Inference.git
cd Enterprise-Inference/sample_solutions/RAGChatbot
```

### Set up the Environment

This application requires **two `.env` files** for proper configuration:

1. **Root `.env` file** (for Docker Compose variables)
2. **`api/.env` file** (for backend application configuration)

#### Step 1: Create Root `.env` File

```bash
# From the RAGChatbot directory
cat > .env << EOF
# Docker Compose Configuration
LOCAL_URL_ENDPOINT=not-needed
EOF
```

**Note:** If using a local domain (e.g., `api.example.com` mapped to localhost), replace `not-needed` with your domain name (without `https://`).
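For the local-domain case described in the note above, the same heredoc with a placeholder domain would look like:

```bash
# Local-domain variant of Step 1 ("api.example.com" is a placeholder)
cat > .env << EOF
# Docker Compose Configuration
LOCAL_URL_ENDPOINT=api.example.com
EOF
```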
#### Step 2: Create `api/.env` File

Copy from the example file and edit with your actual credentials:

```bash
cp api/.env.example api/.env
```

Then edit `api/.env` to set your `INFERENCE_API_ENDPOINT` and `INFERENCE_API_TOKEN`.

Or manually create `api/.env` with:

```bash
# Inference API Configuration
# INFERENCE_API_ENDPOINT: URL to your inference service (without /v1 suffix)
#
# **GenAI Gateway**: Provide your GenAI Gateway URL and API key
# - URL format: https://genai-gateway.example.com
# - To generate the GenAI Gateway API key, use the [generate-vault-secrets.sh] script
# - The API key is the litellm_master_key value from the generated vault.yml file
#
# **APISIX Gateway**: Provide your APISIX Gateway URL and authentication token
# - For APISIX, include the model name in the INFERENCE_API_ENDPOINT path
# - Example: https://apisix-gateway.example.com/Llama-3.1-8B-Instruct
# - Set EMBEDDING_API_ENDPOINT separately for the embedding model
# - Example: https://apisix-gateway.example.com/bge-base-en-v1.5
# - To generate the APISIX authentication token, use the [generate-token.sh] script
# - The token is generated using Keycloak client credentials
#
# INFERENCE_API_TOKEN: Authentication token/API key for the inference service
INFERENCE_API_ENDPOINT=https://api.example.com
INFERENCE_API_TOKEN=your-pre-generated-token-here

# Model Configuration
EMBEDDING_MODEL_NAME=BAAI/bge-base-en-v1.5
INFERENCE_MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct

# APISIX Gateway Endpoints
# Uncomment and set these when using APISIX Gateway:
# IMPORTANT: Use exact APISIX route paths:
# Example routes: /bge-base-en-v1.5/* and /Llama-3.1-8B-Instruct/*
# INFERENCE_API_ENDPOINT=https://api.example.com/Llama-3.1-8B-Instruct
# EMBEDDING_API_ENDPOINT=https://api.example.com/bge-base-en-v1.5

# Local URL Endpoint (only needed for non-public domains)
# If using a local domain like api.example.com mapped to localhost:
# Set this to: api.example.com (domain without https://)
# If using a public domain, set any placeholder value like: not-needed
LOCAL_URL_ENDPOINT=not-needed

# SSL Verification Settings
# Set to false only for dev with self-signed certs
VERIFY_SSL=true
```

**Important Configuration Notes:**

- **INFERENCE_API_ENDPOINT**: Your actual inference service URL (replace the `https://api.example.com` placeholder)
  - For APISIX/Keycloak deployments, the model name must be included in the endpoint URL (e.g., `https://apisix-gateway.example.com/Llama-3.1-8B-Instruct`)
- **INFERENCE_API_TOKEN**: Your actual pre-generated authentication token
- **EMBEDDING_MODEL_NAME** and **INFERENCE_MODEL_NAME**: Use the exact model names from your inference service
  - To check available models: `curl https://your-api-endpoint.com/v1/models -H "Authorization: Bearer your-token"` (a fuller smoke test follows this list)
- **Important for APISIX/Keycloak**: You need a separate endpoint for the embedding model. Configure `EMBEDDING_API_ENDPOINT` with the embedding model in the URL path (e.g., `https://apisix-gateway.example.com/bge-base-en-v1.5`)
- **LOCAL_URL_ENDPOINT**: Only needed if using local domain mapping (see [Local Development Configuration](#local-development-configuration))
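Before starting the containers, it can help to smoke-test the endpoint and token directly. The sketch below assumes your gateway exposes an OpenAI-compatible API under `/v1` (which the `/v1/models` check above implies); the URL, token, and model name are placeholders from the sample configuration:

```bash
# Verify that the token works and the model responds.
# The URL, token, and model below are placeholders; substitute your actual
# INFERENCE_API_ENDPOINT, INFERENCE_API_TOKEN, and INFERENCE_MODEL_NAME.
curl -s https://api.example.com/v1/chat/completions \
  -H "Authorization: Bearer your-pre-generated-token-here" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8
      }'
```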
**Note**: The docker-compose.yml file automatically loads environment variables from both `.env` (root) and `./api/.env` (backend) files.
### Running the Application

Start both API and UI services together with Docker Compose:

```bash
# From the RAGChatbot directory
docker compose up --build

# Or run in detached mode (background)
docker compose up -d --build
```

The API will be available at: `http://localhost:5001`
The UI will be available at: `http://localhost:3000`

**View logs**:

```bash
# All services
docker compose logs -f

# Backend only
docker compose logs -f backend

# Frontend only
docker compose logs -f frontend
```

**Verify the services are running**:

```bash
# Check API health
curl http://localhost:5001/health

# Check if containers are running
docker compose ps
```
## User Interface

**Using the Application**

Navigate to `http://localhost:3000`.

You will land on the main page, which exposes each feature.

![User Interface](images/ui.png)

Upload a PDF:

- Drag and drop a PDF file, or
- Click "Browse Files" to select a file
- Wait for processing to complete

Start chatting:

- Type your question in the input field
- Press Enter or click Send
- Get AI-powered answers based on your document

**UI Configuration**

When running with Docker Compose, the UI automatically connects to the backend API. The frontend is available at `http://localhost:3000` and the API at `http://localhost:5001`.

For production deployments, you may want to configure a reverse proxy or update the API URL in the frontend configuration.
### Stopping the Application

```bash
docker compose down
```
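If you also want to remove named volumes and the locally built images, the standard Compose flags cover that:

```bash
# Fuller cleanup: remove containers, named volumes, and locally built images
docker compose down -v --rmi local
```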
## Troubleshooting

For comprehensive troubleshooting guidance, common issues, and solutions, refer to:

[Troubleshooting Guide - TROUBLESHOOTING.md](./TROUBLESHOOTING.md)

---

## Additional Info

The following models have been validated with RAGChatbot:

| Model | Hardware |
|-------|----------|
| **meta-llama/Llama-3.1-8B-Instruct** | Gaudi |
| **BAAI/bge-base-en-v1.5** (embeddings) | Gaudi |
| **Qwen/Qwen3-4B-Instruct** | Xeon |
