Web Interface Title and Project Readme Updated

Abeshith · Abeshith · commit dd76d7fe6b5c · 2025-12-01T13:00:23.000+05:30
diff --git a/README.md b/README.md
@@ -0,0 +1,186 @@
+# RAG Project - Learn with Transformers
+
+A production-ready **Corrective Retrieval-Augmented Generation (CRAG)** system built with LangChain, LangGraph, and FastAPI. This project implements an intelligent RAG pipeline that not only retrieves relevant documents but also **validates, corrects, and improves** retrieval quality through an agent-based workflow.
+
+## What Makes This Different from Traditional RAG?
+
+### Traditional RAG:
+```
+Query → Retrieve Documents → Generate Answer
+```
+**Problem**: If retrieved documents are irrelevant or low-quality, the answer will be poor.
+
+### This Project (Corrective RAG):
+```
+Query → Retrieve → Grade Quality → Transform Query if Needed → Web Search if Necessary → Generate
+```
+**Solution**: Intelligent agent workflow that **self-corrects** by grading document relevance and taking corrective actions.
+
+## Architecture
+
+```mermaid
+graph LR
+    A[User Query] --> B[Retrieve]
+    B --> C[FAISS+MMR]
+    C --> D[Rerank]
+    D --> E{Grade}
+    E -->|Relevant| F[Generate]
+    E -->|Partial| G[Filter]
+    E -->|Poor| H[Transform]
+    G --> F
+    H --> I[Web Search]
+    I --> F
+    F --> J[Groq LLM]
+    J --> K[Answer]
+```
+
+## Key Features
+
+### 1. **Intelligent Document Grading**
+- LLM evaluates retrieved documents for relevance
+- Filters out low-quality results automatically
+- Ensures only useful context reaches generation
+
+### 2. **Query Transformation**
+- Rewrites ambiguous or poor queries
+- Improves retrieval on second attempt
+- Adaptive query refinement
+
+### 3. **Web Search Fallback**
+- Tavily API integration for external knowledge
+- Activates when local documents are insufficient
+- Combines local + web results
+
+### 4. **Advanced Retrieval Stack**
+- **FAISS** vector store with MMR search
+- **FastEmbed** (BAAI/bge-small-en-v1.5) embeddings
+- **FlashRank** (rank-T5-flan) reranking
+- Self-query retriever support
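+
+A minimal sketch of how MMR selection trades relevance against redundancy, in pure Python. It is an illustration only, not the project's retriever or FAISS's internals; `cosine`, `mmr_select`, and `lambda_mult` are names assumed here for the example.
+
+```python
+import math
+
+
+def cosine(a, b):
+    """Cosine similarity between two equal-length vectors."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    return dot / norm if norm else 0.0
+
+
+def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
+    """Return indices of up to k documents chosen by maximal marginal relevance."""
+    candidates = list(range(len(doc_vecs)))
+    selected = []
+    while candidates and len(selected) < k:
+
+        def score(i):
+            # Reward similarity to the query ...
+            relevance = cosine(query_vec, doc_vecs[i])
+            # ... and penalize similarity to anything already selected.
+            redundancy = max(
+                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
+                default=0.0,
+            )
+            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
+
+        best = max(candidates, key=score)
+        selected.append(best)
+        candidates.remove(best)
+    return selected
+```
+
+With `lambda_mult=1.0` the selection reduces to plain similarity ranking; lower values favor chunks that differ from those already picked.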
+
+### 5. **LangGraph Agent Workflow**
+- State machine orchestration
+- Conditional routing logic
+- Transparent decision-making
+
+## Tech Stack
+
+| Component | Technology |
+|-----------|------------|
+| **LLM** | Groq (openai/gpt-oss-120b) |
+| **Embeddings** | FastEmbed (BAAI/bge-small-en-v1.5) |
+| **Vector Store** | FAISS |
+| **Reranker** | FlashRank (rank-T5-flan) |
+| **Agent Framework** | LangGraph |
+| **RAG Framework** | LangChain 0.3.x |
+| **Web Search** | Tavily API |
+| **Web Framework** | FastAPI + Uvicorn |
+| **Observability** | LangSmith (optional) |
+| **Document Source** | "Attention Is All You Need" (Transformer paper) |
+
+## Project Structure
+
+```
+RAG Project/
+├── project/
+│   ├── config/
+│   │   └── config.yaml              # Model & pipeline configuration
+│   ├── logger/
+│   │   └── logging.py               # Centralized logging
+│   ├── exception/
+│   │   └── except.py                # Custom exception handling
+│   ├── utils/
+│   │   ├── config_loader.py         # YAML config loader
+│   │   └── model_loader.py          # LLM & embedding initialization
+│   ├── source/
+│   │   └── data_preparation.py      # PDF/ArXiv document loading
+│   ├── model/
+│   │   ├── retriever.py             # FAISS retriever with MMR
+│   │   └── reranking.py             # FlashRank reranking
+│   ├── prompts/
+│   │   └── prompt_template.py       # RAG, Router, WebSearch prompts
+│   └── pipeline/
+│       ├── rag.py                   # Core RAG pipeline
+│       └── agents.py                # CRAG agent workflow
+├── templates/
+│   └── index.html                   # Web UI template
+├── static/
+│   └── styles.css                   # Purple gradient theme
+├── data/
+│   └── attention-is-all-you-need.pdf
+├── app.py                           # FastAPI application
+├── main.py                          # CLI entry point
+├── Dockerfile                       # Docker containerization
+└── requirements.txt                 # Dependencies
+
+```
+
+## Quick Start
+
+### 1. Clone & Install
+```bash
+git clone https://github.com/Abeshith/RAG-Project-PipeLine.git
+cd RAG-Project-PipeLine
+pip install -r requirements.txt
+```
+
+### 2. Set Environment Variables
+Create a `.env` file:
+```env
+GROQ_API_KEY=your_groq_api_key
+GOOGLE_API_KEY=your_google_api_key
+LANGSMITH_API_KEY=your_langsmith_key
+TAVILY_API_KEY=your_tavily_key
+```
+
+### 3. Run Web Interface
+```bash
+python app.py
+```
+Visit: http://localhost:8000
+
+### 4. Run CLI
+```bash
+python main.py
+```
+
+## Docker Deployment
+
+### Build & Run
+```bash
+docker build -t rag-project .
+docker run -d -p 8000:8000 --env-file .env rag-project
+```
+
+## How It Works
+
+### Workflow Example
+
+**Query**: "What is the attention mechanism in transformers?"
+
+1. **Retrieval**: FAISS finds the top 3 most similar chunks from the "Attention Is All You Need" paper
+2. **Reranking**: FlashRank reorders by relevance (top 3 kept)
+3. **Grading**: LLM evaluates each document: 
+   - ✅ Doc 1: Relevant (explains attention)
+   - ✅ Doc 2: Relevant (shows formula)
+   - ❌ Doc 3: Not relevant (talks about training data)
+4. **Decision**: 2/3 relevant → Use filtered docs
+5. **Generation**: Groq LLM synthesizes answer from relevant docs
+6. **Output**: Comprehensive answer with LaTeX formulas (rendered via MathJax)
+
+### When Retrieval Fails
+
+**Query**: "What are the latest improvements to transformers in 2024?"
+
+1. **Retrieval**: Finds documents from the 2017 paper
+2. **Grading**: ❌ All documents marked "not relevant" (outdated info)
+3. **Transform**: Rewrites query → "Recent transformer architecture improvements 2024"
+4. **Web Search**: Tavily searches current web content
+5. **Generation**: Answer combines paper fundamentals + recent developments
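+
+The routing decision in the two walkthroughs above can be sketched as follows. This is a hedged illustration, not the code in `agents.py`: `GradedDoc`, `route`, and the step name `"transform_query"` are hypothetical, and the grades are assumed to have been produced already by the LLM grader.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class GradedDoc:
+    text: str
+    relevant: bool  # verdict assumed to come from the LLM grader
+
+
+def route(graded_docs):
+    """Return (next_step, docs_to_keep) from per-document relevance grades."""
+    kept = [doc for doc in graded_docs if doc.relevant]
+    if not kept:
+        # Nothing usable: rewrite the query, then fall back to web search.
+        return "transform_query", []
+    # At least one relevant document: generate from the filtered set.
+    return "generate", kept
+```
+
+In the first walkthrough (2 of 3 documents relevant) this returns `"generate"` with the two relevant documents; in the second (none relevant) it returns `"transform_query"` with an empty list.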
+
+## Web Interface Features
+
+- **Modern UI**: Purple gradient design with responsive layout
+- **MathJax Integration**: Renders LaTeX formulas beautifully
+- **Transformer Visualization**: Architecture diagram in header
+- **Real-time Search**: Fast async FastAPI backend
+- **Error Handling**: Graceful degradation with user-friendly messages
diff --git a/logs/2025_12_01.log b/logs/2025_12_01.log
@@ -360,3 +360,137 @@ ModuleNotFoundError("No module named 'faiss.swigfaiss_avx512'")
 [2025-12-01 11:10:09] INFO - project.model.reranking - Reranked 3 documents, returning top 3
 [2025-12-01 11:10:11] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
 [2025-12-01 11:10:11] INFO - project.pipeline.agents - Node 'generate' completed
+[2025-12-01 12:48:45] INFO - __main__ - LangSmith tracing enabled
+[2025-12-01 12:48:45] INFO - __main__ - Starting RAG application...
+[2025-12-01 12:48:45] INFO - project.utils.model_loader - GROQ API key loaded
+[2025-12-01 12:48:45] INFO - project.utils.model_loader - ModelLoader initialized
+[2025-12-01 12:48:46] INFO - project.utils.model_loader - Loaded Groq LLM: openai/gpt-oss-120b
+[2025-12-01 12:48:46] INFO - project.utils.model_loader - GROQ API key loaded
+[2025-12-01 12:48:46] INFO - project.utils.model_loader - ModelLoader initialized
+[2025-12-01 12:48:46] INFO - project.utils.model_loader - Loaded Groq LLM: openai/gpt-oss-120b
+[2025-12-01 12:48:46] INFO - project.source.data_preparation - DataPreparation initialized with chunk_size=1000
+[2025-12-01 12:48:46] INFO - project.utils.model_loader - GROQ API key loaded
+[2025-12-01 12:48:46] INFO - project.utils.model_loader - ModelLoader initialized
+[2025-12-01 12:48:49] INFO - project.utils.model_loader - Loaded FastEmbed: BAAI/bge-small-en-v1.5
+[2025-12-01 12:48:50] INFO - project.utils.model_loader - Loaded Groq LLM: openai/gpt-oss-120b
+[2025-12-01 12:48:50] INFO - project.model.retriever - DocumentRetriever initialized
+[2025-12-01 12:48:50] INFO - project.model.reranking - FlashRank reranker initialized with model: rank-T5-flan
+[2025-12-01 12:48:50] INFO - project.pipeline.rag - RAGPipeline initialized
+[2025-12-01 12:48:50] INFO - project.pipeline.agents - Web search tool initialized
+[2025-12-01 12:48:50] INFO - project.pipeline.agents - AgentWorkflow initialized
+[2025-12-01 12:48:50] INFO - __main__ - Setting up pipeline with Attention Is All You Need paper...
+[2025-12-01 12:48:50] INFO - project.source.data_preparation - Loading PDF from local file: data\attention-is-all-you-need.pdf
+[2025-12-01 12:48:51] INFO - project.source.data_preparation - Loaded 11 pages from PDF
+[2025-12-01 12:48:51] INFO - project.source.data_preparation - Split documents into 43 chunks
+[2025-12-01 12:48:51] INFO - project.source.data_preparation - Document preparation complete: 43 chunks ready
+[2025-12-01 12:48:58] INFO - faiss.loader - Loading faiss with AVX512 support.
+[2025-12-01 12:48:58] INFO - faiss.loader - Could not load library with AVX512 support due to:
+ModuleNotFoundError("No module named 'faiss.swigfaiss_avx512'")
+[2025-12-01 12:48:58] INFO - faiss.loader - Loading faiss with AVX2 support.
+[2025-12-01 12:48:58] INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
+[2025-12-01 12:48:58] INFO - project.model.retriever - Vector store created with 43 documents
+[2025-12-01 12:48:58] INFO - project.model.retriever - Base retriever configured with mmr search
+[2025-12-01 12:48:58] INFO - project.pipeline.rag - RAG chain built successfully
+[2025-12-01 12:48:58] INFO - project.pipeline.rag - RAG pipeline setup complete
+[2025-12-01 12:48:58] INFO - project.pipeline.agents - LangGraph workflow compiled
+[2025-12-01 12:48:58] INFO - project.pipeline.agents - Agent workflow setup complete
+[2025-12-01 12:48:59] INFO - project.pipeline.agents - Workflow graph saved to workflow.png
+[2025-12-01 12:48:59] INFO - __main__ - Workflow graph saved
+[2025-12-01 12:48:59] INFO - project.pipeline.agents - ---RETRIEVE---
+[2025-12-01 12:48:59] INFO - project.pipeline.agents - Node 'retrieve' completed
+[2025-12-01 12:48:59] INFO - project.pipeline.agents - ---CHECK DOCUMENT RELEVANCE TO QUESTION---
+[2025-12-01 12:49:00] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:00] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:49:00] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:00] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:49:00] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:00] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:49:00] INFO - project.pipeline.agents - ---ASSESS GRADED DOCUMENTS---
+[2025-12-01 12:49:00] INFO - project.pipeline.agents - ---DECISION: RELEVANT DOCUMENTS FOUND, GENERATE---
+[2025-12-01 12:49:00] INFO - project.pipeline.agents - Node 'grade_documents' completed
+[2025-12-01 12:49:00] INFO - project.pipeline.agents - ---GENERATE---
+[2025-12-01 12:49:01] INFO - project.model.reranking - Reranked 3 documents, returning top 3
+[2025-12-01 12:49:03] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:03] INFO - project.pipeline.agents - Node 'generate' completed
+[2025-12-01 12:49:03] INFO - project.pipeline.agents - ---RETRIEVE---
+[2025-12-01 12:49:03] INFO - project.pipeline.agents - Node 'retrieve' completed
+[2025-12-01 12:49:03] INFO - project.pipeline.agents - ---CHECK DOCUMENT RELEVANCE TO QUESTION---
+[2025-12-01 12:49:04] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:04] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:49:04] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:04] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:49:04] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:04] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:49:04] INFO - project.pipeline.agents - ---ASSESS GRADED DOCUMENTS---
+[2025-12-01 12:49:04] INFO - project.pipeline.agents - ---DECISION: RELEVANT DOCUMENTS FOUND, GENERATE---
+[2025-12-01 12:49:04] INFO - project.pipeline.agents - Node 'grade_documents' completed
+[2025-12-01 12:49:04] INFO - project.pipeline.agents - ---GENERATE---
+[2025-12-01 12:49:05] INFO - project.model.reranking - Reranked 3 documents, returning top 3
+[2025-12-01 12:49:07] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:07] INFO - project.pipeline.agents - Node 'generate' completed
+[2025-12-01 12:49:07] INFO - project.pipeline.agents - ---RETRIEVE---
+[2025-12-01 12:49:07] INFO - project.pipeline.agents - Node 'retrieve' completed
+[2025-12-01 12:49:07] INFO - project.pipeline.agents - ---CHECK DOCUMENT RELEVANCE TO QUESTION---
+[2025-12-01 12:49:08] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:08] INFO - project.pipeline.agents - ---GRADE: DOCUMENT NOT RELEVANT---
+[2025-12-01 12:49:08] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:08] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:49:09] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:09] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:49:09] INFO - project.pipeline.agents - ---ASSESS GRADED DOCUMENTS---
+[2025-12-01 12:49:09] INFO - project.pipeline.agents - ---DECISION: RELEVANT DOCUMENTS FOUND, GENERATE---
+[2025-12-01 12:49:09] INFO - project.pipeline.agents - Node 'grade_documents' completed
+[2025-12-01 12:49:09] INFO - project.pipeline.agents - ---GENERATE---
+[2025-12-01 12:49:09] INFO - project.model.reranking - Reranked 3 documents, returning top 3
+[2025-12-01 12:49:11] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:49:11] INFO - project.pipeline.agents - Node 'generate' completed
+[2025-12-01 12:49:11] INFO - __main__ - RAG application completed successfully
+[2025-12-01 12:49:21] INFO - __main__ - Initializing RAG pipeline...
+[2025-12-01 12:49:21] INFO - project.utils.model_loader - GROQ API key loaded
+[2025-12-01 12:49:21] INFO - project.utils.model_loader - ModelLoader initialized
+[2025-12-01 12:49:22] INFO - project.utils.model_loader - Loaded Groq LLM: openai/gpt-oss-120b
+[2025-12-01 12:49:22] INFO - project.utils.model_loader - GROQ API key loaded
+[2025-12-01 12:49:22] INFO - project.utils.model_loader - ModelLoader initialized
+[2025-12-01 12:49:22] INFO - project.utils.model_loader - Loaded Groq LLM: openai/gpt-oss-120b
+[2025-12-01 12:49:22] INFO - project.source.data_preparation - DataPreparation initialized with chunk_size=1000
+[2025-12-01 12:49:22] INFO - project.utils.model_loader - GROQ API key loaded
+[2025-12-01 12:49:22] INFO - project.utils.model_loader - ModelLoader initialized
+[2025-12-01 12:49:26] INFO - project.utils.model_loader - Loaded FastEmbed: BAAI/bge-small-en-v1.5
+[2025-12-01 12:49:27] INFO - project.utils.model_loader - Loaded Groq LLM: openai/gpt-oss-120b
+[2025-12-01 12:49:27] INFO - project.model.retriever - DocumentRetriever initialized
+[2025-12-01 12:49:28] INFO - project.model.reranking - FlashRank reranker initialized with model: rank-T5-flan
+[2025-12-01 12:49:28] INFO - project.pipeline.rag - RAGPipeline initialized
+[2025-12-01 12:49:28] INFO - project.pipeline.agents - Web search tool initialized
+[2025-12-01 12:49:28] INFO - project.pipeline.agents - AgentWorkflow initialized
+[2025-12-01 12:49:28] INFO - project.source.data_preparation - Loading PDF from local file: data\attention-is-all-you-need.pdf
+[2025-12-01 12:49:29] INFO - project.source.data_preparation - Loaded 11 pages from PDF
+[2025-12-01 12:49:29] INFO - project.source.data_preparation - Split documents into 43 chunks
+[2025-12-01 12:49:29] INFO - project.source.data_preparation - Document preparation complete: 43 chunks ready
+[2025-12-01 12:49:40] INFO - faiss.loader - Loading faiss with AVX512 support.
+[2025-12-01 12:49:40] INFO - faiss.loader - Could not load library with AVX512 support due to:
+ModuleNotFoundError("No module named 'faiss.swigfaiss_avx512'")
+[2025-12-01 12:49:40] INFO - faiss.loader - Loading faiss with AVX2 support.
+[2025-12-01 12:49:40] INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
+[2025-12-01 12:49:40] INFO - project.model.retriever - Vector store created with 43 documents
+[2025-12-01 12:49:40] INFO - project.model.retriever - Base retriever configured with mmr search
+[2025-12-01 12:49:40] INFO - project.pipeline.rag - RAG chain built successfully
+[2025-12-01 12:49:40] INFO - project.pipeline.rag - RAG pipeline setup complete
+[2025-12-01 12:49:40] INFO - project.pipeline.agents - LangGraph workflow compiled
+[2025-12-01 12:49:40] INFO - project.pipeline.agents - Agent workflow setup complete
+[2025-12-01 12:49:40] INFO - __main__ - RAG pipeline ready
+[2025-12-01 12:50:13] INFO - project.pipeline.agents - ---RETRIEVE---
+[2025-12-01 12:50:13] INFO - project.pipeline.agents - Node 'retrieve' completed
+[2025-12-01 12:50:13] INFO - project.pipeline.agents - ---CHECK DOCUMENT RELEVANCE TO QUESTION---
+[2025-12-01 12:50:14] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:50:14] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:50:14] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:50:14] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:50:15] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:50:15] INFO - project.pipeline.agents - ---GRADE: DOCUMENT RELEVANT---
+[2025-12-01 12:50:15] INFO - project.pipeline.agents - ---ASSESS GRADED DOCUMENTS---
+[2025-12-01 12:50:15] INFO - project.pipeline.agents - ---DECISION: RELEVANT DOCUMENTS FOUND, GENERATE---
+[2025-12-01 12:50:15] INFO - project.pipeline.agents - Node 'grade_documents' completed
+[2025-12-01 12:50:15] INFO - project.pipeline.agents - ---GENERATE---
+[2025-12-01 12:50:16] INFO - project.model.reranking - Reranked 3 documents, returning top 3
+[2025-12-01 12:50:18] INFO - httpx - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
+[2025-12-01 12:50:18] INFO - project.pipeline.agents - Node 'generate' completed
diff --git a/templates/index.html b/templates/index.html
@@ -3,7 +3,7 @@
 <head>
     <meta charset="UTF-8">
     <meta name="viewport" content="width=device-width, initial-scale=1.0">
-    <title>Learn with Transformers</title>
+    <title>RAG PROJECT PIPELINE</title>
     <link rel="stylesheet" href="/static/styles.css">
     <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
     <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>