vivekananda-2201/Local_MultiAgentic_RAG_System

Local MultiAgentic RAG System

A production-grade Retrieval-Augmented Generation (RAG) system with a multi-agent architecture, powered by Ollama for local LLM inference. It features intelligent query parsing, semantic search, and multi-turn conversation support, with a modern React + TypeScript frontend.

🎯 Project Overview

The Local MultiAgentic RAG System is designed to provide an enterprise-grade solution for building intelligent conversational applications that can access and reason over custom knowledge bases. It combines multiple specialized agents to break down complex queries, retrieve relevant information, and generate contextually accurate responses.

Key Features

  • Multi-Agent Architecture: Specialized agents for query parsing, refinement, and response generation
  • Semantic Search: Vector-based retrieval using Chroma and Ollama embeddings
  • Local Inference: Run entirely on your machine using Ollama - no external APIs required
  • Modular Backend: FastAPI-based architecture with clean separation of concerns
  • Modern Frontend: React + TypeScript with responsive, cyberpunk-themed UI
  • SQLite Persistence: Store conversations and metadata locally
  • Session Management: Multi-session support with full conversation history
  • Production-Ready: Error handling, logging, and scalable architecture

🏗️ Architecture

System Components

┌──────────────────────────────────────────────────────────┐
│                   Frontend (React + TS)                  │
│  ┌──────────┬──────────────┬──────────────────────────┐  │
│  │ Sidebar  │   Chat UI    │   Sources Panel          │  │
│  └──────────┴──────────────┴──────────────────────────┘  │
└────────────────┬─────────────────────────────────────────┘
                 │ HTTP/WebSocket (REST API)
┌────────────────┴─────────────────────────────────────────┐
│                 FastAPI Backend (Python)                 │
│  ┌────────────────────────────────────────────────────┐  │
│  │                  API Routes Layer                  │  │
│  │  ├─ Chat Endpoints      ├─ Knowledge Base          │  │
│  │  ├─ Session Management  └─ File Upload             │  │
│  └────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────┐  │
│  │              Multi-Agent RAG Pipeline              │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐          │  │
│  │  │ Query    │→ │ RAG      │→ │ Response │          │  │
│  │  │ Parser   │  │ Query    │  │ Generator│          │  │
│  │  │ Agent    │  │ Agent    │  │ Agent    │          │  │
│  │  └──────────┘  └──────────┘  └──────────┘          │  │
│  └────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────┐  │
│  │                Data & Storage Layer                │  │
│  │  ├─ Vector DB (Chroma)      ├─ Chat DB (SQLite)    │  │
│  │  ├─ PDF Processing          └─ Embeddings (Ollama) │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘

🚀 Multi-Agentic Verification Pipeline

The system passes every query through a three-stage multi-agent pipeline:

1. Query Parser Agent

  • Purpose: Analyzes user queries in context of conversation history
  • Functionality:
    • Resolves pronouns and contextual references
    • Splits compound questions into focused sub-queries
    • Generates explicit, self-contained search queries
    • Handles follow-up questions by enriching with prior context

Example:

User: "Tell me more about them"
Context: [Prior discussion about Indian laws]
→ Resolved: "Indian laws detailed explanation"
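
The resolution step above could be sketched roughly as follows. This is a minimal illustration, not the repository's `query_parser.py`: the names `LLM`, `build_parser_prompt`, `parse_queries`, and `resolve_query` are hypothetical, and the sketch assumes the model is prompted to return one search query per line. The model call is injected as a plain function so the logic can be exercised without a running Ollama server.

```python
import re
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt to the model's text output.
LLM = Callable[[str], str]

# Matches list markers such as "- ", "* ", "1. " or "2) " at the start of a line.
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def build_parser_prompt(query: str, history: List[Dict[str, str]]) -> str:
    """Combine recent conversation turns with the new user query."""
    turns = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return (
        "Rewrite the user's question as explicit, self-contained search "
        "queries, one per line. Resolve pronouns using the conversation.\n\n"
        f"Conversation:\n{turns}\n\nQuestion: {query}\nQueries:"
    )


def parse_queries(raw: str) -> List[str]:
    """Split model output into clean queries, dropping blanks and duplicates."""
    queries: List[str] = []
    for line in raw.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        if cleaned and cleaned not in queries:
            queries.append(cleaned)
    return queries


def resolve_query(query: str, history: List[Dict[str, str]], llm: LLM) -> List[str]:
    """Return explicit sub-queries, falling back to the raw query if none come back."""
    queries = parse_queries(llm(build_parser_prompt(query, history)))
    return queries or [query]
```

Falling back to the original query when the model returns nothing keeps the rest of the pipeline working even if the parsing step fails.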

2. RAG Query Agent

  • Purpose: Optimizes queries for vector database search
  • Functionality:
    • Refines parsed queries for semantic similarity
    • Adds domain-specific keywords and context
    • Ensures optimal retrieval from knowledge base
    • Adapts to conversation context

Example:

Query: "What are fundamental rights?"
Context: [Discussion about Indian Constitution]
→ Refined: "Fundamental rights Indian Constitution Article 12-35 explanation"
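
One way to picture the refinement step is the sketch below. It is an assumption-laden illustration rather than the actual `rag_query_agent.py`: `refine_query`, its `context_terms` argument, and the `max_words` cap are hypothetical, and the model is again passed in as a plain callable.

```python
from typing import Callable, Sequence


def refine_query(
    parsed_query: str,
    context_terms: Sequence[str],
    llm: Callable[[str], str],
    max_words: int = 20,
) -> str:
    """Ask the model for a keyword-rich search string; fall back to simple enrichment."""
    prompt = (
        "Rewrite this query as a short keyword-rich search string for a "
        f"vector database.\nTopic hints: {', '.join(context_terms)}\n"
        f"Query: {parsed_query}\nSearch string:"
    )
    lines = [ln.strip() for ln in llm(prompt).splitlines() if ln.strip()]
    if not lines:
        # No usable model output: append any hints the query does not already contain.
        extra = [t for t in context_terms if t.lower() not in parsed_query.lower()]
        return " ".join([parsed_query, *extra]).strip()
    # Use only the first line and cap its length so the search string stays focused.
    return " ".join(lines[0].split()[:max_words])
```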

3. Response Generation Agent

  • Purpose: Generates accurate, context-aware responses
  • Functionality:
    • Synthesizes information from retrieved chunks
    • Maintains consistency with conversation history
    • Cites sources appropriately
    • Falls back gracefully when information is unavailable

Verification mechanisms:

  • Context relevance scoring
  • Source attribution
  • Factual consistency checking
  • Conversation coherence validation
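
Taken together, the three agents and the relevance check could be orchestrated along these lines. This is a sketch under stated assumptions, not the code in `backend/core/rag_chat.py`: the `Chunk` dataclass, `run_pipeline`, and the four injected callables are hypothetical, and the 0.6 threshold is the value listed later in this document.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    text: str
    source_file: str
    page_number: int
    score: float  # similarity score; higher means more relevant


def run_pipeline(
    question: str,
    parse: Callable[[str], List[str]],
    refine: Callable[[str], str],
    retrieve: Callable[[str], List[Chunk]],
    generate: Callable[[str, List[Chunk]], str],
    threshold: float = 0.6,
) -> dict:
    """Parse -> refine -> retrieve -> generate, with a graceful fallback."""
    kept: List[Chunk] = []
    seen = set()
    for sub_query in parse(question):
        for chunk in retrieve(refine(sub_query)):
            key = (chunk.source_file, chunk.page_number, chunk.text)
            # Context relevance scoring: drop weak matches and duplicates.
            if chunk.score >= threshold and key not in seen:
                seen.add(key)
                kept.append(chunk)
    if not kept:
        # Nothing relevant was retrieved, so do not let the model guess.
        return {"response": "I could not find this in the knowledge base.", "sources": []}
    kept.sort(key=lambda c: c.score, reverse=True)
    # Source attribution: report where each supporting chunk came from.
    sources = [
        {"file": c.source_file, "page": c.page_number, "score": c.score} for c in kept
    ]
    return {"response": generate(question, kept), "sources": sources}
```

Returning early when no chunk clears the threshold is one simple way to get the "falls back gracefully" behavior described above.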

📁 Project Structure

Local_MultiAgentic_RAG_System/
├── backend/                          # Backend FastAPI application
│   ├── config/
│   │   └── settings.py               # Central configuration
│   ├── core/
│   │   └── rag_chat.py               # RAG pipeline orchestration
│   ├── agents/
│   │   ├── models.py                 # Model configurations
│   │   ├── query_parser.py           # Query resolution agent
│   │   ├── rag_query_agent.py        # Query refinement agent
│   │   └── response_agent.py         # Response generation agent
│   ├── modules/
│   │   ├── embedding_function.py     # Ollama embeddings
│   │   ├── pdf_loader.py             # PDF document loading
│   │   ├── text_splitter.py          # Document chunking
│   │   └── vector_db.py              # Chroma operations
│   ├── database/
│   │   └── chat_db.py                # SQLite operations
│   ├── api/
│   │   └── routes/
│   │       ├── chat.py               # Chat endpoints
│   │       ├── knowledge.py          # Knowledge base endpoints
│   │       └── __init__.py
│   └── main.py                       # Uvicorn, startup config (imported as main.py in root)
│
├── frontend/                         # React + TypeScript application
│   ├── src/
│   │   ├── components/
│   │   │   ├── Chat/                 # Chat interface
│   │   │   ├── Sidebar/              # Session management
│   │   │   ├── SourcesPanel/         # Retrieved sources display
│   │   │   └── Common/               # Shared components
│   │   ├── pages/                    # Page components
│   │   ├── services/
│   │   │   └── apiService.ts         # API client
│   │   ├── hooks/
│   │   │   └── index.ts              # Custom React hooks
│   │   ├── types/
│   │   │   └── index.ts              # TypeScript types
│   │   ├── utils/
│   │   │   └── index.ts              # Utility functions
│   │   ├── styles/
│   │   │   ├── globals.css           # Global styles
│   │   │   └── theme.ts              # Theme configuration
│   │   ├── App.tsx                   # Main app component
│   │   └── main.tsx                  # React entry point
│   ├── public/                       # Static assets
│   ├── package.json
│   ├── tsconfig.json
│   ├── vite.config.ts
│   └── index.html
│
├── knowledge_base/                   # PDF documents folder
├── chroma_db/                        # Vector database storage
├── main.py                           # FastAPI entry point
├── requirements_backend.txt          # Python dependencies
├── requirements.txt                  # Legacy
├── README.md                         # This file
└── LICENSE

🔧 Installation & Setup

Prerequisites

  • Python 3.9+
  • Node.js 18+
  • Ollama (Download from ollama.ai)

Backend Setup

  1. Install Python dependencies:
pip install -r requirements_backend.txt
  2. Pull required Ollama models:
# LLM model for agents
ollama pull qwen2.5:3b

# Embedding model for semantic search
ollama pull bge-m3:latest
  3. Verify Ollama is running (should be accessible at http://localhost:11434)
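
The verification step can also be scripted. The sketch below assumes Ollama's `/api/tags` endpoint, which lists locally installed models as a JSON object with a `models` array of entries carrying a `name` field. The helper names `missing_models` and `fetch_tags` are hypothetical and are not part of this repository.

```python
import json
import urllib.request
from typing import Iterable, List

# Model names taken from the setup steps above.
REQUIRED_MODELS = ("qwen2.5:3b", "bge-m3:latest")


def missing_models(tags_payload: dict, required: Iterable[str] = REQUIRED_MODELS) -> List[str]:
    """Return the required model names that are absent from an /api/tags response."""
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask the local Ollama server which models it has installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Calling `missing_models(fetch_tags())` should return an empty list once both models are pulled. If the server is not running, `fetch_tags()` raises an `OSError` subclass such as `urllib.error.URLError`.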

Frontend Setup

  1. Navigate to frontend directory:
cd frontend
  2. Install Node dependencies:
npm install
  3. Create environment file:
cp .env.example .env
# Edit .env if needed (default points to localhost:8000)

🚀 Running the Application

Start Backend

From project root:

python main.py

The backend will start at http://localhost:8000

Available endpoints:

  • API: http://localhost:8000/api/
  • Docs: http://localhost:8000/docs

Start Frontend

In a new terminal:

cd frontend
npm run dev

Frontend will be available at http://localhost:3000

Using the Application

  1. Add Knowledge Base:

    • Place PDF files in ./knowledge_base/ directory
    • Or use the upload feature in the UI
    • The system will automatically index new PDFs
  2. Chat Interface:

    • Type questions in the input box
    • Chat history is automatically saved in SQLite
    • Sources are displayed alongside responses
  3. Session Management:

    • Create new chat sessions using the "+" button
    • View all past sessions in the sidebar
    • Delete sessions to clean up

📊 Database Schema

SQLite Tables

sessions

Stores chat session metadata:

CREATE TABLE sessions (
    id TEXT PRIMARY KEY,
    title TEXT DEFAULT 'New Chat',
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    metadata TEXT -- JSON metadata
);

messages

Stores all messages in sessions:

CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT,
    role TEXT,              -- 'user' or 'assistant'
    content TEXT,
    sources TEXT,           -- JSON array of sources
    tokens_used INTEGER,
    timestamp TIMESTAMP,
    FOREIGN KEY (session_id) REFERENCES sessions(id)
);

chunk_references

Tracks which chunks were used for each message:

CREATE TABLE chunk_references (
    id INTEGER PRIMARY KEY,
    message_id INTEGER,
    chunk_id TEXT,
    source_file TEXT,
    page_number INTEGER,
    relevance_score REAL,
    FOREIGN KEY (message_id) REFERENCES messages(id)
);

conversation_metadata

Stores additional session metadata:

CREATE TABLE conversation_metadata (
    id INTEGER PRIMARY KEY,
    session_id TEXT,
    key TEXT,
    value TEXT,
    FOREIGN KEY (session_id) REFERENCES sessions(id)
);
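
To show how the `sessions` and `messages` tables fit together, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the repository's `chat_db.py`: the helper names `open_db`, `create_session`, `add_message`, and `get_history` are hypothetical, and storing `sources` as a JSON string follows the column comments above.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import List, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    title TEXT DEFAULT 'New Chat',
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    metadata TEXT
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT,
    role TEXT,
    content TEXT,
    sources TEXT,
    tokens_used INTEGER,
    timestamp TIMESTAMP,
    FOREIGN KEY (session_id) REFERENCES sessions(id)
);
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    return conn


def create_session(conn: sqlite3.Connection, title: str = "New Chat") -> str:
    session_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO sessions (id, title, created_at, updated_at, metadata) "
        "VALUES (?, ?, ?, ?, ?)",
        (session_id, title, _now(), _now(), "{}"),
    )
    conn.commit()
    return session_id


def add_message(conn: sqlite3.Connection, session_id: str, role: str,
                content: str, sources: Optional[List[dict]] = None) -> int:
    cur = conn.execute(
        "INSERT INTO messages (session_id, role, content, sources, timestamp) "
        "VALUES (?, ?, ?, ?, ?)",
        (session_id, role, content, json.dumps(sources or []), _now()),
    )
    conn.commit()
    return cur.lastrowid


def get_history(conn: sqlite3.Connection, session_id: str) -> List[dict]:
    """Return a session's messages in insertion order, with sources decoded."""
    rows = conn.execute(
        "SELECT role, content, sources FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return [{"role": r, "content": c, "sources": json.loads(s)} for r, c, s in rows]
```

Note that SQLite only enforces the `FOREIGN KEY` clauses in the schema when `PRAGMA foreign_keys = ON` is set on each connection.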

Vector Database (Chroma)

  • Collection: pdf_chunks
  • Embeddings: bge-m3 (1024-dimensional dense vectors)
  • Similarity Metric: Cosine similarity
  • Chunk Size: 600 tokens
  • Chunk Overlap: 200 tokens
  • Threshold: 0.6 similarity score
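
For readers unfamiliar with the metric, the sketch below shows in plain Python what cosine similarity and a 0.6 threshold mean. It only illustrates the math: in the running system the search is delegated to Chroma, and the function names `cosine_similarity` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 5,
    threshold: float = 0.6,
) -> List[Tuple[str, float]]:
    """Score every chunk, drop those below the threshold, return the best k."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunks]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]
```

Vector stores commonly report a distance rather than a similarity, so a similarity threshold such as 0.6 may need converting before it is compared with raw query results.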

🤖 Agent Specifications

Query Parser Agent

  • Model: qwen2.5:3b
  • Input: User query + recent conversation context
  • Output: List of explicit search queries
  • Key Features:
    • Pronoun resolution
    • Context enrichment
    • Query decomposition

RAG Query Agent

  • Model: qwen2.5:3b
  • Input: Parsed query + conversation context
  • Output: Optimized search query
  • Key Features:
    • Keyword extraction
    • Domain adaptation
    • Semantic optimization

Response Agent

  • Model: qwen2.5:3b
  • Input: Question + retrieved context + history
  • Output: Natural language response
  • Key Features:
    • Context synthesis
    • Source attribution
    • Hallucination prevention

📡 API Endpoints

Chat API

POST /api/chat/message
- Send a message and get a response
- Request: { message: string, session_id?: string }
- Response: { response: string, session_id: string, sources: [] }

WebSocket /api/chat/ws/{session_id}
- Streaming responses via WebSocket
- Message format: { message: string }

GET /api/chat/sessions
- Get all sessions

GET /api/chat/session/{session_id}
- Get specific session with messages

POST /api/chat/session/create
- Create new session

DELETE /api/chat/session/{session_id}
- Delete session
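
A client call to the message endpoint might look like the sketch below, which uses only the standard library. The request and response fields follow the shapes listed above. The helper names `build_chat_request` and `send_chat`, the base URL default, and the timeout are assumptions for illustration.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    message: str,
    session_id: Optional[str] = None,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the POST /api/chat/message request without sending it."""
    payload = {"message": message}
    if session_id is not None:
        payload["session_id"] = session_id  # optional: omit to start a new session
    return urllib.request.Request(
        url=f"{base_url}/api/chat/message",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, session_id: Optional[str] = None) -> dict:
    """Send the message and return the decoded JSON response."""
    request = build_chat_request(message, session_id)
    # Local inference can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)
```

With the backend running, `send_chat("What are fundamental rights?")` should return a dictionary containing `response`, `session_id`, and `sources`. Passing the returned `session_id` to the next call continues the same conversation.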

Knowledge Base API

GET /api/knowledge/stats
- Get knowledge base statistics

GET /api/knowledge/structure
- Get KB structure (files → pages → chunks)

GET /api/knowledge/chunks
- Get all indexed chunks

POST /api/knowledge/refresh
- Refresh/reindex knowledge base

GET /api/knowledge/search?query=...&k=5
- Search knowledge base
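
Because the search endpoint takes its arguments in the query string, free-text queries need URL encoding. A small sketch, assuming the `query` and `k` parameters shown above and a hypothetical helper name `build_search_url`:

```python
from urllib.parse import urlencode


def build_search_url(query: str, k: int = 5, base_url: str = "http://localhost:8000") -> str:
    """Return the GET URL for /api/knowledge/search with safely encoded parameters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return f"{base_url}/api/knowledge/search?{urlencode({'query': query, 'k': k})}"
```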

🎨 Frontend Features

Components

  • Chat: Main conversation interface with auto-scroll
  • Sidebar: Session management with quick access
  • SourcesPanel: Display retrieved sources with scores
  • Common: Reusable UI components

Hooks

  • useChat(): Chat state management
  • useSessions(): Session management
  • useKnowledgeBase(): Knowledge base operations
  • useWebSocket(): WebSocket connection handling

Styling

  • Cyberpunk-inspired dark theme
  • Responsive design (mobile, tablet, desktop)
  • Smooth animations and transitions
  • Accessibility-focused

🔒 Security Considerations

  • CORS configured for localhost only (configure for production)
  • No external API calls - completely local
  • PDFs processed locally without transmission
  • SQLite database is stored locally and is not encrypted; it is protected only by filesystem permissions
  • Implement authentication layer for production deployment

📈 Performance Optimization

  • Chunking Strategy: 600-token chunks with a 200-token overlap give each chunk enough context while preserving continuity across chunk boundaries
  • Similarity Threshold: Chunks scoring below 0.6 are discarded, trading some recall for precision
  • Context Window: Last 3 conversational turns (6 messages) for agent context
  • Embedding Model: bge-m3 provides high-quality semantic representations
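
The chunking and context-window settings above can be illustrated with a short sketch. It makes two simplifying assumptions: "tokens" are whatever list the caller provides (for example, whitespace-split words rather than real tokenizer output), and a conversational turn is one user message plus one assistant message. The names `chunk_tokens` and `recent_context` are hypothetical.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence, size: int = 600, overlap: int = 200) -> List[list]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each new window starts 400 tokens after the last by default
    chunks: List[list] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


def recent_context(messages: Sequence[dict], turns: int = 3) -> List[dict]:
    """Keep only the last `turns` turns, i.e. the final 2 * turns messages."""
    if turns <= 0:
        return []
    return list(messages[-2 * turns:])
```

With the defaults, a 1,000-token document yields two chunks covering tokens 0-599 and 400-999, so the 200 tokens in the middle appear in both.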

🛠️ Development & Maintenance

Adding Custom Agents

  1. Create new agent file in backend/agents/
  2. Implement agent logic using Ollama chat API
  3. Register in backend/core/rag_chat.py
  4. Update API routes if needed
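
As a rough picture of steps 1 to 3, a custom agent could be a small class plus a registration call. Everything in this sketch is hypothetical, including the `SummaryAgent` class, the `AGENTS` registry, and `register`. The actual registration mechanism in `backend/core/rag_chat.py` may look quite different, and the model call is stubbed as a plain callable instead of the Ollama chat API.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt to the model's text output.
LLM = Callable[[str], str]


class SummaryAgent:
    """Hypothetical custom agent that condenses retrieved chunks into a short answer."""

    name = "summary"

    def __init__(self, llm: LLM, max_chars: int = 4000) -> None:
        self.llm = llm
        self.max_chars = max_chars

    def run(self, question: str, chunks: List[str]) -> str:
        # Truncate the context so the prompt stays within a small model's window.
        context = "\n\n".join(chunks)[: self.max_chars]
        prompt = (
            "Summarize the context in two sentences, focusing on the question.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nSummary:"
        )
        return self.llm(prompt).strip()


# Hypothetical registry that the orchestration code could look agents up in.
AGENTS: Dict[str, SummaryAgent] = {}


def register(agent: SummaryAgent) -> None:
    """Add an agent to the registry, refusing duplicate names."""
    if agent.name in AGENTS:
        raise ValueError(f"agent {agent.name!r} is already registered")
    AGENTS[agent.name] = agent
```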

Customizing the Frontend

  • Modify component files in frontend/src/components/
  • Update styles in component .css files
  • Extend types in frontend/src/types/index.ts
  • Add hooks in frontend/src/hooks/index.ts

Extending Knowledge Base

  • Add PDFs to knowledge_base/ directory
  • Use upload endpoint to add files programmatically
  • System automatically detects and indexes new PDFs
  • Use /api/knowledge/refresh to reprocess all files

📚 Dependencies

Backend

  • FastAPI: Modern async web framework
  • Uvicorn: ASGI server
  • LangChain: LLM and vector store abstractions
  • Chroma: Vector database
  • Ollama: Local LLM inference
  • SQLite3: Persistent message storage

Frontend

  • React 18: UI framework
  • TypeScript: Type-safe JavaScript
  • Axios: HTTP client
  • Vite: Build tool and dev server
  • Marked: Markdown rendering
  • Lucide React: Icon library

🐛 Troubleshooting

Ollama Models Not Found

# Verify Ollama is running
ollama list

# Re-pull required models
ollama pull qwen2.5:3b
ollama pull bge-m3:latest

Backend Connection Issues

  • Ensure port 8000 is available
  • Check CORS settings if frontend can't reach backend
  • Verify Ollama is running on port 11434

Frontend Build Issues

cd frontend
rm -rf node_modules package-lock.json
npm install
npm run build

📝 License

This project is licensed under the MIT License; see the LICENSE file for details.

🙋 Support & Contributing

For issues, questions, or contributions:

  1. Check existing documentation
  2. Review the project structure and code comments
  3. Test locally before submitting changes
  4. Follow the existing code style and patterns

🚀 Future Enhancements

  • User authentication and authorization
  • Fine-tuning support for domain-specific models
  • Advanced query expansion and synonymy handling
  • Document versioning and management
  • Real-time collaboration features
  • Advanced analytics and insights
  • Multi-language support
  • GPU optimization for faster inference
  • API rate limiting and usage tracking
  • Backup and disaster recovery

Last Updated: June 2026
Version: 1.0.0
Status: Production-Ready
