RAG Intelligence System is a production-ready document Q&A platform built on LangChain, OpenAI, and ChromaDB. Upload documents, ask precise questions whose answers are grounded strictly in your own content, and let an LLM pipeline (GPT-4o) generate the answers. The platform also provides FIFO long-term memory, a summarisation agent with email delivery, and a clean single-page interface served by FastAPI.
rag_project/
├── backend/
│ ├── main.py # FastAPI entrypoint
│ ├── config.py # Settings & env vars
│ ├── chains/
│ │ ├── __init__.py
│ │ └── rag_chain.py # RAG pipeline with runnables
│ ├── memory/
│ │ ├── __init__.py
│ │ └── fifo_memory.py # FIFO long-term memory (max 10 Q&A)
│ ├── agents/
│ │ ├── __init__.py
│ │ ├── summary_agent.py # Summarization agent
│ │ └── tools.py # Email + summarize tools
│ ├── routers/
│ │ ├── __init__.py
│ │ ├── documents.py # Upload & ingest endpoints
│ │ ├── qa.py # Q&A endpoint
│ │ └── agent.py # Agent endpoint
│ └── vectorstore/
│ ├── __init__.py
│ └── store.py # Chroma vector store manager
├── frontend/
│ └── index.html # Single-page UI
├── .env.example # Environment variable template
└── requirements.txt # Python dependencies
Setup:
- cp .env.example .env and fill in your API keys.
- pip install -r requirements.txt
- uvicorn backend.main:app --reload
- Open frontend/index.html in a browser (or let FastAPI serve it at the root path).
The summarisation agent can:
- Summarise all uploaded document content (summarize_documents_tool).
- Summarise the user's Q&A conversation history (summarize_qa_history_tool).
- Send the combined summary via email (send_email_tool).
The agent uses OpenAI tool (function) calling with the LLM as its backbone, so the model itself decides which tools to call and in what order to fulfil a natural-language request such as: "Summarise my documents and email the result to alice@example.com".
LangChain components used by the agent:
- create_openai_tools_agent – wires prompt + LLM + tools into an agent.
- AgentExecutor – run-loop that calls tools until done.
- ChatPromptTemplate – structured system + human + agent scratch-pad.
- MessagesPlaceholder – slot for dynamic agent_scratchpad messages.
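The run-loop that AgentExecutor provides can be illustrated without LangChain. In this minimal sketch a scripted planner stands in for the LLM's tool-calling decisions and the three tool bodies are placeholders returning fixed strings. The names scripted_planner, run_agent, and TOOLS are hypothetical, and this is not how LangChain implements AgentExecutor internally.

```python
from typing import Callable, Dict, List, Optional, Tuple

AgentStep = Tuple[str, str]  # (tool name, observation returned by the tool)


# Placeholder tools (hypothetical bodies); the real ones call the LLM and an email backend.
def summarize_documents_tool(_: str) -> str:
    return "Doc summary"


def summarize_qa_history_tool(session_id: str) -> str:
    return f"History summary for {session_id}"


def send_email_tool(payload: str) -> str:
    return f"sent: {payload}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "summarize_documents_tool": summarize_documents_tool,
    "summarize_qa_history_tool": summarize_qa_history_tool,
    "send_email_tool": send_email_tool,
}


def scripted_planner(scratchpad: List[AgentStep]) -> Optional[Tuple[str, str]]:
    """Stand-in for the LLM: look at past steps, return the next (tool, input) or None when done."""
    step = len(scratchpad)
    if step == 0:
        return ("summarize_documents_tool", "")
    if step == 1:
        return ("summarize_qa_history_tool", "session-1")
    if step == 2:
        # Combine the earlier observations into one email body.
        combined = " | ".join(observation for _, observation in scratchpad)
        return ("send_email_tool", combined)
    return None  # nothing left to do


def run_agent(
    planner: Callable[[List[AgentStep]], Optional[Tuple[str, str]]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> List[AgentStep]:
    """Call tools until the planner says it is done or max_steps is reached."""
    scratchpad: List[AgentStep] = []
    for _ in range(max_steps):
        action = planner(scratchpad)
        if action is None:
            break
        name, tool_input = action
        if name not in tools:
            raise ValueError(f"Unknown tool: {name}")
        scratchpad.append((name, tools[name](tool_input)))
    return scratchpad
```

Running run_agent(scripted_planner, TOOLS) produces three steps, the last being the send_email_tool observation. The max_steps guard plays a role similar to AgentExecutor's max_iterations; in the real agent, the planner's view of past steps is what the agent_scratchpad placeholder feeds back to the LLM.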
Defines the Retrieval-Augmented Generation (RAG) pipeline using LangChain LCEL (LangChain Expression Language) runnables.
User question
│
▼
[RunnableParallel] ── retrieve relevant chunks from Chroma
│ ── pass-through the original question
▼
[PromptTemplate] ── fill {context} + {question} slots
▼
[LLM] ── generate concise, grounded answer
▼
[StrOutputParser] ── extract plain text from AIMessage
Key LangChain components:
- RunnableParallel – fan-out to retriever + identity in one step.
- RunnableLambda – wrap plain callables as runnables.
- RunnablePassthrough – forward input unchanged.
- StrOutputParser – parse the model output to a plain string.
- ChatPromptTemplate – structured prompt with system + human turns.
- | pipe operator – compose a RunnableSequence.
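To make the pipe semantics concrete, here is a toy re-implementation of the idea in plain Python: Step mimics a runnable, `|` chains two steps into a sequence, and parallel() fans one input out like RunnableParallel. Everything here is hypothetical and simplified (a keyword-overlap retriever instead of Chroma, a fake LLM that echoes the first context line); it is not LangChain's API.

```python
from typing import Any, Callable, Dict


class Step:
    """Toy runnable: wraps a callable and supports `|` composition."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b -> a new step that runs a, then feeds its output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**branches: Step) -> Step:
    """Send the same input to every branch and collect the results in a dict."""
    return Step(lambda value: {key: step.invoke(value) for key, step in branches.items()})


CHUNKS = [
    "Chroma stores embeddings on disk.",
    "FastAPI serves the single-page UI.",
]


def retrieve(question: str) -> str:
    # Hypothetical retriever: keep chunks sharing at least one word with the question.
    words = set(question.lower().split())
    hits = [c for c in CHUNKS if words & set(c.lower().rstrip(".").split())]
    return "\n".join(hits)


passthrough = Step(lambda value: value)  # like RunnablePassthrough
prompt = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")
fake_llm = Step(lambda text: {"content": text.splitlines()[1]})  # "answers" with the first context line
parse = Step(lambda message: message["content"])  # like StrOutputParser

chain = parallel(context=Step(retrieve), question=passthrough) | prompt | fake_llm | parse
```

Calling chain.invoke("Where does Chroma store embeddings?") flows through the same four stages as the diagram above and returns the retrieved sentence about Chroma.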
Provides FIFOMemory: a simple, persistent Q&A store that keeps at most max_items entries, evicting the oldest (FIFO) when the limit is exceeded.
- Built on top of LangChain's BaseChatMessageHistory interface so it can be plugged directly into RunnableWithMessageHistory.
- Backed by a plain JSON file per user session, so data survives restarts without requiring a database.
- Thread-safe via a threading.Lock.
Wrap any chain with wrap_chain_with_memory() to get automatic
history injection and persistence.
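A minimal sketch of the FIFO eviction and per-session JSON persistence described above, assuming one file named <session_id>.json per session. It deliberately omits the BaseChatMessageHistory interface, so it cannot be plugged into RunnableWithMessageHistory as-is; the method names add, history, and clear and the demo_eviction helper are hypothetical.

```python
import json
import tempfile
import threading
from collections import deque
from pathlib import Path
from typing import Dict, List


class FIFOMemory:
    """Simplified FIFO Q&A store: keeps at most max_items pairs and persists
    them to one JSON file per session."""

    def __init__(self, session_id: str, storage_dir: Path, max_items: int = 10) -> None:
        self.path = Path(storage_dir) / f"{session_id}.json"
        self._lock = threading.Lock()
        # Reload anything persisted by an earlier run of the process.
        saved = json.loads(self.path.read_text()) if self.path.exists() else []
        # deque(maxlen=...) drops the oldest entry automatically on overflow.
        self._items = deque(saved, maxlen=max_items)

    def add(self, question: str, answer: str) -> None:
        with self._lock:
            self._items.append({"question": question, "answer": answer})
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.path.write_text(json.dumps(list(self._items)))

    def history(self) -> List[Dict[str, str]]:
        with self._lock:
            return list(self._items)

    def clear(self) -> None:
        with self._lock:
            self._items.clear()
            self.path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+


def demo_eviction() -> List[str]:
    """Add 5 pairs with max_items=3, reload from disk, return the surviving questions."""
    with tempfile.TemporaryDirectory() as tmp:
        memory = FIFOMemory("session-1", Path(tmp), max_items=3)
        for i in range(5):
            memory.add(f"q{i}", f"a{i}")
        reloaded = FIFOMemory("session-1", Path(tmp), max_items=3)
        return [item["question"] for item in reloaded.history()]
```

demo_eviction() shows both properties at once: the two oldest pairs are evicted, and the remaining three survive a fresh FIFOMemory instance reading the same file.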
FastAPI router that exposes the LangChain summarisation agent.
Endpoints
---------
POST /agent/run
Accept a natural-language instruction and session_id, run the
summarisation agent (which may call summarize_documents_tool,
summarize_qa_history_tool, and/or send_email_tool), and return the
agent's final output.
FastAPI router for document management endpoints.
POST /documents/upload
Upload one or more files (.pdf, .txt, .docx) and ingest them into the
Chroma vector store.
GET /documents/list
List all document sources currently indexed.
DELETE /documents/{source}
Remove all chunks for a given source filename.
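A hypothetical in-memory stand-in for the behaviour behind these endpoints: uploads are checked against the three accepted extensions, every chunk records its source filename, and delete-by-source is keyed on that filename. SourceIndex and its methods are illustrative names, not the project's code; the real router stores chunks in Chroma. The sketch also assumes that re-ingesting a file replaces its earlier chunks, in line with the re-upload scenario mentioned for the vector store.

```python
from pathlib import PurePath
from typing import Dict, List

ALLOWED_SUFFIXES = {".pdf", ".txt", ".docx"}


class SourceIndex:
    """Hypothetical in-memory index of chunks tagged with their source filename."""

    def __init__(self) -> None:
        self._chunks: List[Dict[str, str]] = []

    def ingest(self, filename: str, chunks: List[str]) -> int:
        suffix = PurePath(filename).suffix.lower()
        if suffix not in ALLOWED_SUFFIXES:
            raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
        # Re-upload: drop any old chunks for this source before adding the new ones.
        self.delete(filename)
        self._chunks.extend({"source": filename, "text": chunk} for chunk in chunks)
        return len(chunks)

    def list_sources(self) -> List[str]:
        # Unique source names, sorted for a stable listing.
        return sorted({chunk["source"] for chunk in self._chunks})

    def delete(self, source: str) -> int:
        """Remove every chunk for one source and return how many were removed."""
        before = len(self._chunks)
        self._chunks = [chunk for chunk in self._chunks if chunk["source"] != source]
        return before - len(self._chunks)
```

Tagging each chunk with its source is what makes GET /documents/list and DELETE /documents/{source} cheap: both only need that one metadata field.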
FastAPI router for the RAG Q&A endpoint.
POST /qa/ask
Accept a question and session_id, run the RAG chain, persist the Q&A
to FIFO memory, and return the answer.
GET /qa/history/{session_id}
Retrieve the stored Q&A history for a session.
DELETE /qa/history/{session_id}
Clear the memory for a session.
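A small client-side sketch using only the standard library, assuming the server runs locally on port 8000 and that POST /qa/ask accepts a JSON body with question and session_id fields (the exact request schema is an assumption). ask_request and history_request are hypothetical helpers that only build requests; they do not send them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev server


def ask_request(question: str, session_id: str) -> urllib.request.Request:
    """Build (not send) a POST /qa/ask request with an assumed JSON body."""
    body = json.dumps({"question": question, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/qa/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def history_request(session_id: str, clear: bool = False) -> urllib.request.Request:
    """Build a GET (read) or DELETE (clear) request for /qa/history/{session_id}."""
    # Percent-encode the id so spaces or '/' stay inside a single path segment.
    url = f"{BASE_URL}/qa/history/{urllib.parse.quote(session_id, safe='')}"
    return urllib.request.Request(url, method="DELETE" if clear else "GET")
```

Against a running server, urllib.request.urlopen(ask_request("What does the contract say about renewal?", "s1")) sends the question; the response body is whatever JSON the router returns.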
Manages the Chroma vector store used to index user-uploaded documents.
- Create / open a persistent Chroma collection.
- Ingest Document objects (split into chunks first).
- Expose a VectorStoreRetriever for the RAG chain.
- Delete all vectors for a given source file (for re-upload scenarios).
Key components:
- Chroma – vector store backed by ChromaDB.
- OpenAIEmbeddings – embed text chunks via OpenAI's API.
- RecursiveCharacterTextSplitter – split raw documents into chunks.
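The chunk-size/overlap idea behind the splitter can be sketched in a few lines. This is a deliberate simplification: RecursiveCharacterTextSplitter first tries to break on larger separators (paragraphs, then newlines, then spaces) before falling back to single characters, whereas this hypothetical split_text only cuts fixed-size character windows that overlap by chunk_overlap characters. The default sizes here are illustrative, not the project's settings.

```python
from typing import List


def split_text(text: str, chunk_size: int = 40, chunk_overlap: int = 10) -> List[str]:
    """Fixed-size character chunks; each chunk repeats the last chunk_overlap
    characters of the previous one so context is not lost at the boundaries."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, split_text("abcdefghij", chunk_size=4, chunk_overlap=2) yields four windows that each share two characters with their neighbour. The overlap trades a little extra storage for retrieval hits that do not cut a sentence in half.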
Centralised application settings loaded from environment variables via pydantic-settings. A single settings singleton is imported everywhere so every module reads from the same source of truth.
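The single-source-of-truth pattern can be sketched with the standard library alone. This stands in for pydantic-settings rather than reproducing it: the field names and the environment variable names (other than OPENAI_API_KEY) are hypothetical, and only the gpt-4o model and the 10-item memory limit come from this README.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields; the real config.py declares its own via pydantic-settings.
    openai_api_key: str
    llm_model: str = "gpt-4o"
    chroma_dir: str = "./chroma_db"
    memory_max_items: int = 10


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Read environment variables once; later calls return the cached object."""
    return Settings(
        openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
        llm_model=os.environ.get("LLM_MODEL", "gpt-4o"),
        chroma_dir=os.environ.get("CHROMA_DIR", "./chroma_db"),
        memory_max_items=int(os.environ.get("MEMORY_MAX_ITEMS", "10")),
    )


# Module-level singleton that other modules would import.
settings = get_settings()
```

Freezing the dataclass keeps one module from mutating configuration another module relies on; tests can call get_settings.cache_clear() after changing environment variables to rebuild it.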
FastAPI application entrypoint.
Registers all routers, configures CORS (for the standalone HTML frontend),
and serves the frontend's index.html at the root path.
Run: uvicorn backend.main:app --reload --port 8000