A retrieval-augmented generation (RAG) system for querying ML/AI research papers using BM25 sparse retrieval — no vector embeddings or external APIs required. Users ask natural language questions and receive grounded answers with citations to the source papers.
Most RAG systems rely on vector embeddings and similarity search. This project demonstrates that strong retrieval can be achieved with classical BM25 (Best Match 25), a keyword-based ranking algorithm widely used in information retrieval. Retrieved abstracts are passed as context to a locally running LLM, which synthesizes a cited answer.
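To make the ranking idea concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is an illustration only, not the code in `rag/retriever.py` and not the internals of `rank_bm25` (whose IDF handling may differ). The whitespace tokenizer, the `k1` and `b` defaults, and the three example documents are assumptions chosen for the demo.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Placeholder documents, not real abstracts.
docs = [
    "graph neural networks for molecules",
    "image segmentation with convolutional networks",
    "policy gradient methods in reinforcement learning",
]
scores = bm25_scores("image segmentation", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Documents that share no term with the query score exactly zero, which is why keyword overlap matters more here than in embedding-based retrieval.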
User Question
↓
BM25 Retriever (rank_bm25)
↓
Top-K Relevant Abstracts
↓
LLM Context Window (Ollama — qwen2.5)
↓
Grounded Answer + Citations
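The LLM step in the pipeline above can be sketched against Ollama's local REST endpoint (`/api/generate` on the default port 11434). This is one possible approach under stated assumptions: the project's `rag/generator.py` may well use a different client, and the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen2.5", url=OLLAMA_URL):
    """Build a non-streaming generate request for Ollama."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen2.5", timeout=120):
    """Send the prompt to Ollama and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Building the request needs no server; calling generate() does.
req = build_request("Summarize BM25 in one sentence.")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running model.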
arxiv-rag/
├── data/
│ └── fetch_papers.py # Fetches ArXiv abstracts via API
├── rag/
│ ├── retriever.py # BM25 index and retrieval logic
│ └── generator.py # LLM answer synthesis with citations
├── ui/
│ └── app.py # Streamlit chat interface
├── requirements.txt
└── README.md
- Vectorless retrieval — BM25 ranking with no embeddings or vector database
- Grounded answers — LLM is instructed to cite only the retrieved papers
- Source transparency — every retrieved paper is shown with title, authors, BM25 score, and abstract preview
- Adjustable top-k — slider to control how many papers are retrieved per query
- Fully local — runs on Ollama, no external API required
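The top-k and grounding features above might be combined roughly as follows. This is a sketch, not the project's actual prompt: the `title` and `abstract` field names, the instruction wording, and the helpers `top_k` and `build_prompt` are all assumptions, and the papers are placeholders.

```python
def top_k(papers, scores, k=5):
    """Return the k highest-scoring (paper, score) pairs, best first."""
    ranked = sorted(zip(papers, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def build_prompt(question, ranked):
    """Number each retrieved abstract so the model can cite it as [n]."""
    blocks = []
    for i, (paper, _score) in enumerate(ranked, start=1):
        blocks.append(f"[{i}] {paper['title']}\n{paper['abstract']}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the papers below. "
        "Cite papers by their bracketed number, and say so if they do not "
        "contain the answer.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Placeholder papers and scores for illustration only.
papers = [
    {"title": "Paper A", "abstract": "About segmentation."},
    {"title": "Paper B", "abstract": "About transformers."},
    {"title": "Paper C", "abstract": "About graphs."},
]
ranked = top_k(papers, [0.2, 1.7, 0.0], k=2)
prompt = build_prompt("How do transformers work?", ranked)
print(prompt)
```

Because only the numbered abstracts appear in the prompt, any citation in the answer can be mapped back to a paper the user can inspect.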
~500 ArXiv abstracts fetched via the ArXiv API across 8 ML/AI topic areas:
- Machine Learning
- Deep Learning
- Natural Language Processing
- Reinforcement Learning
- Computer Vision
- Large Language Models
- Graph Neural Networks
- Transformer Architecture
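Fetching abstracts for topics like these could look roughly like the sketch below, which builds an ArXiv API query URL and parses the Atom feed it returns, skipping duplicate IDs. It is not the contents of `data/fetch_papers.py`; the `all:` search field, the output field names, and the helper names are assumptions. The sample feed is a placeholder, so no network access is needed to run it.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(topic, max_results=10, start=0):
    """Build an ArXiv API URL that searches all fields for a phrase."""
    params = {
        "search_query": f'all:"{topic}"',
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _text(element, tag):
    """Return the tag's text with runs of whitespace collapsed."""
    return " ".join(element.findtext(tag, default="", namespaces=ATOM).split())


def parse_feed(xml_text, seen_ids=None):
    """Extract papers from an Atom feed, skipping IDs already seen."""
    seen_ids = set() if seen_ids is None else seen_ids
    papers = []
    for entry in ET.fromstring(xml_text).findall("atom:entry", ATOM):
        paper_id = _text(entry, "atom:id")
        if not paper_id or paper_id in seen_ids:
            continue
        seen_ids.add(paper_id)
        papers.append({
            "id": paper_id,
            "title": _text(entry, "atom:title"),
            "abstract": _text(entry, "atom:summary"),
            "authors": [_text(a, "atom:name")
                        for a in entry.findall("atom:author", ATOM)],
        })
    return papers


# Placeholder feed with a duplicated entry ID.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00001v1</id>
    <title>Placeholder
      Title</title>
    <summary>Placeholder abstract.</summary>
    <author><name>A. Author</name></author>
  </entry>
  <entry>
    <id>http://arxiv.org/abs/0000.00001v1</id>
    <title>Duplicate</title>
    <summary>Same id, skipped.</summary>
  </entry>
</feed>"""
papers = parse_feed(sample)
print(build_query_url("deep learning"))
print(papers)
```

Passing one shared `seen_ids` set across all topic queries is one way to end up with unique abstracts when a paper matches several topics.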
| Layer | Technology |
|---|---|
| Data | ArXiv API |
| Retrieval | BM25 (rank_bm25) |
| LLM | qwen2.5 via Ollama (local) |
| UI | Streamlit |
| Language | Python |
- Python 3.8+
- Ollama installed and running, with `qwen2.5` pulled

`ollama pull qwen2.5`

`pip install -r requirements.txt`

`python data/fetch_papers.py`

This fetches ~500 unique ArXiv abstracts and saves them to `data/papers.json`.

`streamlit run ui/app.py`

Open http://localhost:8501 in your browser.
- "What methods are used for image segmentation?"
- "How do transformers work in NLP?"
- "What are the latest advances in reinforcement learning?"
- "How are graph neural networks used in practice?"
- "What are common techniques for training large language models?"
- "What are the challenges of deploying machine learning models in production?"
- `data/papers.json` is not committed; regenerate it with `python data/fetch_papers.py`
- BM25 index is built in-memory at startup (~1 second for 500 papers)
- Ollama must be running before starting the app (`ollama serve` if not auto-started)
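A quick way to confirm that Ollama is reachable before launching the app is to probe its local HTTP port. The sketch below assumes the default address `http://localhost:11434`; the helper name `ollama_is_running` is hypothetical and not part of the project.

```python
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError/HTTPError.
        return False


if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama is reachable.")
    else:
        print("Ollama is not reachable; try `ollama serve`.")
```

The check only confirms that something answers on that port; it does not verify that the `qwen2.5` model has been pulled.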