v1.2 — Modular Architecture | Dense + BM25 + RRF + Reranking + Gemini Generation
A production-inspired Hybrid RAG (Retrieval-Augmented Generation) system combining dense vector search, sparse BM25 retrieval, and Reciprocal Rank Fusion — now refactored into a clean, modular architecture.
Pipeline overview:

      User Query
           │
           ▼
    ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
    │ ingestion/  │ → │ chunking/   │ → │ embeddings/ │
    │ loader.py   │   │ splitter.py │   │ encoder.py  │
    └─────────────┘   └─────────────┘   └──────┬──────┘
                                               │
                                        ┌──────▼──────┐
                                        │  vectordb/  │
                                        │chroma_store │
                                        └──────┬──────┘
                                               │
    User Query ─────────────────────────►  retrieval/
                                       hybrid_search.py
                                     (Dense + BM25 + RRF)
                                               │
                                        ┌──────▼──────┐
                                        │    llm/     │
                                        │ reranker.py │ ← Skip-rerank optimisation
                                        └──────┬──────┘
                                               │
                                        ┌──────▼──────┐
                                        │  prompts/   │
                                        │templates.py │
                                        └──────┬──────┘
                                               │
                                        ┌──────▼──────┐
                                        │    llm/     │
                                        │gemini_client│
                                        └──────┬──────┘
                                               │
                                        ┌──────▼──────┐
                                        │   Answer    │
                                        │  + Sources  │
                                        └─────────────┘
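
The fusion step in `retrieval/hybrid_search.py` combines a dense ranking and a BM25 ranking with Reciprocal Rank Fusion. The sketch below illustrates the general RRF formula only; it is not the project's actual code. The function name, the sample chunk IDs, and the constant `k=60` (a commonly used default) are assumptions, since the README does not state them.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value taken
    from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists, best match first.
dense_hits = ["chunk_3", "chunk_1", "chunk_7"]
bm25_hits = ["chunk_1", "chunk_9", "chunk_3"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only rank positions, the dense similarity scores and the BM25 scores never need to be normalised onto a common scale. Here `chunk_1` comes out on top because it sits near the head of both lists.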
Project structure:

    rag-project/
    ├── config.yaml              # Central configuration (models, thresholds, paths)
    ├── main.py                  # CLI entry point
    ├── requirements.txt
    ├── .env                     # API keys (not committed)
    ├── .gitignore
    │
    ├── ingestion/               # File loading (txt, pdf)
    │   └── loader.py
    │
    ├── chunking/                # Sentence-aware document splitting
    │   └── splitter.py
    │
    ├── embeddings/              # SentenceTransformer encoder
    │   └── encoder.py
    │
    ├── vectordb/                # ChromaDB store & query
    │   └── chroma_store.py
    │
    ├── retrieval/               # Hybrid search: Dense + BM25 + RRF
    │   └── hybrid_search.py
    │
    ├── prompts/                 # Prompt templates (versioned, reusable)
    │   └── templates.py
    │
    ├── llm/                     # LLM clients (Gemini) + reranker
    │   ├── gemini_client.py
    │   └── reranker.py
    │
    ├── api/                     # FastAPI HTTP endpoints
    │   └── app.py
    │
    ├── utils/                   # Pipeline orchestration helpers
    │   └── helpers.py
    │
    ├── tests/                   # Evaluation harness + unit tests
    │   ├── eval.py
    │   └── test_skip_rerank.py
    │
    ├── logs/                    # Runtime logs (gitignored except .gitkeep)
    ├── data/belgeler/           # Source documents (.txt, .pdf)
    └── db/                      # ChromaDB + chunks.json (gitignored)

Install the dependencies:

    pip install -r requirements.txt

Set the API key in .env:

    # .env
    GEMINI_API_KEY=your_key_here

Place .txt or .pdf files in data/belgeler/.

Entry points (pipeline helpers, CLI, API server):

    python utils/helpers.py
    python main.py
    uvicorn api.app:app --reload
    # → POST http://localhost:8000/query {"question": "..."}
    # → POST http://localhost:8000/ingest
    # → GET http://localhost:8000/health

All settings are in config.yaml:

    embedding:
      model: "all-MiniLM-L6-v2"
    chunking:
      chunk_size: 500
      chunk_overlap: 50
    reranker:
      mode: "local"          # "local" | "gemini"
      skip_threshold: 0.03   # skip reranker when top-1 dominates
    llm:
      model: "gemini-flash-latest"
      temperature: 0.2

| Feature | Module | Description |
|---|---|---|
| Sentence-aware chunking | chunking/splitter.py | Never cuts mid-sentence; table-line detection |
| Hybrid retrieval | retrieval/hybrid_search.py | Dense (ChromaDB) + Sparse (BM25) + RRF |
| Skip-rerank | llm/reranker.py | Bypass reranker for high-confidence queries |
| Local reranker | llm/reranker.py | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Prompt templates | prompts/templates.py | Versioned, separated from model logic |
| FastAPI | api/app.py | REST endpoints for integration |
| Eval harness | tests/eval.py | Citation check + keyword relevance + latency |
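
To make the "never cuts mid-sentence" behaviour concrete, here is a minimal sketch of sentence-aware chunking with overlap. It is an illustration under assumptions, not the contents of `chunking/splitter.py`: the function names are hypothetical, the regex-based sentence splitter is deliberately naive (it will mis-split abbreviations such as "e.g."), sizes are measured in characters, and the table-line detection mentioned above is not covered.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(text, chunk_size=500, chunk_overlap=50):
    """Pack whole sentences into chunks of roughly chunk_size characters.

    Trailing sentences of a finished chunk are repeated at the start of the
    next one, up to chunk_overlap characters. A single sentence longer than
    chunk_size is kept whole rather than cut.
    """
    chunks, current, current_len = [], [], 0
    for sentence in split_sentences(text):
        added = len(sentence) + (1 if current else 0)  # +1 for the joining space
        if current and current_len + added > chunk_size:
            chunks.append(" ".join(current))
            # Walk backwards to collect whole sentences that fit the overlap budget.
            carried, carried_len = [], 0
            for prev in reversed(current):
                if carried_len + len(prev) > chunk_overlap:
                    break
                carried.insert(0, prev)
                carried_len += len(prev) + 1
            current, current_len = carried, len(" ".join(carried))
            added = len(sentence) + (1 if current else 0)
        current.append(sentence)
        current_len += added
    if current:
        chunks.append(" ".join(current))
    return chunks


# Tiny sizes so the overlap is visible; config.yaml uses 500 and 50.
sample = "Alpha one. Beta two! Gamma three? Delta four."
chunks = chunk_sentences(sample, chunk_size=25, chunk_overlap=12)
for chunk in chunks:
    print(repr(chunk))
```

Every chunk boundary falls on a sentence boundary, and each chunk after the first begins with the last sentence of the previous one, which helps retrieval when an answer spans two adjacent chunks.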
Run the evaluation harness and tests:

    # Standard eval
    python tests/eval.py
    # Compare rerankers (Gemini vs Local latency)
    python tests/eval.py --compare-rerankers
    # Skip-rerank unit tests
    python tests/test_skip_rerank.py

| Contributor | Improvement |
|---|---|
| Ahmet Özel | Sentence-aware chunking replacing fixed 500-char splits |
| Gunjan Tailor | Skip-rerank optimisation: bypass reranker for high-confidence queries |
| Tae Kim | Local cross-encoder fallback: eliminates Gemini API dependency for reranking |
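
The skip-rerank idea can be sketched as a small gate in front of the reranker. The README only says the reranker is skipped "when top-1 dominates", so the exact rule here is an assumption: this version compares the score gap between the first and second candidate against `skip_threshold`. The function names, the `(doc_id, score)` tuple shape, and the score scale are hypothetical; the real logic lives in `llm/reranker.py` and may differ.

```python
def should_skip_rerank(scores, skip_threshold=0.03):
    """Return True when the top score leads the runner-up by at least skip_threshold.

    `scores` must be sorted in descending order. Assumed rule, not confirmed
    by the project: a large top-1/top-2 gap means reranking is unlikely to
    change the winner.
    """
    if len(scores) < 2:
        # With zero or one candidate there is nothing to reorder.
        return True
    return (scores[0] - scores[1]) >= skip_threshold


def rerank_or_skip(candidates, rerank_fn, skip_threshold=0.03):
    """Apply rerank_fn only when the retrieval scores are too close to call.

    `candidates` is a list of (doc_id, score) tuples sorted by score, best first.
    Returns (candidates, skipped) where `skipped` tells whether reranking was bypassed.
    """
    scores = [score for _, score in candidates]
    if should_skip_rerank(scores, skip_threshold):
        return candidates, True
    return rerank_fn(candidates), False


# Hypothetical stand-in for a cross-encoder call: it simply reverses the order.
def fake_reranker(candidates):
    return list(reversed(candidates))


clear_winner = [("chunk_1", 0.90), ("chunk_4", 0.50)]
close_call = [("chunk_1", 0.50), ("chunk_4", 0.49)]

print(rerank_or_skip(clear_winner, fake_reranker))
print(rerank_or_skip(close_call, fake_reranker))
```

The benefit is latency: when retrieval is already confident, the cross-encoder or Gemini call is avoided entirely. Whatever rule is used, the threshold has to be tuned to the scale of the scores it is compared against.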

License: MIT