# Chat with PDF

Query your PDF documents using local LLMs with Ollama.

Chat with PDF is a Retrieval-Augmented Generation (RAG) application that lets you ask questions about PDF documents. All processing happens locally on your machine; no data is sent to external servers.
## Features

- Local PDF text extraction
- Semantic search using Ollama embeddings
- Question answering with local LLMs
- No API keys or external services required
- Save and load document indexes for faster startup
## Requirements

- Python 3.12+
- Ollama (https://ollama.ai)
- Required Ollama models:
  - llama3.3:70b (or another chat model)
  - nomic-embed-text (for embeddings)
## Installation

Install Ollama:

```bash
# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# macOS (or download from https://ollama.ai)
brew install ollama
```

Pull the required models:

```bash
ollama pull llama3.3:70b
ollama pull nomic-embed-text
```

Install the Python dependencies:

```bash
# Install pipenv if needed
pip install pipenv

# Install project dependencies
pipenv install

# Activate environment
pipenv shell
```

## Usage

Ask questions about a PDF:

```bash
python src/main.py --pdf path/to/document.pdf
```

Save the document index for faster startup:

```bash
# First time: process PDF and save index
python src/main.py --pdf document.pdf --save-index
```
```bash
# Next time: load the saved index
python src/main.py --load-index document.index.json
```

Run with verbose logging:

```bash
python src/main.py --pdf document.pdf --verbose
```

## Configuration

Edit config/settings.yaml to customize:
```yaml
llm:
  model: "llama3.2"            # Chat model
  temperature: 0.3             # Creativity
embeddings:
  model: "nomic-embed-text"    # Embedding model
  chunk_size: 500              # Words per chunk
  chunk_overlap: 50            # Overlap between chunks
search:
  top_k: 3                     # Number of relevant chunks
```

## Project Structure

```
chat_with_pdf/
├── data/papers/              # Your PDF files
├── src/
│   ├── __init__.py
│   ├── pdf_loader.py         # PDF text extraction
│   ├── embeddings.py         # Ollama embeddings
│   ├── vector_store.py       # Simple vector storage
│   ├── chat.py               # Chat interface
│   └── main.py               # CLI entry point
├── config/settings.yaml      # Configuration
├── logs/                     # Log files
├── Pipfile                   # Dependencies
└── README.md
```
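The `chunk_size` and `chunk_overlap` settings control how extracted text is split into word chunks before embedding. A minimal sketch of that kind of splitter (the actual implementation in src/embeddings.py may differ; `chunk_words` is a hypothetical name):

```python
def chunk_words(text, chunk_size=500, chunk_overlap=50):
    """Split text into overlapping chunks of whitespace-separated words.

    Each chunk holds up to chunk_size words; consecutive chunks share
    chunk_overlap words so sentences that straddle a boundary remain
    retrievable from both sides.
    """
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the end of the text
    return chunks
```

With the defaults, a 1,200-word document yields three chunks whose boundaries overlap by 50 words.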
## How It Works

1. Load PDF: extract text from the PDF using PyMuPDF
2. Chunk: split the text into overlapping chunks
3. Embed: create an embedding for each chunk using Ollama
4. Store: keep the embeddings in memory (or save them to a file)
5. Query: embed the user's question and find the most similar chunks
6. Answer: generate a response from the retrieved context with the LLM
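The query and answer steps boil down to cosine-similarity ranking (the `search.top_k` setting) plus prompt assembly. A minimal sketch of the idea, assuming the question and chunks have already been embedded — function names and prompt wording are illustrative, not the project's actual code in src/vector_store.py or src/chat.py:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(question_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most similar to the question."""
    ranked = sorted(zip(chunk_vecs, chunks),
                    key=lambda pair: cosine(question_vec, pair[0]),
                    reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def build_prompt(question, context_chunks):
    """Assemble the context-plus-question prompt sent to the chat model."""
    context = "\n\n".join(context_chunks)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}")
```

A brute-force scan like this is fine at README scale: with a few hundred chunks per document there is no need for an approximate nearest-neighbor index.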
## Hardware Recommendations

| RAM | Recommended Setup |
|---|---|
| 8 GB | llama3.2:1b + nomic-embed-text |
| 16 GB | llama3.2:3b + nomic-embed-text |
| 32+ GB | llama3.1:8b + nomic-embed-text |
GPU acceleration (NVIDIA, Apple Silicon) significantly speeds up processing.
## Comparison with Cloud RAG Services

| Aspect | Chat with PDF (Local) | Cloud RAG Services |
|---|---|---|
| Privacy | Data stays local | Data sent to servers |
| Cost | Free | Per-query pricing |
| Speed | Depends on hardware | Generally fast |
| Internet | Not required | Required |
| Models | Limited by RAM | Latest models available |
## License

MIT License