A fully local, privacy-first RAG application that lets you chat with your own documents and the latest AI news — powered by Ollama, LangChain, ChromaDB and Streamlit. No API keys. No data leaves your machine.
LocalMind is a Retrieval Augmented Generation (RAG) application that combines:
- A web scraper that collects the latest AI news articles in English, Spanish and Dutch
- A vector database (ChromaDB) that stores and searches through document content
- A local LLM (Llama 3.2 via Ollama) that answers questions based on your documents
- A Streamlit chat interface so you can interact with everything in your browser
The key feature: everything runs locally. No OpenAI API key, no data sent to the cloud, no cost per query.
The pipeline has two stages:
- Ingestion: Web Scraper → .txt files → Document Loader → Chunking → Embeddings → ChromaDB
- Query: User Question → Retriever (searches ChromaDB) → Relevant Chunks → Llama 3.2 → Answer
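To make the Chunking, Embeddings and Retriever steps concrete, here is a dependency-free sketch of the idea. It is not LocalMind's code: the project uses LangChain, `nomic-embed-text` and ChromaDB, whereas this sketch swaps in fixed-size character chunking, a toy bag-of-words vector and brute-force cosine similarity. The names `chunk_text`, `embed`, `cosine` and `retrieve` and the default sizes are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for nomic-embed-text: a bag-of-words term-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the question's."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


if __name__ == "__main__":
    docs = [
        "Ollama runs large language models such as Llama 3.2 on your own machine.",
        "ChromaDB stores embeddings on disk so the vector store survives restarts.",
    ]
    chunks = [c for d in docs for c in chunk_text(d, chunk_size=80, overlap=20)]
    print(retrieve("Where are embeddings stored?", chunks, k=1))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk; a real embedding model replaces exact word matching with semantic similarity.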
- 🔍 Web scraper — automatically collects AI news articles from multiple sources
- 📄 Document ingestion — supports PDF and TXT files
- 🗄️ Local vector store — ChromaDB stores embeddings on disk, no external database needed
- 🦙 Local LLM — runs Llama 3.2 via Ollama, fully offline
- 💬 Chat interface — Streamlit UI with conversation history
- 🔄 Refresh knowledge base — just run the scraper and ingest again to update
- ➕ Easily extensible — add new sources with a single line of code
localmind/
│
├── app.py # Streamlit chat UI
├── ingest.py # Document loader, chunker and embedder
├── rag_chain.py # LangChain RAG pipeline
├── scraper.py # Web scraper for AI news articles
├── data/ # Scraped articles and documents
├── vectorstore/ # ChromaDB embeddings (auto-generated)
├── requirements.txt
└── README.md
| Tool | Purpose |
|---|---|
| 🦙 Ollama + Llama 3.2 | Run LLM locally |
| 🦜 LangChain | RAG pipeline and document handling |
| 🗄️ ChromaDB | Local vector store for embeddings |
| 🖥️ Streamlit | Chat UI in the browser |
| 🐍 Python | Core language |
| 📰 Newspaper3k | Article text extraction |
| 🍲 BeautifulSoup | HTML parsing for scraper |
| 📄 PyMuPDF | PDF loading |
- Python 3.10+
- Ollama installed and running
Pull the required models:
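- `ollama pull llama3.2`
- `ollama pull nomic-embed-text`

Clone the repository:

- `git clone https://github.com/gail-mar/localmind.git`
- `cd localmind`

Create and activate a virtual environment:

- `python -m venv venv`
- Mac/Linux: `source venv/bin/activate`
- Windows: `venv\Scripts\activate`

Install the dependencies, scrape the latest articles, build the vector store and launch the app:

- `pip install -r requirements.txt`
- `python scraper.py`
- `python ingest.py`
- `streamlit run app.py`

Open your browser at http://localhost:8501 and start chatting! 🎉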
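If the app cannot reach the model, it helps to confirm that Ollama is running and both models are pulled. A minimal standard-library sketch, assuming Ollama listens on its default port 11434 and exposes its `/api/tags` endpoint (which lists locally pulled models); the helpers `missing_models` and `check_ollama` are hypothetical and not part of the repo.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; adjust if you changed it
REQUIRED_MODELS = ("llama3.2", "nomic-embed-text")


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list[str]:
    """Return the required models absent from an /api/tags response.

    Ollama reports names with a tag (e.g. "llama3.2:latest"), so both the
    full name and the part before the colon are treated as installed.
    """
    installed = set()
    for model in tags_payload.get("models", []):
        name = model.get("name", "")
        installed.add(name)
        installed.add(name.split(":", 1)[0])
    return [m for m in required if m not in installed]


def check_ollama(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> list[str]:
    """Query the running Ollama server and return any missing models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return missing_models(payload)


if __name__ == "__main__":
    try:
        missing = check_ollama()
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print(f"Ollama does not seem to be running: {exc}")
    else:
        if missing:
            print("Missing models, run: " + "; ".join(f"ollama pull {m}" for m in missing))
        else:
            print("Ollama is up and all required models are available.")
```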
The scraper currently collects articles from:
| Language | Source |
|---|---|
| 🇬🇧 English | TechCrunch AI, MIT Technology Review |
| 🇪🇸 Spanish | Xataka Inteligencia Artificial |
| 🇳🇱 Dutch | Techzine AI |
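For each source above, the scraper's first job is to find article links on a category page. The project uses BeautifulSoup for this (see the tools table); the sketch below swaps in the standard library's `html.parser` so it runs without dependencies. `LinkCollector` and `article_links` are hypothetical names, and keeping only same-site links is a simplifying assumption: real listing pages also contain navigation links that need source-specific filtering before Newspaper3k extracts the article text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute href targets from <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "/2024/story/" against the page URL
            self.links.append(urljoin(self.base_url, href))


def article_links(listing_html: str, base_url: str) -> list[str]:
    """Return unique same-site links from a category page, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(listing_html)
    site = urlparse(base_url).netloc
    seen, result = set(), []
    for link in parser.links:
        clean = link.split("#", 1)[0]  # drop in-page anchors like "#comments"
        if urlparse(clean).netloc == site and clean != base_url and clean not in seen:
            seen.add(clean)
            result.append(clean)
    return result
```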
Open scraper.py and add a URL to the relevant language list:

    sources = {
        "en": [
            "https://techcrunch.com/category/artificial-intelligence/",
            "https://www.technologyreview.com/topic/artificial-intelligence/",
            "https://www.wired.com/tag/artificial-intelligence/",  # ← add here
        ],
        ...
    }

Some example questions to try:
- What are the latest developments in AI?
- What is happening with OpenAI?
- What are the latest updates to Apple Intelligence?
- How is AI changing the job market?
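Behind the scenes, each question is combined with the retrieved chunks into a single prompt for Llama 3.2. LocalMind builds this with LangChain in rag_chain.py; the sketch below shows the same idea without LangChain, calling Ollama's REST `/api/generate` endpoint directly. The template wording and the `build_prompt` and `ask_llama` helpers are hypothetical, and it assumes Ollama on its default port.

```python
import json
import urllib.request

# Hypothetical grounding template: tells the model to stay within the retrieved context
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, say you don't know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, chunks: list[str]) -> str:
    """Join retrieved chunks into a numbered context block and fill the template."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


def ask_llama(question: str, chunks: list[str], model: str = "llama3.2",
              base_url: str = "http://localhost:11434") -> str:
    """Send the grounded prompt to Ollama's /api/generate endpoint (non-streaming)."""
    payload = {"model": model, "prompt": build_prompt(question, chunks), "stream": False}
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)["response"]
```

Numbering the chunks in the context also makes it easier to later show which sources an answer drew on.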
To update LocalMind with the latest articles:
- `python scraper.py`: fetches new articles and skips ones it already has
- `python ingest.py`: adds the new content to ChromaDB

Then refresh the Streamlit app and you're up to date!
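Re-running the scraper only fetches new articles because existing ones are skipped. One way to get that behaviour, sketched under the assumption that each article is saved as a .txt file in data/ named after a hash of its URL (the actual naming in scraper.py may differ; `article_path` and `save_if_new` are hypothetical):

```python
import hashlib
from pathlib import Path


def article_path(url: str, lang: str, data_dir: Path) -> Path:
    """Map a URL to a stable filename such as data/en_<12-char hash>.txt."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return data_dir / f"{lang}_{digest}.txt"


def save_if_new(url: str, lang: str, text: str, data_dir: Path) -> bool:
    """Write the article once; return False if it was already scraped."""
    path = article_path(url, lang, data_dir)
    if path.exists():
        return False
    data_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True
```

Hashing the URL gives a filename that is the same on every run, so the existence check alone is enough to deduplicate.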
- Add a refresh button inside the Streamlit UI
- Show sources used to answer each question
- Support CSV and Word document ingestion
- Add more news sources per language
- Conversation memory across sessions
Gail Marechal — Data Science & AI
Built from scratch as a portfolio project to demonstrate RAG, LangChain, local LLMs and web scraping.
