🧠 LocalMind — Chat with Your Documents

A fully local, privacy-first RAG application that lets you chat with your own documents and the latest AI news — powered by Ollama, LangChain, ChromaDB and Streamlit. No API keys. No data leaves your machine.

📌 Overview

LocalMind is a Retrieval-Augmented Generation (RAG) application that combines:

A web scraper that collects the latest AI news articles in English, Spanish and Dutch
A vector database (ChromaDB) that stores and searches through document content
A local LLM (Llama 3.2 via Ollama) that answers questions based on your documents
A Streamlit chat interface so you can interact with everything in your browser

The key feature: everything runs locally. There is no OpenAI API key, no data is sent to the cloud, and there is no cost per query. Only the scraper needs an internet connection, to download articles.

🔁 Pipeline

Web Scraper → .txt files → Document Loader → Chunking → Embeddings → ChromaDB
                                                                          ↓
                                                          User Question → Retriever
                                                                          ↓
                                                               Relevant Chunks → Llama 3.2
                                                                          ↓
                                                                       Answer
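
To make the retrieval half of the diagram concrete, here is a minimal, dependency-free sketch. It is not the code in rag_chain.py: the word-count "embedding" is a toy stand-in for nomic-embed-text, the in-memory list stands in for ChromaDB, and the function names (embed, retrieve, build_prompt) and the prompt wording are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to the LLM (hypothetical wording)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [
    "Ollama runs large language models on your own machine.",
    "ChromaDB stores embeddings on disk for similarity search.",
    "Streamlit renders the chat interface in the browser.",
]
question = "Where are embeddings stored?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In the real pipeline the ranking step is delegated to the vector store and the prompt goes to Llama 3.2, but the shape is the same: embed the question, pick the nearest chunks, and pass them to the model as context.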

✨ Features

🔍 Web scraper — automatically collects AI news articles from multiple sources
📄 Document ingestion — supports PDF and TXT files
🗄️ Local vector store — ChromaDB stores embeddings on disk, no external database needed
🦙 Local LLM — runs Llama 3.2 via Ollama, fully offline
💬 Chat interface — Streamlit UI with conversation history
🔄 Refresh knowledge base — just run the scraper and ingest again to update
➕ Easily extensible — add new sources with a single line of code

🗂️ Project Structure

localmind/
│
├── app.py              # Streamlit chat UI
├── ingest.py           # Document loader, chunker and embedder
├── rag_chain.py        # LangChain RAG pipeline
├── scraper.py          # Web scraper for AI news articles
├── data/               # Scraped articles and documents
├── vectorstore/        # ChromaDB embeddings (auto-generated)
├── requirements.txt
└── README.md

🛠️ Tech Stack

Tool	Purpose
🦙 Ollama + Llama 3.2	Run LLM locally
🦜 LangChain	RAG pipeline and document handling
🗄️ ChromaDB	Local vector store for embeddings
🖥️ Streamlit	Chat UI in the browser
🐍 Python	Core language
📰 Newspaper3k	Article text extraction
🍲 BeautifulSoup	HTML parsing for scraper
📄 PyMuPDF	PDF loading

🚀 Getting Started

1. Prerequisites

Python 3.10+
Ollama installed and running

Pull the required models:

ollama pull llama3.2
ollama pull nomic-embed-text
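
If you want to confirm that both models are available before continuing, the sketch below queries a running Ollama server. It assumes a default install listening on http://localhost:11434 and uses Ollama's /api/tags endpoint, which lists local models. The helper names are hypothetical and are not part of this repository.

```python
import json
import urllib.request

REQUIRED_MODELS = ["llama3.2", "nomic-embed-text"]


def missing_models(installed: list[str], required: list[str]) -> list[str]:
    """Return required models that are not installed, ignoring tags such as ':latest'."""
    base_names = {name.split(":", 1)[0] for name in installed}
    return [model for model in required if model not in base_names]


def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        payload = json.load(response)
    return [entry["name"] for entry in payload.get("models", [])]


if __name__ == "__main__":
    try:
        missing = missing_models(installed_models(), REQUIRED_MODELS)
    except OSError as error:
        # Connection refused or timed out: Ollama is probably not running.
        print(f"Could not reach Ollama: {error}")
    else:
        print("Missing models:", missing or "none")
```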

2. Clone the repo

git clone https://github.com/gail-mar/localmind.git
cd localmind

3. Create a virtual environment

python -m venv venv

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

4. Install dependencies

pip install -r requirements.txt

5. Scrape the latest AI articles

python scraper.py

6. Ingest documents into ChromaDB

python ingest.py
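
This README does not show which splitter or chunk size ingest.py uses, so the following is only an illustration of the chunking idea: fixed-size character windows with a small overlap, so that a sentence cut at a boundary still appears whole in one of the neighbouring chunks. The function name and default values are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_text(sample, chunk_size=12, overlap=4)
print(pieces)
```

Larger chunks give the model more context per hit but make retrieval less precise. The values that work best depend on the documents, so treat the defaults here as placeholders.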

7. Launch the app

streamlit run app.py

Open your browser at http://localhost:8501 and start chatting! 🎉

📰 News Sources

The scraper currently collects articles from:

Language	Source
🇬🇧 English	TechCrunch AI, MIT Technology Review
🇪🇸 Spanish	Xataka Inteligencia Artificial
🇳🇱 Dutch	Techzine AI

Adding a new source

Open scraper.py and add a URL to the relevant language list:

sources = {
    "en": [
        "https://techcrunch.com/category/artificial-intelligence/",
        "https://www.technologyreview.com/topic/artificial-intelligence/",
        "https://www.wired.com/tag/artificial-intelligence/",  # ← add here
    ],
    ...
}
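
If you prefer to extend the list from code rather than by hand, a small helper can reject malformed URLs and avoid duplicate entries. This is a hypothetical sketch, not a function that exists in scraper.py. It only assumes the dictionary-of-lists shape shown above.

```python
from urllib.parse import urlparse


def add_source(sources: dict[str, list[str]], lang: str, url: str) -> bool:
    """Append url to sources[lang] if it is a well-formed http(s) URL and not already listed.

    Returns True when the URL was added, False when it was already present.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {url!r}")
    urls = sources.setdefault(lang, [])
    if url in urls:
        return False
    urls.append(url)
    return True


sources = {
    "en": [
        "https://techcrunch.com/category/artificial-intelligence/",
        "https://www.technologyreview.com/topic/artificial-intelligence/",
    ],
}
wired = "https://www.wired.com/tag/artificial-intelligence/"
added = add_source(sources, "en", wired)
added_again = add_source(sources, "en", wired)
print(added, added_again, len(sources["en"]))
```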

💡 Example Questions

What are the latest developments in AI?
What is happening with OpenAI?
What are the latest updates to Apple Intelligence?
How is AI changing the job market?

🔄 Keeping Your Knowledge Base Fresh

To update LocalMind with the latest articles:

python scraper.py   # fetches new articles, skips existing ones
python ingest.py    # adds new content to ChromaDB

Then refresh the Streamlit app and you're up to date!
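
One common way to get the "skips existing ones" behaviour is to derive each filename from a hash of the article URL and check whether that file already exists. Whether scraper.py does it this way is not stated here. The function names and the 16-character digest are assumptions chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def article_path(url: str, data_dir: Path) -> Path:
    """Map an article URL to a stable .txt filename inside data_dir."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return data_dir / f"{digest}.txt"


def save_if_new(url: str, text: str, data_dir: Path) -> bool:
    """Write the article unless a file for this URL already exists; return True if written."""
    path = article_path(url, data_dir)
    if path.exists():
        return False
    data_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True


# Demonstrate with a throwaway directory instead of the real data/ folder.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    first = save_if_new("https://example.com/a", "Article body", data_dir)
    second = save_if_new("https://example.com/a", "Article body", data_dir)
    print(first, second)
```

Because the filename depends only on the URL, running the scraper twice on the same article is a no-op, and ingest.py only sees genuinely new files.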

🛣️ Roadmap

Add a refresh button inside the Streamlit UI
Show sources used to answer each question
Support CSV and Word document ingestion
Add more news sources per language
Conversation memory across sessions

👩‍💻 Author

Gail Marechal — Data Science & AI

Built from scratch as a portfolio project to demonstrate RAG, LangChain, local LLMs and web scraping.
