gail-mar/localmind

🧠 LocalMind — Chat with Your Documents

A fully local, privacy-first RAG application that lets you chat with your own documents and the latest AI news — powered by Ollama, LangChain, ChromaDB and Streamlit. No API keys. No data leaves your machine.



LocalMind Demo


📌 Overview

LocalMind is a Retrieval-Augmented Generation (RAG) application that combines:

  • A web scraper that collects the latest AI news articles in English, Spanish and Dutch
  • A vector database (ChromaDB) that stores and searches through document content
  • A local LLM (Llama 3.2 via Ollama) that answers questions based on your documents
  • A Streamlit chat interface so you can interact with everything in your browser

The key feature: everything runs locally. No OpenAI API key, no data sent to the cloud, no cost per query.


🔁 Pipeline

Web Scraper → .txt files → Document Loader → Chunking → Embeddings → ChromaDB
                                                                          ↓
                                                          User Question → Retriever
                                                                          ↓
                                                               Relevant Chunks → Llama 3.2
                                                                          ↓
                                                                       Answer
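
The retrieval half of the diagram can be illustrated with a toy, standard-library-only sketch. It is not the code in rag_chain.py: word-count vectors stand in for the nomic-embed-text embeddings, a plain list stands in for ChromaDB, and the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (the Retriever step)."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine the retrieved chunks and the question into one prompt for the LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Toy stand-in for the vector store: three already-chunked passages.
chunks = [
    "Ollama runs large language models on your own machine.",
    "ChromaDB stores embeddings on disk.",
    "Streamlit renders the chat interface in the browser.",
]
question = "Where does ChromaDB store embeddings?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In the real pipeline the similarity search runs over dense embeddings rather than word counts, but the shape is the same: rank chunks against the question, keep the top few, and pass them to the model as context.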

✨ Features

  • 🔍 Web scraper — automatically collects AI news articles from multiple sources
  • 📄 Document ingestion — supports PDF and TXT files
  • 🗄️ Local vector store — ChromaDB stores embeddings on disk, no external database needed
  • 🦙 Local LLM — runs Llama 3.2 via Ollama, fully offline
  • 💬 Chat interface — Streamlit UI with conversation history
  • 🔄 Refresh knowledge base — just run the scraper and ingest again to update
  • Easily extensible — add new sources with a single line of code

🗂️ Project Structure

localmind/
│
├── app.py              # Streamlit chat UI
├── ingest.py           # Document loader, chunker and embedder
├── rag_chain.py        # LangChain RAG pipeline
├── scraper.py          # Web scraper for AI news articles
├── data/               # Scraped articles and documents
├── vectorstore/        # ChromaDB embeddings (auto-generated)
├── requirements.txt
└── README.md

🛠️ Tech Stack

Tool | Purpose
🦙 Ollama + Llama 3.2 | Run LLM locally
🦜 LangChain | RAG pipeline and document handling
🗄️ ChromaDB | Local vector store for embeddings
🖥️ Streamlit | Chat UI in the browser
🐍 Python | Core language
📰 Newspaper3k | Article text extraction
🍲 BeautifulSoup | HTML parsing for scraper
📄 PyMuPDF | PDF loading

🚀 Getting Started

1. Prerequisites

  • Python 3.10+
  • Ollama installed and running

Pull the required models:

ollama pull llama3.2
ollama pull nomic-embed-text
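
To confirm both models are available before continuing, a small check along these lines may help. It is a sketch that assumes Ollama is listening on its default address (http://localhost:11434) and that `GET /api/tags` returns a `models` list with a `name` field per entry; the function names are hypothetical and not part of this repo.

```python
import json
import urllib.request

REQUIRED_MODELS = ["llama3.2", "nomic-embed-text"]


def missing_models(tags: dict, required: list[str]) -> list[str]:
    """Return the required models that are absent from an /api/tags response."""
    # Installed names carry a tag suffix such as "llama3.2:latest"; compare base names.
    installed = {m["name"].split(":")[0] for m in tags.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask a running Ollama server which models it has installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Example with a hand-written response, so no server is needed to try it.
sample = {"models": [{"name": "llama3.2:latest"}]}
print(missing_models(sample, REQUIRED_MODELS))  # ['nomic-embed-text']
```

Against a live server, `missing_models(fetch_tags(), REQUIRED_MODELS)` should return an empty list once both pulls have finished.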

2. Clone the repo

git clone https://github.com/gail-mar/localmind.git
cd localmind

3. Create a virtual environment

python -m venv venv

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

4. Install dependencies

pip install -r requirements.txt

5. Scrape the latest AI articles

python scraper.py

6. Ingest documents into ChromaDB

python ingest.py
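
The chunking step inside ingest.py is not shown in this README. As a rough illustration of the idea only, the sketch below splits text into fixed-size character windows that overlap, so a sentence cut at a boundary still appears whole in one chunk. The function name and the default sizes are assumptions, not values taken from the project.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


pieces = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij']
```

Each chunk would then be embedded and written to the vector store; LangChain's text splitters do the same job with smarter boundaries (paragraphs, sentences) than raw character offsets.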

7. Launch the app

streamlit run app.py

Open your browser at http://localhost:8501 and start chatting! 🎉


📰 News Sources

The scraper currently collects articles from:

Language | Source
🇬🇧 English | TechCrunch AI, MIT Technology Review
🇪🇸 Spanish | Xataka Inteligencia Artificial
🇳🇱 Dutch | Techzine AI

Adding a new source

Open scraper.py and add a URL to the relevant language list:

sources = {
    "en": [
        "https://techcrunch.com/category/artificial-intelligence/",
        "https://www.technologyreview.com/topic/artificial-intelligence/",
        "https://www.wired.com/tag/artificial-intelligence/",  # ← add here
    ],
    ...
}
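
If you would rather add sources programmatically, a guard like the one below can catch malformed or duplicate URLs. This is a sketch built around the `sources` layout shown above; `add_source` and `iter_sources` are hypothetical helpers, not functions from scraper.py, and only the English URLs listed in this README are included.

```python
from urllib.parse import urlparse

sources = {
    "en": [
        "https://techcrunch.com/category/artificial-intelligence/",
        "https://www.technologyreview.com/topic/artificial-intelligence/",
    ],
}


def add_source(sources: dict[str, list[str]], lang: str, url: str) -> bool:
    """Append url under lang; return False if it is malformed or already listed."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    urls = sources.setdefault(lang, [])
    if url in urls:
        return False
    urls.append(url)
    return True


def iter_sources(sources: dict[str, list[str]]):
    """Yield (language, url) pairs in the order a scraper loop would visit them."""
    for lang, urls in sources.items():
        for url in urls:
            yield lang, url


wired = "https://www.wired.com/tag/artificial-intelligence/"
added = add_source(sources, "en", wired)      # True: new, well-formed URL
duplicate = add_source(sources, "en", wired)  # False: already in the list
for lang, url in iter_sources(sources):
    print(lang, url)
```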

💡 Example Questions

What are the latest developments in AI?
What is happening with OpenAI?
What are the latest updates to Apple Intelligence?
How is AI changing the job market?

🔄 Keeping Your Knowledge Base Fresh

To update LocalMind with the latest articles:

python scraper.py   # fetches new articles, skips existing ones
python ingest.py    # adds new content to ChromaDB

Then refresh the Streamlit app and you're up to date!
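
How scraper.py decides that an article already exists is not documented here. One common approach, shown purely as an assumption, is to derive the filename from a hash of the article URL and skip the write when that file is already on disk. The helper names below are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def article_path(data_dir: Path, url: str) -> Path:
    """Map a URL to a stable .txt filename so the same article always lands in the same file."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return data_dir / f"{digest}.txt"


def save_if_new(data_dir: Path, url: str, text: str) -> bool:
    """Write the article unless it is already on disk; return True when written."""
    path = article_path(data_dir, url)
    if path.exists():
        return False
    data_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True


# Demonstration in a throwaway directory; example.com is a placeholder URL.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    first = save_if_new(data_dir, "https://example.com/a", "article body")
    second = save_if_new(data_dir, "https://example.com/a", "article body")
    saved_files = len(list(data_dir.glob("*.txt")))

print(first, second, saved_files)  # True False 1
```

Hashing the URL keeps filenames short and filesystem-safe; the trade-off is that an article republished under a new URL would be stored twice.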


🛣️ Roadmap

  • Add a refresh button inside the Streamlit UI
  • Show sources used to answer each question
  • Support CSV and Word document ingestion
  • Add more news sources per language
  • Conversation memory across sessions

👩‍💻 Author

Gail Marechal — Data Science & AI
GitHub


Built from scratch as a portfolio project to demonstrate RAG, LangChain, local LLMs and web scraping.
