This project is created for educational purposes only.
It is not a medical diagnostic tool and should not be used for real medical decisions.
Streamlit App: https://medical-report-rag.streamlit.app/

The explanations generated by the AI are simplified interpretations of medical report text and may not always be accurate. Always consult a qualified healthcare professional for medical advice, diagnosis, or treatment.
Medical reports often contain technical terminology, abbreviations, and numerical values that can be difficult for patients to understand. Many people rely on internet searches to interpret these reports, but the information they find is usually generic and not specific to their report.
This project builds a Retrieval-Augmented Generation (RAG) system that allows users to upload a medical report and ask questions about it in simple language. The system retrieves relevant sections from the uploaded report and uses a Large Language Model (LLM) to generate clear explanations.
The goal is to help users better understand their reports while emphasizing that AI should assist, not replace, professional medical advice.
Medical reports contain complex terms, reference ranges, and structured tables that many patients struggle to interpret. This lack of understanding can cause confusion and anxiety.
This project addresses the problem by creating an AI assistant that:
- Accepts a medical report PDF
- Retrieves relevant information from the report
- Generates simple explanations in natural language
The system uses semantic search and retrieval-augmented generation (RAG) to ensure responses are grounded in the uploaded report.

The assistant is intended for:
- Patients receiving diagnostic reports
- Individuals without medical background
- Elderly patients needing simplified explanations
- Anyone wanting a quick understanding of medical reports
```
User Uploads Medical Report
        │
        ▼
PDF Text Extraction (pdfplumber)
        │
        ▼
Text Chunking
        │
        ▼
Embedding Generation (SentenceTransformers)
        │
        ▼
Vector Database (ChromaDB)
        │
        ▼
Semantic Retrieval
        │
        ▼
Groq LLM (Llama-3.1)
        │
        ▼
Simple Patient Explanation
```
The report text is divided into smaller chunks and converted into numerical vectors using the SentenceTransformer model `all-MiniLM-L6-v2`.
Embeddings allow the system to measure semantic similarity between the user question and report text.
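The chunking and similarity steps can be sketched in plain Python. The function names and the word-based chunk size below are illustrative choices, not the project's exact implementation; the real vectors come from SentenceTransformers rather than being hand-built:

```python
import math

def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping word-based chunks (illustrative parameters).

    Overlap keeps sentences that straddle a chunk boundary present in both
    neighboring chunks, so retrieval is less likely to lose context.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (range roughly -1..1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# In the real pipeline the vectors come from SentenceTransformers, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   vectors = model.encode(chunks)  # one 384-dimensional vector per chunk
```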
Embeddings are stored in ChromaDB, which enables fast similarity search.
When a user asks a question:
- The question is converted into an embedding.
- ChromaDB compares it with stored report embeddings.
- The most relevant chunks are retrieved.
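Conceptually, this lookup is just ranking stored chunk embeddings by similarity to the question embedding. The brute-force sketch below mimics what a ChromaDB query does (ChromaDB performs the same search efficiently at scale); the 3-dimensional vectors are toy stand-ins for real embeddings:

```python
import math

def top_k_chunks(question_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most similar to the question.

    Brute-force equivalent of a ChromaDB call like:
        collection.query(query_embeddings=[question_vec], n_results=k)
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))

    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cos(question_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]

# Toy example with made-up vectors:
chunks = ["Hemoglobin: 13.5 g/dL", "HbA1c: 5.4%", "LDL cholesterol: 160 mg/dL"]
vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
question_vec = [0.8, 0.2, 0.0]  # pretend embedding of "What is my hemoglobin?"
print(top_k_chunks(question_vec, vecs, chunks, k=1))  # → ['Hemoglobin: 13.5 g/dL']
```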
Instead of asking the AI model directly, the system first retrieves relevant report sections and sends them as context to the language model.
```
User Question
        │
        ▼
Convert Question → Embedding
        │
        ▼
Search Vector Database
        │
        ▼
Retrieve Relevant Report Chunks
        │
        ▼
Send Context + Question to LLM
        │
        ▼
Generate Explanation
```
This ensures answers are grounded in actual report content.
The system uses a structured prompt:
```
You are a medical report assistant.
Using the report context below, answer the question in simple language so a patient can understand.

Report Context: {retrieved_text}

Question: {user_question}
```
This ensures responses are:
- Easy to understand
- Based on the report
- Patient-friendly
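Filling the template and passing it to the model can be sketched as below. The `build_prompt` helper is illustrative, and the exact Groq model name is an assumption; the chat-completions call shape follows the Groq Python client:

```python
def build_prompt(retrieved_text, user_question):
    """Fill the prompt template with retrieved report chunks and the question."""
    return (
        "You are a medical report assistant.\n"
        "Using the report context below, answer the question in simple language "
        "so a patient can understand.\n\n"
        f"Report Context: {retrieved_text}\n\n"
        f"Question: {user_question}"
    )

# The filled prompt is then sent to the Groq API, roughly:
#   from groq import Groq
#   client = Groq()  # reads GROQ_API_KEY from the environment
#   reply = client.chat.completions.create(
#       model="llama-3.1-8b-instant",  # assumed model name
#       messages=[{"role": "user", "content": build_prompt(context, question)}],
#   )
```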
| Component | Technology |
|---|---|
| Interface | Streamlit |
| PDF Processing | pdfplumber |
| Embeddings | SentenceTransformers |
| Vector Database | ChromaDB |
| LLM | Groq (Llama-3.1) |
| Programming Language | Python |
Clone the repository:

```bash
git clone https://github.com/yourusername/medical-report-rag.git
cd medical-report-rag
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Create a `.env` file and add your Groq API key:

```
GROQ_API_KEY=your_api_key_here
```

Run the application:

```bash
python -m streamlit run streamlit_app.py
```
## Example Usage

Upload a medical report PDF and ask questions like:

- What does hemoglobin mean?
- Is my HbA1c normal?
- Explain the lipid profile.
The system retrieves relevant sections and generates an explanation.
## Failure Case

One limitation occurs when the uploaded report contains complex tables or scanned images. In such cases:

- PDF text extraction may break table structures
- Chunking may separate related information
- Retrieval may return incomplete context
Improving PDF parsing or using structured medical report formats can improve accuracy.
## Reflection
This project demonstrates how Retrieval-Augmented Generation (RAG) improves the reliability of AI systems by grounding responses in retrieved documents.
Instead of relying only on the language model's internal knowledge, the system retrieves relevant report sections and uses them as context for generating explanations.
Through this project, the following concepts were implemented in a practical, real-world scenario:

- embeddings
- semantic search
- vector databases
- prompt engineering
- LLM integration
## Future Improvements

- Detect abnormal lab values automatically
- Highlight important test results
- Support multiple report uploads
- Improve table extraction from PDFs
- Add medical knowledge base integration
## License

This project is intended for educational purposes only and is not meant for clinical use.