
Astroidkiller/medical-report-rag


⚠️ Disclaimer

This project is created for educational purposes only.
It is not a medical diagnostic tool and should not be used for real medical decisions.

Live Demo

Streamlit App: https://medical-report-rag.streamlit.app/

The explanations generated by the AI are simplified interpretations of medical report text and may not always be accurate. Always consult a qualified healthcare professional for medical advice, diagnosis, or treatment.


Patient-Centric Medical Report Understanding Assistant

Overview

Medical reports often contain technical terminology, abbreviations, and numerical values that can be difficult for patients to understand. Many people rely on internet searches to interpret these reports, but the information they find is usually generic and not specific to their report.

This project builds a Retrieval-Augmented Generation (RAG) system that allows users to upload a medical report and ask questions about it in simple language. The system retrieves relevant sections from the uploaded report and uses a Large Language Model (LLM) to generate clear explanations.

The goal is to help users better understand their reports while emphasizing that AI should assist, not replace, professional medical advice.


Problem Statement

Medical reports contain complex terms, reference ranges, and structured tables that many patients struggle to interpret. This lack of understanding can cause confusion and anxiety.

This project addresses the problem by creating an AI assistant that:

  1. Accepts a medical report PDF
  2. Retrieves relevant information from the report
  3. Generates simple explanations in natural language

The system uses semantic search and retrieval-augmented generation (RAG) so that responses are grounded in the uploaded report rather than in the model's general knowledge.


Target Users

  • Patients receiving diagnostic reports
  • Individuals without medical background
  • Elderly patients needing simplified explanations
  • Anyone wanting a quick understanding of medical reports

System Architecture

User Uploads Medical Report
        │
        ▼
PDF Text Extraction (pdfplumber)
        │
        ▼
Text Chunking
        │
        ▼
Embedding Generation (SentenceTransformers)
        │
        ▼
Vector Database (ChromaDB)
        │
        ▼
Semantic Retrieval
        │
        ▼
Groq LLM (Llama-3.1)
        │
        ▼
Simple Patient Explanation


Embedding & Retrieval Approach

Embeddings

The report text is divided into smaller chunks and converted into numerical vectors using:

SentenceTransformer: all-MiniLM-L6-v2

Embeddings allow the system to measure semantic similarity between the user question and report text.
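The chunking step mentioned above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `chunk_text`, the character-based windows, and the default `chunk_size` and `overlap` values are all assumptions. The overlap is there so that a value and its label are less likely to land in different chunks.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that are only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Each returned chunk would then be passed to the embedding model, for example the `all-MiniLM-L6-v2` model named above.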


Vector Database

Embeddings are stored in ChromaDB, which enables fast similarity search.

When a user asks a question:

  1. The question is converted into an embedding.
  2. ChromaDB compares it with stored report embeddings.
  3. The most relevant chunks are retrieved.
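To make the comparison step concrete, the sketch below ranks chunks by cosine similarity using only the standard library. It is an illustration of the idea, not how ChromaDB is implemented: ChromaDB uses its own index and distance metric, and the names `cosine_similarity` and `top_k_chunks` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float],
                 chunk_vecs: list[list[float]],
                 chunks: list[str],
                 k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), chunk)
              for vec, chunk in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [chunk for _, chunk in scored[:k]]
```

With toy two-dimensional vectors, a query pointing in the same direction as a chunk's vector ranks that chunk first; real embeddings from `all-MiniLM-L6-v2` have 384 dimensions but are compared the same way.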

Retrieval-Augmented Generation (RAG)

Instead of asking the AI model directly, the system first retrieves relevant report sections and sends them as context to the language model.

User Question
        │
        ▼
Convert Question → Embedding
        │
        ▼
Search Vector Database
        │
        ▼
Retrieve Relevant Report Chunks
        │
        ▼
Send Context + Question to LLM
        │
        ▼
Generate Explanation

Because the model sees the retrieved text, its answers are tied to the actual report content rather than to generic medical information.
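The flow above can be sketched as a single function that wires the three stages together. This is a sketch under stated assumptions: `answer_question` and its `embed`, `search`, and `generate` parameters are hypothetical names, and the real embedding model, ChromaDB query, and Groq call are passed in as plain callables so the control flow can be read (and run) without those libraries.

```python
from typing import Callable, Sequence


def answer_question(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], list[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve-then-generate: embed the question, fetch chunks, ask the LLM."""
    query_vec = embed(question)             # question -> embedding
    context_chunks = search(query_vec, k)   # embedding -> k nearest report chunks
    context = "\n\n".join(context_chunks)
    # Minimal prompt for illustration; the full template is under Prompt Design.
    prompt = f"Report Context: {context}\n\nQuestion: {question}"
    return generate(prompt)                 # context + question -> explanation
```

In the application, `embed` would wrap the SentenceTransformers model, `search` the ChromaDB collection, and `generate` the Groq client. Passing them as arguments also makes the flow easy to exercise with stubs.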


Prompt Design

The system uses a structured prompt:

You are a medical report assistant.

Using the report context below, answer the question in simple language so a patient can understand.

Report Context: {retrieved_text}

Question: {user_question}

The prompt is designed to keep responses:

  • Easy to understand
  • Based on the report
  • Patient-friendly
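One way to fill this template in code is shown below. The template text is taken from the prompt above; the constant `PROMPT_TEMPLATE`, the function `build_prompt`, and the choice to join chunks with blank lines are assumptions made for illustration.

```python
PROMPT_TEMPLATE = (
    "You are a medical report assistant.\n\n"
    "Using the report context below, answer the question in simple language "
    "so a patient can understand.\n\n"
    "Report Context: {retrieved_text}\n\n"
    "Question: {user_question}"
)


def build_prompt(chunks: list[str], user_question: str) -> str:
    """Fill the template with retrieved chunks and the user's question."""
    # Drop empty chunks and separate the rest with blank lines.
    retrieved_text = "\n\n".join(c.strip() for c in chunks if c.strip())
    return PROMPT_TEMPLATE.format(
        retrieved_text=retrieved_text,
        user_question=user_question.strip(),
    )
```

The resulting string would be sent to the LLM as the message content.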

Technologies Used

| Component | Technology |
| --- | --- |
| Interface | Streamlit |
| PDF Processing | pdfplumber |
| Embeddings | SentenceTransformers |
| Vector Database | ChromaDB |
| LLM | Groq (Llama-3.1) |
| Programming Language | Python |

Installation

Clone the repository:

git clone https://github.com/Astroidkiller/medical-report-rag.git
cd medical-report-rag

Install dependencies:

pip install -r requirements.txt

Create a .env file and add your Groq API key:

GROQ_API_KEY=your_api_key_here
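A small check like the one below can fail early with a clear message when the key is missing. It is a sketch, not the repository's code: the helper name `get_groq_api_key` is hypothetical, and it assumes the `.env` values have already been loaded into the process environment (for example with a dotenv loader), which the project may handle differently.

```python
import os
from typing import Mapping, Optional


def get_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GROQ_API_KEY, or raise if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it to .env or the environment"
        )
    return key
```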

Run the application:

python -m streamlit run streamlit_app.py

Example Usage

Upload a medical report PDF

Ask questions like:

What does hemoglobin mean?
Is my HbA1c normal?
Explain the lipid profile.

The system retrieves relevant sections and generates an explanation.

Failure Case

One limitation occurs when the uploaded report contains complex tables or scanned images. In such cases:

  • PDF text extraction may break table structures, and scanned pages without a text layer may yield no text at all
  • Chunking may separate related information, such as a test name from its value or reference range
  • Retrieval may return incomplete context

Improving PDF parsing, adding OCR for scanned pages, or using structured medical report formats can improve accuracy.
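One inexpensive safeguard is to flag pages that produced little or no text, so the user can be warned before asking questions. The sketch below is hypothetical (`pages_without_text` and the `min_chars` threshold are not from the repository); it takes the per-page strings that a PDF extractor such as pdfplumber would produce, where a page with no text layer may come back empty or as `None`.

```python
from typing import Optional, Sequence


def pages_without_text(page_texts: Sequence[Optional[str]],
                       min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers whose extracted text is missing or very short."""
    flagged = []
    for page_number, text in enumerate(page_texts, start=1):
        if len((text or "").strip()) < min_chars:
            flagged.append(page_number)
    return flagged
```

If the returned list is non-empty, the app could show a warning that those pages are probably scanned images and were not included in the answers.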

Reflection

This project demonstrates how Retrieval-Augmented Generation (RAG) improves the reliability of AI systems by grounding responses in retrieved documents.

Instead of relying only on the language model's internal knowledge, the system retrieves relevant report sections and uses them as context for generating explanations.

Through this project, concepts such as:

  • embeddings
  • semantic search
  • vector databases
  • prompt engineering
  • LLM integration

were implemented in a practical real-world scenario.

Future Improvements

  • Detect abnormal lab values automatically
  • Highlight important test results
  • Support multiple report uploads
  • Improve table extraction from PDFs
  • Add medical knowledge base integration

License

This project is intended for educational purposes only and is not intended for clinical use.
