thanusree2630/healthcare-premium-prediction

🏥 Health Insurance Cost Predictor

A machine learning-powered web application that predicts health insurance premiums based on individual characteristics and health factors.

📋 Overview

This project uses machine learning algorithms to predict health insurance costs based on various demographic, lifestyle, and health-related factors. The application provides an intuitive interface for users to input their details and receive instant premium predictions.

🎯 Features

  • Interactive UI: Clean, organized 4x3 grid layout with dark theme
  • Real-time Predictions: Instant insurance cost estimates via ML model
  • Comprehensive Input Fields: 12 different health and demographic parameters
  • User-friendly Controls: Mix of number inputs and dropdown selectors
  • Responsive Design: Built with Streamlit for seamless user experience

🚀 Demo

Visit the live application: Premium Health Insurance Cost Predictor

📊 Dataset & Features

Input Features

  • Age: 18-100 years
  • Number of Dependants: 0-20 dependants
  • Income in Lakhs: Annual income (0-200 lakhs)
  • Genetical Risk: Risk score 0-5 based on genetic factors
  • Insurance Plan: Bronze, Silver, or Gold tier
  • Employment Status: Salaried, Self-Employed, Freelancer
  • Gender: Male or Female
  • Marital Status: Married or Unmarried
  • BMI Category: Normal, Overweight, Underweight, Obesity
  • Smoking Status: No Smoking, Regular, Occasional
  • Region: Northeast, Northwest, Southeast, Southwest
  • Medical History:
    • No Disease
    • Diabetes
    • High blood pressure
    • Diabetes & High blood pressure
    • Thyroid
    • Heart disease
    • High blood pressure & Heart disease
    • Diabetes & Thyroid
    • Diabetes & Heart disease
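
The ranges and options listed above lend themselves to a simple validation step before prediction. The following is a minimal sketch, not the project's actual code: the dictionary keys (`age`, `income_lakhs`, and so on) and the function name `validate_inputs` are hypothetical, while the allowed values are copied from the list above.

```python
# Allowed ranges and options copied from the "Input Features" list.
# The key names are hypothetical; the real app may name its fields differently.
NUMERIC_RANGES = {
    "age": (18, 100),
    "number_of_dependants": (0, 20),
    "income_lakhs": (0, 200),
    "genetical_risk": (0, 5),
}

CATEGORICAL_OPTIONS = {
    "insurance_plan": ("Bronze", "Silver", "Gold"),
    "employment_status": ("Salaried", "Self-Employed", "Freelancer"),
    "gender": ("Male", "Female"),
    "marital_status": ("Married", "Unmarried"),
    "bmi_category": ("Normal", "Overweight", "Underweight", "Obesity"),
    "smoking_status": ("No Smoking", "Regular", "Occasional"),
    "region": ("Northeast", "Northwest", "Southeast", "Southwest"),
    "medical_history": (
        "No Disease",
        "Diabetes",
        "High blood pressure",
        "Diabetes & High blood pressure",
        "Thyroid",
        "Heart disease",
        "High blood pressure & Heart disease",
        "Diabetes & Thyroid",
        "Diabetes & Heart disease",
    ),
}


def validate_inputs(inputs: dict) -> list[str]:
    """Return a list of problems found; an empty list means the inputs are valid."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = inputs.get(name)
        if value is None or not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}, got {value!r}")
    for name, options in CATEGORICAL_OPTIONS.items():
        value = inputs.get(name)
        if value not in options:
            problems.append(f"{name} must be one of {options}, got {value!r}")
    return problems


example = {
    "age": 30,
    "number_of_dependants": 2,
    "income_lakhs": 12,
    "genetical_risk": 1,
    "insurance_plan": "Silver",
    "employment_status": "Salaried",
    "gender": "Female",
    "marital_status": "Married",
    "bmi_category": "Normal",
    "smoking_status": "No Smoking",
    "region": "Southeast",
    "medical_history": "No Disease",
}
print(validate_inputs(example))
```

Returning a list of problems rather than raising on the first one lets a form report every invalid field at once.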

🛠️ Technology Stack

  • Frontend: Streamlit
  • Backend: Python
  • ML Libraries:
    • Scikit-learn (Model Training & Prediction)
    • Pandas (Data Manipulation)
    • NumPy (Numerical Computing)
  • Visualization: Matplotlib, Seaborn (for EDA)
  • Model Persistence: Pickle/Joblib

📁 Project Structure

healthcare-premium-prediction/
│
├── artifacts/                      # Trained models and preprocessors
│   ├── model_young.joblib         # ML model for age <= 25
│   ├── model_rest.joblib          # ML model for age > 25
│   ├── scaler_young.joblib        # Scaler for young age group
│   └── scaler_rest.joblib         # Scaler for older age group
│
├── main.py                         # Main Streamlit application
├── prediction_helper.py            # Prediction utility functions
├── requirements.txt                # Python dependencies
├── README.md                       # Project documentation
├── LICENSE                         # Apache-2.0 License
└── .gitignore                      # Git ignore rules
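
The comments in the tree indicate that separate models and scalers are kept for two age groups, split at age 25. A minimal sketch of how the matching artifacts might be selected is shown below; the function name `artifact_paths_for_age` and the constants are hypothetical, and the actual logic in `prediction_helper.py` may differ.

```python
from pathlib import Path

# Directory and file names are taken from the project tree above.
ARTIFACTS_DIR = Path("artifacts")

# Cutoff taken from the tree comments: model_young covers age <= 25.
YOUNG_AGE_CUTOFF = 25


def artifact_paths_for_age(age: int) -> tuple[Path, Path]:
    """Return (model_path, scaler_path) for the age group that `age` falls into."""
    group = "young" if age <= YOUNG_AGE_CUTOFF else "rest"
    model_path = ARTIFACTS_DIR / f"model_{group}.joblib"
    scaler_path = ARTIFACTS_DIR / f"scaler_{group}.joblib"
    return model_path, scaler_path


model_path, scaler_path = artifact_paths_for_age(22)
print(model_path.name, scaler_path.name)

# With joblib installed, the files could then be loaded like this:
#   import joblib
#   model = joblib.load(model_path)
#   scaler = joblib.load(scaler_path)
```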

🔧 Installation & Setup

Prerequisites

  • Python 3.10 or higher
  • pip package manager

Local Setup

  1. Clone the repository
     git clone https://github.com/thanusree2630/healthcare-premium-prediction.git
     cd healthcare-premium-prediction
  2. Create a virtual environment
     python -m venv venv
     source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
     pip install -r requirements.txt
  4. Run the application
     streamlit run main.py
  5. Access the app: open your browser and navigate to http://localhost:8501

📦 Requirements

streamlit>=1.28.0
pandas>=2.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
matplotlib>=3.7.0
seaborn>=0.12.0
joblib>=1.3.0

🧪 Model Development Process

1. Exploratory Data Analysis (EDA)

  • Data distribution analysis
  • Correlation studies
  • Outlier detection
  • Missing value handling
  • Feature importance analysis

2. Feature Engineering

  • Categorical encoding (One-Hot/Label Encoding)
  • Numerical feature scaling
  • Feature interaction creation
  • Dimensionality reduction (if applicable)
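
To make the encoding and scaling steps concrete, here is a small plain-Python sketch of one-hot encoding and min-max scaling. This README does not state which encoder or scaler the project uses, so both choices are assumptions for illustration, and the helper names `one_hot` and `min_max_scale` are hypothetical. In practice, scikit-learn's `OneHotEncoder` and `MinMaxScaler` offer equivalent functionality.

```python
def one_hot(value: str, categories: list[str]) -> list[int]:
    """Encode `value` as a 0/1 vector with one position per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


def min_max_scale(value: float, low: float, high: float) -> float:
    """Map `value` from the range [low, high] onto [0, 1]."""
    return (value - low) / (high - low)


# Category list and age range taken from the "Input Features" section.
PLANS = ["Bronze", "Silver", "Gold"]

plan_vector = one_hot("Silver", PLANS)      # [0, 1, 0]
scaled_age = min_max_scale(59, 18, 100)     # 0.5
feature_row = plan_vector + [scaled_age]
print(feature_row)
```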

3. Model Training

  • Algorithm selection (Linear Regression, Random Forest, XGBoost, etc.)
  • Cross-validation
  • Hyperparameter tuning
  • Model evaluation metrics (RMSE, MAE, R²)
  • Model persistence
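
As an illustration of the cross-validation step, the sketch below splits sample indices into k consecutive folds without shuffling. It is a simplified stand-in written for this README, not the project's training code, and `k_fold_indices` is a hypothetical helper name. scikit-learn's `KFold` and `cross_val_score` provide the same idea with more options.

```python
def k_fold_indices(n_samples: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split range(n_samples) into k (train, test) index pairs, without shuffling."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


for train, test in k_fold_indices(10, 3):
    print(len(train), test)
```

Each sample appears in exactly one test fold, so every row is used for evaluation once and for training in the remaining k - 1 rounds.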

4. Model Performance Metrics

  • R² Score
  • RMSE
  • MAE
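
For reference, the three metrics can be computed as in the sketch below. The sample values are invented purely to show the arithmetic and are not results from this project; the function names are hypothetical.

```python
import math


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 minus residual variation over total variation."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up premiums, for illustration only.
actual = [10000, 20000, 30000]
predicted = [12000, 18000, 30000]

print(round(mae(actual, predicted), 2))        # 1333.33
print(round(rmse(actual, predicted), 2))       # 1632.99
print(round(r_squared(actual, predicted), 2))  # 0.96
```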

📈 Usage

  1. Open the application in your browser

  2. Fill in all required fields in the 4x3 grid layout:

    Row 1: Age, Number of Dependants, Income in Lakhs

    Row 2: Genetical Risk, Insurance Plan, Employment Status

    Row 3: Gender, Marital Status, BMI Category

    Row 4: Smoking Status, Region, Medical History

  3. Click the "Predict" button

  4. View your predicted insurance cost displayed as a success message
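
The grid described in step 2 could be laid out in Streamlit roughly as follows. This is a sketch under assumptions, not the contents of `main.py`: the names `GRID`, `NUMERIC_FIELDS`, `CHOICES`, and `render_form` are hypothetical, and the labels, ranges, and options are taken from this README.

```python
# Field order taken from the four rows listed in the Usage steps.
GRID = [
    ["Age", "Number of Dependants", "Income in Lakhs"],
    ["Genetical Risk", "Insurance Plan", "Employment Status"],
    ["Gender", "Marital Status", "BMI Category"],
    ["Smoking Status", "Region", "Medical History"],
]

# Ranges and options copied from the "Input Features" section.
NUMERIC_FIELDS = {
    "Age": (18, 100),
    "Number of Dependants": (0, 20),
    "Income in Lakhs": (0, 200),
    "Genetical Risk": (0, 5),
}

CHOICES = {
    "Insurance Plan": ["Bronze", "Silver", "Gold"],
    "Employment Status": ["Salaried", "Self-Employed", "Freelancer"],
    "Gender": ["Male", "Female"],
    "Marital Status": ["Married", "Unmarried"],
    "BMI Category": ["Normal", "Overweight", "Underweight", "Obesity"],
    "Smoking Status": ["No Smoking", "Regular", "Occasional"],
    "Region": ["Northeast", "Northwest", "Southeast", "Southwest"],
    "Medical History": [
        "No Disease", "Diabetes", "High blood pressure",
        "Diabetes & High blood pressure", "Thyroid", "Heart disease",
        "High blood pressure & Heart disease", "Diabetes & Thyroid",
        "Diabetes & Heart disease",
    ],
}


def render_form() -> dict:
    """Draw the 4x3 input grid and return the values the user entered."""
    # Imported here so the layout data above can be reused without Streamlit.
    import streamlit as st

    values = {}
    for row in GRID:
        # One set of three columns per row gives the 4x3 layout.
        for column, label in zip(st.columns(3), row):
            with column:
                if label in NUMERIC_FIELDS:
                    low, high = NUMERIC_FIELDS[label]
                    values[label] = st.number_input(
                        label, min_value=low, max_value=high, step=1
                    )
                else:
                    values[label] = st.selectbox(label, CHOICES[label])
    return values
```

A script launched with `streamlit run` could call `render_form()`, then show the result with `st.success(...)` once `st.button("Predict")` returns true.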

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/improvement)
  3. Make your changes
  4. Commit your changes (git commit -am 'Add new feature')
  5. Push to the branch (git push origin feature/improvement)
  6. Create a Pull Request

📝 License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.

👤 Author

Thanusree

📞 Support

If you encounter any issues or have questions, please:

  • Open an issue on GitHub
  • Contact me via email

🔮 Future Enhancements

  • Compare additional ML algorithms
  • Implement SHAP values for model interpretability
  • Add data visualization dashboard
  • Include policy recommendation system
  • Multi-language support
  • Mobile-responsive design improvements
  • API endpoint for programmatic access

⭐ If you find this project helpful, please consider giving it a star!

About

Healthcare Insurance Premium Prediction: a machine learning model that predicts health insurance premiums from factors such as age, BMI, smoking status, and medical history, using regression algorithms in Python.
