A machine learning project that predicts health insurance premium costs from user-provided inputs. It demonstrates data preprocessing, regression modeling, and deployment through a web interface built with Streamlit.
- Load and process premium data from Excel files using Pandas
- Clean and scale data with MinMaxScaler
- Train prediction models using Scikit-learn regression algorithms
- Achieve up to 90% prediction accuracy
- Build an interactive UI with Streamlit
- Deploy the application on Streamlit Cloud

Data Ingestion
- Read past health insurance premium data from Excel files
- Explore and preprocess the data using Pandas and NumPy
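
The ingestion step above could look roughly like the following. This is a minimal sketch, not the project's actual code: the helper names (`load_premium_data`, `normalize_columns`, `summarize`) are hypothetical, and it assumes the workbook keeps the records on its first sheet.

```python
import pandas as pd


def load_premium_data(path: str) -> pd.DataFrame:
    """Read premium records from the first sheet of an Excel workbook."""
    # pd.read_excel needs an Excel engine (for example openpyxl) for .xlsx files.
    return pd.read_excel(path)


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lower-case, underscore-separated column names."""
    out = df.copy()
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    return out


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing-value count, and number of distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "unique": df.nunique(),
        }
    )
```

A typical first pass would be `summarize(normalize_columns(load_premium_data(path)))`, which shows at a glance which columns need cleaning.
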

Data Cleaning & Scaling
- Handle missing or inconsistent values
- Apply MinMaxScaler to normalize feature values
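
A minimal sketch of the cleaning and scaling step, under stated assumptions: the function name `clean_and_scale` is hypothetical, and filling gaps with the column median is one possible strategy rather than the one this project necessarily uses. `MinMaxScaler` rescales each numeric column to the range [0, 1].

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def clean_and_scale(df: pd.DataFrame, numeric_cols: list):
    """Drop duplicate rows, fill numeric gaps, and min-max scale the columns."""
    out = df.drop_duplicates().copy()
    # Assumed strategy: replace missing numeric values with the column median.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    scaler = MinMaxScaler()
    out[numeric_cols] = scaler.fit_transform(out[numeric_cols])
    # Return the fitted scaler so the same transform can be reused at prediction time.
    return out, scaler
```

In practice the scaler is usually fitted on the training split only and then applied to validation data and user inputs, so that information from held-out rows does not leak into training.
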

Model Training
- Use regression models such as Scikit-learn's Linear Regression and XGBoost (a separate library with a Scikit-learn-compatible API)
- Evaluate performance and select the best model
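
Training several candidates and keeping the best one could be sketched as below. The helper `select_best_model` is hypothetical, the data is synthetic and only there to make the example runnable, and R² on a held-out split is an assumed selection metric. Scikit-learn's `GradientBoostingRegressor` stands in for XGBoost here to keep the example dependency-free; `xgboost.XGBRegressor` could be added to the `candidates` dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def select_best_model(X, y, candidates, test_size=0.2, random_state=0):
    """Fit each candidate and return the one with the highest held-out R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scores, fitted = {}, {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
        fitted[name] = model
    best = max(scores, key=scores.get)
    return best, fitted[best], scores


# Synthetic stand-in data (not the project's dataset): three scaled features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 5000 + 20000 * X[:, 0] + 8000 * X[:, 1] * X[:, 2] + rng.normal(0, 500, 300)

candidates = {
    "linear_regression": LinearRegression(),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
best_name, best_model, scores = select_best_model(X, y, candidates)
```

The returned `scores` dictionary keeps every candidate's result, which makes it easy to report the comparison alongside the chosen model.
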

Prediction Interface
- Develop a user-friendly web interface using Streamlit
- Accept real-time inputs (e.g., age, gender, health metrics)
- Display predicted premium output
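
The interface described above might be wired up as in the following sketch. Everything specific in it is an assumption made for illustration: the feature names in `FEATURE_COLUMNS`, the input ranges, and the helpers `build_feature_row` and `main`. The model and scaler are expected to be objects fitted earlier on the same columns, loaded for example with `joblib.load`.

```python
import pandas as pd

# Hypothetical feature layout; the real app must match the training columns.
FEATURE_COLUMNS = ["age", "gender_male", "bmi"]


def build_feature_row(age: int, gender: str, bmi: float) -> pd.DataFrame:
    """Turn raw form inputs into a one-row frame in training column order."""
    row = {"age": age, "gender_male": 1 if gender == "Male" else 0, "bmi": bmi}
    return pd.DataFrame([row], columns=FEATURE_COLUMNS)


def main(model, scaler) -> None:
    """Render the form and show a prediction; run via `streamlit run`."""
    # Imported here so build_feature_row can be tested without Streamlit.
    import streamlit as st

    st.title("Health Insurance Premium Predictor")
    age = st.number_input("Age", min_value=18, max_value=100, value=30)
    gender = st.selectbox("Gender", ["Female", "Male"])
    bmi = st.number_input("BMI", min_value=10.0, max_value=60.0, value=25.0)

    if st.button("Predict premium"):
        features = scaler.transform(build_feature_row(age, gender, bmi))
        prediction = model.predict(features)[0]
        st.success(f"Estimated premium: {prediction:,.2f}")
```

Keeping the input-to-feature conversion in a plain function separates it from the UI code, so it can be unit-tested and reused outside the Streamlit script.
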

Deployment
- Deploy the app to Streamlit Cloud for public accessibility
- Python 3
- Jupyter Notebook
- Pandas, NumPy
- Scikit-learn
- Streamlit