Pulse of Prevention is an exploratory data analysis (EDA) project focused on understanding the factors contributing to heart disease. Using a dataset of 1,025 patients, this analysis identifies key risk factors, explores patient profiles, and provides insights for early interventions to improve patient outcomes.
Objectives:
- Identify key factors contributing to heart disease
- Develop profiles of high-risk patients
- Suggest actionable preventive measures
- Provide insights for healthcare providers to optimize patient care
Source: Heart Disease Dataset - Kaggle
Size: 1,025 entries, 14 features
Features include:
- Demographics: Age, Sex
- Clinical measurements: Blood pressure (trestbps), Cholesterol (chol), Max heart rate (thalach), ST depression (oldpeak)
- Medical history: Fasting blood sugar (fbs), Thalassemia (thal), Exercise-induced angina (exang)
- Diagnostic results: Chest pain type (cp), Resting ECG (restecg), Slope of ST segment (slope), Number of major vessels (ca)
- Outcome: Heart disease presence (target)
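A minimal sketch of loading the data and confirming that the 14 columns listed above are present. The file name `heart.csv`, the column spellings `age` and `sex`, and the helper `check_schema` are assumptions made for illustration; the other column names are the ones given in parentheses above.

```python
import pandas as pd

# Names in parentheses in the feature list are used as-is;
# "age" and "sex" are assumed spellings for the demographic columns.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

def check_schema(df: pd.DataFrame) -> list:
    """Return the expected columns missing from df (empty list if none)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

# Hypothetical usage; "heart.csv" is an assumed file name:
# df = pd.read_csv("heart.csv")
# print(check_schema(df))  # an empty list means all 14 columns are present
# print(df.shape)          # row and column counts to compare with the Size line
```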
Demographics & Clinical Insights
- Average age: 54 years
- Male patients: 59%, Female patients: 41%
- Average resting blood pressure: 132 mmHg
- Average cholesterol: 246 mg/dL
- 15% of patients have high fasting blood sugar (>120 mg/dL)
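Summary figures like these can be computed with a few pandas reductions. The sketch below assumes the column names `age` and `sex`, that `sex == 1` encodes male, and that `fbs == 1` flags fasting blood sugar above 120 mg/dL; none of these encodings are stated in this README. The small table is made-up illustration data, not the Kaggle dataset.

```python
import pandas as pd

def demographic_summary(df: pd.DataFrame) -> dict:
    """Headline averages and shares for the demographic and clinical columns."""
    return {
        "mean_age": df["age"].mean(),
        "male_share": (df["sex"] == 1).mean(),      # assumes 1 = male
        "mean_trestbps": df["trestbps"].mean(),
        "mean_chol": df["chol"].mean(),
        "high_fbs_share": (df["fbs"] == 1).mean(),  # assumes 1 = fbs > 120 mg/dL
    }

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "age": [40, 50, 60, 70],
    "sex": [1, 1, 0, 1],
    "trestbps": [120, 130, 140, 150],
    "chol": [200, 220, 240, 260],
    "fbs": [0, 0, 0, 1],
})
summary = demographic_summary(toy)
print(summary)
```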
Heart Disease Risk Factors
- Strongest predictors: Chest pain type (cp), Exercise-induced angina (exang), ST segment slope (slope), Max heart rate (thalach)
- Patients with fixed thalassemia defects (thal=2) are at higher risk
- Age, cholesterol, and blood pressure are contributing factors, especially when combined with other risk indicators
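One simple way to screen for predictors like these is to rank features by the absolute value of their correlation with `target`, and to compare disease rates across the levels of a categorical feature such as `thal`. This is a sketch of that general approach, not necessarily the method used in this analysis; the helper names are hypothetical, `target == 1` is assumed to mean disease is present, and the rows are made-up illustration data. Pearson correlation on coded categories (cp, thal, slope) is only a rough guide.

```python
import pandas as pd

def rank_by_target_correlation(df: pd.DataFrame, target: str = "target") -> pd.Series:
    """Correlation of each numeric feature with the target, strongest first."""
    corr = df.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

def disease_rate_by(df: pd.DataFrame, column: str, target: str = "target") -> pd.Series:
    """Share of rows with target == 1 within each level of `column`."""
    return df.groupby(column)[target].mean()

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "exang": [1, 1, 1, 0, 0, 0],
    "thal": [1, 1, 2, 2, 2, 1],
    "chol": [200, 240, 220, 230, 210, 250],
    "target": [0, 0, 0, 1, 1, 1],
})
ranked = rank_by_target_correlation(toy)
rates = disease_rate_by(toy, "thal")
print(ranked)
print(rates)
```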
Age & Risk Trends
- Heart disease is notably prevalent among patients aged 40–60
- Early 40s and late 50s are critical windows for preventive interventions
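Age trends of this kind can be examined by binning age and computing the disease rate per band. The band edges and labels below are assumptions chosen for illustration (this README does not say which bins were used), `age` is an assumed column name, and the rows are made-up.

```python
import pandas as pd

def prevalence_by_age_band(df: pd.DataFrame) -> pd.DataFrame:
    """Disease rate (mean of target) and patient count per age band."""
    bands = pd.cut(
        df["age"],
        bins=[0, 40, 50, 60, 120],                  # assumed edges, left-closed
        labels=["<40", "40-49", "50-59", "60+"],
        right=False,
    )
    return df.groupby(bands, observed=True)["target"].agg(["mean", "count"])

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "age": [35, 45, 47, 55, 58, 65],
    "target": [0, 1, 0, 1, 1, 0],
})
by_band = prevalence_by_age_band(toy)
print(by_band)
```

Reporting the count next to the rate helps avoid over-reading a band that contains only a handful of patients.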
Predictive Modeling
- Logistic Regression using all features achieved ~81% accuracy
- The model can help flag at-risk patients for early intervention
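A baseline of this kind might look like the sketch below. The 80/20 stratified split, the feature scaling, `max_iter=1000`, and the random seed are all assumptions, since this README does not state how the model was trained or evaluated. The demo data is synthetic and only shows the call; it says nothing about the ~81% figure above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fit_logistic_baseline(df, target="target", test_size=0.2, seed=42):
    """Fit a scaled logistic regression on all non-target columns.

    Returns the fitted pipeline and its accuracy on the held-out split.
    """
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))

# Synthetic stand-in data (NOT the Kaggle dataset), only to show the call.
rng = np.random.default_rng(0)
signal = np.linspace(-3, 3, 200)
demo = pd.DataFrame({
    "signal": signal,
    "noise": rng.normal(size=200),
    "target": (signal > 0).astype(int),
})
model, accuracy = fit_logistic_baseline(demo)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Coded categorical features such as cp, thal, and slope are treated as plain numbers here; one-hot encoding them is a common alternative.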
Clinical & Business Impact
- Targeted screening for high-risk patients improves early detection
- Personalized interventions based on ECG, thalassemia, and clinical measurements enhance patient outcomes
- Data-driven insights guide resource allocation and preventive healthcare strategies
- Distribution Analysis: Age, gender, chest pain types, thalassemia types
- Correlation Analysis: Identifying key relationships between variables and heart disease presence
- Risk Factor Combinations: Most common profiles in heart disease patients
- Clinical Measurement Comparison: Differences between patients with and without heart disease
Visualizations Used: Histograms, Bar Charts, Line Plots, Pair Plots, Heatmaps
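The risk-factor-combination and measurement-comparison steps can be sketched with two short pandas helpers. The helper names are hypothetical, `target == 1` is assumed to mean disease is present, and the choice of which columns to combine is left to the caller because this README does not list them. The rows are made-up illustration data.

```python
import pandas as pd

def top_profiles(df, columns, n=5, target="target"):
    """Most frequent value combinations of `columns` among rows with target == 1."""
    diseased = df.loc[df[target] == 1, columns]
    return diseased.value_counts().head(n)

def compare_measurements(df, columns, target="target"):
    """Mean of each measurement, split by target value."""
    return df.groupby(target)[columns].mean()

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "cp": [0, 0, 0, 2, 2, 1],
    "exang": [1, 1, 1, 0, 0, 0],
    "thalach": [120, 130, 160, 150, 170, 140],
    "target": [1, 1, 0, 1, 0, 1],
})
profiles = top_profiles(toy, ["cp", "exang"])
means = compare_measurements(toy, ["thalach"])
print(profiles)
print(means)
```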
This EDA highlights the critical factors influencing heart disease and demonstrates the potential for data-driven preventive healthcare. Combining demographic data, medical history, and clinical measurements enables targeted interventions and improved patient outcomes.
Future work:
- Include time-to-event data for true survival analysis
- Apply advanced predictive models (Random Forest, XGBoost) and compare their accuracy against the logistic regression baseline
- Explore lifestyle and environmental factors for a more comprehensive risk assessment
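As a starting point for the model comparison, the sketch below scores a logistic regression and a random forest with the same cross-validation folds. The fold count, hyperparameters, and helper name are assumptions; XGBoost is left out because it lives in a separate package. The demo data is synthetic and is not a result from this project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def compare_models(df, target="target", folds=5, seed=42):
    """Mean cross-validated accuracy for each candidate model."""
    X = df.drop(columns=[target])
    y = df[target]
    candidates = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {
        name: cross_val_score(est, X, y, cv=folds, scoring="accuracy").mean()
        for name, est in candidates.items()
    }

# Synthetic stand-in data (NOT the Kaggle dataset), only to show the call.
rng = np.random.default_rng(1)
signal = rng.normal(size=150)
demo = pd.DataFrame({
    "signal": signal,
    "noise": rng.normal(size=150),
    "target": (signal > 0).astype(int),
})
scores = compare_models(demo)
print(scores)
```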
Tools:
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- Power BI
Our analysis shows that heart disease risk is influenced by multiple factors working together rather than a single measurement. Chest pain type, exercise-induced angina, ST segment slope, and maximum heart rate are the strongest indicators of risk, with age, cholesterol, and blood pressure acting as contributing factors.
Patients aged 40–60 and those with fixed thalassemia defects (thal = 2) are particularly vulnerable. Combining these clinical and demographic indicators allows for more accurate identification of high-risk patients.
We recommend focusing on integrated risk assessments, early preventive interventions for mid-life patients, and closer monitoring of patients with high-risk ECG or thalassemia profiles. Leveraging predictive insights can improve early detection and guide personalized care strategies.


