astro05/Cardio_Disease_Detection

Heart Disease Prediction using Machine Learning & Deep Neural Network Models

  1. Introduction

Cardiac disorders, commonly known as heart diseases, are the leading cause of death worldwide. Heart disease has become a growing concern in this country, with patients suffering from several related illnesses. Some of these illnesses are fatal if they are diagnosed too late.

In this project, we build a predictive model of heart disease intended for early detection. Our focus is to find the pre-processing techniques best suited to specific models, improve existing models, and create combined predictions from two or more datasets. We also aim to implement previously published models and increase their accuracy.

  1. Methodology

Fig: Proposed Methodology for this work

  1. Model Selection

Based on our paper study, we selected Artificial Neural Network (ANN), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) models for training on our data.

Model 4 (ANN)

Input data shape: 13 (input data is scaled)

Number of hidden layers: 1

Neurons: 300

Activation: ReLU (hidden layer), Sigmoid (output layer)

Loss: Binary cross-entropy with logits (BCE with LogitLoss)

Model 5 (ANN)

Input data shape: 13 (input data is scaled)

Number of hidden layers: 1

Neurons: 300

Activation: ReLU (hidden layer), Sigmoid (output layer)

Loss: Binary cross-entropy with logits (BCE with LogitLoss)

Output shape: 1
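The architecture described above (13 scaled inputs, one hidden layer of 300 ReLU neurons, one output) can be sketched in plain NumPy as follows. This is a minimal illustration, not the project's training code: the weight initialisation, the function names, and the random demo data are assumptions. One detail worth noting is that a "with logits" binary cross-entropy loss (for example `BCEWithLogitsLoss` in PyTorch) applies the sigmoid internally, so this sketch feeds raw logits to the loss and applies the sigmoid only to obtain probabilities. Whether the original code does the same is not stated.

```python
import numpy as np

def init_params(n_inputs=13, n_hidden=300, seed=0):
    """Randomly initialise weights (the initialisation scheme is an assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_inputs), size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward_logits(params, x):
    """Forward pass: ReLU hidden layer, then a single raw output (logit)."""
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy computed from raw logits."""
    per_sample = (
        np.maximum(logits, 0.0)
        - logits * targets
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return float(np.mean(per_sample))

# Hypothetical demo batch: 4 patients, 13 already-scaled features each.
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(4, 13))
y_demo = np.array([[1.0], [0.0], [1.0], [0.0]])

params = init_params()
logits = forward_logits(params, x_demo)   # shape (4, 1)
probs = sigmoid(logits)                   # predicted probability of heart disease
loss = bce_with_logits(logits, y_demo)
```

Training would update `params` with an optimizer such as SGD; that step is omitted here.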

  1. Experiments

This dataset was taken from the UCI Machine Learning Repository. The heart disease dataset is made up of 75 raw features, of which 13 were published. These features are vital in the diagnosis of heart disease. The 13 features considered in this work are listed below:

  1. Dataset Collection

UCI Dataset:

| SI | Attribute | Description |
| --- | --- | --- |
| 1 | Age | Age in years |
| 2 | Sex | 1 = male; 0 = female |
| 3 | cp | Chest pain type (4 values) |
| 4 | trestbps | Resting blood pressure (in mm Hg on admission to the hospital) |
| 5 | chol | Serum cholesterol in mg/dl |
| 6 | fbs | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) |
| 7 | restecg | Resting electrocardiographic results |
| 8 | Thalach | Maximum heart rate achieved |
| 9 | Exang | Exercise induced angina (1 = yes; 0 = no) |
| 10 | Oldpeak | ST depression induced by exercise relative to rest |
| 11 | Slope | Heart rate slope |
| 12 | Ca | Count of major vessels (value 0-3) coloured by fluoroscopy |
| 13 | Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect |

The dataset has 13 features and 303 rows. No NULL or duplicate values were found. It contains 164 (54.3%) patients with heart disease (target = 1) and 138 (45.7%) patients without heart disease (target = 0); Fig.1 shows that the classes are roughly balanced. Of the patients, 31.79% are female and 68.21% are male, and the average age is 53. Fig.2 shows the patients affected by cardiac disease across age ranges. Fig.3 shows that the rate of affected male patients is higher than that of female patients.

Fig.1: Heart disease(1) and Non heart disease(0)

Fig.2: Age vs Cardio Disease.

Fig.3: Affected patient based on sex.

Cardiovascular_Dataset:

| SI | Attribute | Description |
| --- | --- | --- |
| 1 | age | Objective Feature. int (days) |
| 2 | height | Objective Feature. int (cm) |
| 3 | weight | Objective Feature. float (kg) |
| 4 | Gender | Objective Feature. Categorical code (1 - women, 2 - men) |
| 5 | ap_hi | Systolic blood pressure. Examination Feature. int |
| 6 | ap_lo | Diastolic blood pressure. Examination Feature. int |
| 7 | cholesterol | Cholesterol. Examination Feature (1: normal, 2: above normal, 3: well above normal) |
| 8 | gluc | Glucose. Examination Feature (1: normal, 2: above normal, 3: well above normal) |
| 9 | smoke | Smoking. Subjective Feature. binary |
| 10 | alco | Alcohol intake. Subjective Feature. binary |
| 11 | active | Physical activity. Subjective Feature. binary |
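Because `age` in this dataset is stored in days, it usually has to be converted to years before it can be compared with the UCI `Age` column or read as a patient age. A minimal sketch of that conversion follows; the 365.25 days-per-year factor and the sample values are assumptions for illustration, not taken from the project code.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years

def age_days_to_years(age_days: int) -> float:
    """Convert an age given in days to (fractional) years."""
    return age_days / DAYS_PER_YEAR

# Hypothetical sample values for the `age` column (in days).
sample_ages_days = [18393, 20228, 18857]
ages_years = [round(age_days_to_years(d), 1) for d in sample_ages_days]
print(ages_years)
```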

The data set had 11 features and 70000 rows. There are 3 types of input features:

Objective: factual information.

Examination: results of medical examination.

Subjective: information given by the patient.

No NULL or duplicate values were found in the dataset. It contains 34979 (49.97%) patients with heart disease (cardio = 1) and 35021 (50.03%) patients without heart disease (cardio = 0); Fig.5 shows that the dataset is balanced. Of the patients, 34.96% are female and 65.04% are male, and the average age is 55. Fig.6 shows the patients affected by cardiac disease across age ranges. Fig.7 shows that the rate of affected male patients is higher than that of female patients.

Fig.5: Heart disease(1) and Non heart disease(0)

Fig.6: Age vs Cardio Disease.

Fig.7: Affected patient based on sex.

  1. Data Pre-Processing

Pre-processing the data is needed to increase model performance and stability. The SelectKBest method keeps the k features with the highest scores. With f_classif as the scoring function, chol (2.002%), fbs (0.2160%), and trestbps (6.55%) have low scores, so these features are dropped. The data then has 10 features and 210 rows.

Fig.4: Feature Score (Least Important Selected)
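The feature-selection step can be illustrated with a small NumPy sketch. The text names scikit-learn's `SelectKBest` with the `f_classif` score, which is a one-way ANOVA F-statistic computed per feature. The code below re-implements that scoring directly so that it has no dependency beyond NumPy. It is not the project's code, and the function names and synthetic data are assumptions.

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F-statistic for each feature column of X against labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_samples, n_features = X.shape
    overall_mean = X.mean(axis=0)
    ss_between = np.zeros(n_features)
    ss_within = np.zeros(n_features)
    for c in classes:
        group = X[y == c]
        group_mean = group.mean(axis=0)
        ss_between += len(group) * (group_mean - overall_mean) ** 2
        ss_within += ((group - group_mean) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = n_samples - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)

def select_k_best(X, y, k):
    """Return the (sorted) column indices of the k highest-scoring features."""
    scores = anova_f_scores(X, y)
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return keep, scores

# Synthetic demo: 4 features, only columns 0 and 2 depend on the label.
rng = np.random.default_rng(0)
y_demo = np.array([0, 1] * 50)
X_demo = rng.normal(size=(100, 4))
X_demo[:, 0] += 2.0 * y_demo
X_demo[:, 2] -= 1.5 * y_demo

keep, scores = select_k_best(X_demo, y_demo, k=2)
X_selected = X_demo[:, keep]
```

On the UCI data the analogous call would keep 10 of the 13 columns, which is how the three lowest-scoring features end up being dropped.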

  1. Performance Metrics

    We use Accuracy, Precision, Recall, and F1 Score as the performance metrics for evaluating the models.

    The first paper we chose to improve on was Neural network diagnosis of heart disease (2015). Our expected result was 85% accuracy after implementing the structure described in that paper. After improving on that model and exceeding the paper's accuracy, we selected two more papers with better performance metrics. Accuracy alone is not a sufficient performance metric for heart disease prediction.

Our Result:

Logistic Regression (Best Result): Accuracy 91.20%

Decision Tree: Accuracy 84.62%

KNN: Accuracy 79.00%

SVM: Accuracy 86.81%

Random Forest: Accuracy 86.81%

Perceptron: Accuracy 83.54%

Gradient Boosting: Accuracy 86.81%

Confusion Matrix

| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | 35 | 0 |
| Actual Negative | 3 | 23 |

Result Comparison from Previous Studies

| Paper | Model | Accuracy | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- | --- |
| Olaniyi, E. O., Oyedotun, O. K., Helwan, A., & Adnan, K. (2015). Neural network diagnosis of heart disease. | Decision Tree, Naive Bayes | 45.67%, 84.35%, 82.31% | | | |
| Tasnim, F., & Habiba, S. U. (2021). A Comparative Study on Heart Disease Prediction Using Data Mining Techniques and Feature Selection. 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques | KNN | 88% | 87% | 86% | |
| | SVM | 82% | 83% | 81% | |
| | Logistic Regression | 80% | 82% | 81% | |
| | Gradient Boosting | 83% | 82% | 80% | |
| | Random Forest | 91.17% | 91% | 90% | |
| Terrada, O., Cherradi, B., Hamida, S., Raihani, A., Moujahid, H., & Bouattane, O. (2020). Prediction of Patients with Heart Disease using Artificial Neural Network and Adaptive Boosting techniques. 2020 | AdaBoost | 72.22% | 69.57% | 66.67% | 68.09% |

Table: Previous Result

UCI Data Set Result

Accuracy by preprocessing method:

| Model Name | All Features | Scaling | FClassif | Chi-square | One-Hot Encoding |
| --- | --- | --- | --- | --- | --- |
| Logistic Regression | 87.91% | 87.91% | 90.11% | 85.71% | 90.11% |
| Decision Tree | 71.43% | 75.82% | 79.12% | 80.22% | 70.33% |
| KNN | 68.13% | 84.62% | 68.13% | 68.13% | 84.62% |
| SVM | 86.81% | 87.91% | 87.91% | 86.81% | 86.81% |
| Random Forest | 86.81% | 83.52% | 84.62% | 83.52% | 87.91% |
| Perceptron | 67.03% | 76.92% | 57.14% | 68.13% | 83.52% |
| Gradient Boosting | 86.81% | 86.81% | 85.71% | 82.42% | 84.62% |

Cardiovascular Data Set Result

Accuracy by preprocessing method:

| Model Name | All Features | Scaling | FClassif | One-Hot Encoding |
| --- | --- | --- | --- | --- |
| Logistic Regression | 69.42% | 49.89% | 71.86% | 49.89% |
| Decision Tree | 63.31% | 63.19% | 63.50% | 63.22% |
| KNN | 63.79% | 50.51% | 68.93% | 50.38% |
| SVM | 71.66% | 64.40% | 72.30% | 63.70% |
| Random Forest | 71.96% | 71.98% | 70.48% | 71.74% |

Table: Our Result

Fig. UCI Dataset Result Comparison

Fig. Cardio_vascular Dataset Result Comparison

  1. Performance Metrics


Our Result:

ANN (Paper Structure): Accuracy 85.71%

ANN (Model 4): Accuracy 91.08%

ANN (Best Result): Accuracy 95.08%

Confusion Matrix

| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | 35 | 0 |
| Actual Negative | 3 | 23 |
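The four metrics used in this work can be derived directly from the confusion matrix above. The short sketch below (function name and dictionary layout are illustrative choices) computes them from the four cell counts. For this matrix the accuracy is 58/61, about 95.08%, which matches the figure listed for the best ANN result.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Counts read from the confusion matrix: rows are actual, columns are predicted.
metrics = classification_metrics(tp=35, fn=0, fp=3, tn=23)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Here recall is 1.0 because no actual positive case was predicted negative, which is the property that matters most when a missed diagnosis is costly.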

Result Comparison

| Paper | Model | Accuracy | Precision | Recall | F1-Score | No. of Hidden Layers |
| --- | --- | --- | --- | --- | --- | --- |
| Olaniyi, E. O., Oyedotun, O. K., Helwan, A., & Adnan, K. (2015). Neural network diagnosis of heart disease. | ANN | 85% | | | | 6 |
| Lin, C.-H., Yang, P.-K., Lin, Y.-C., & Fu, P.-K. (2020). On Machine Learning Models for Heart Disease Diagnosis. 2020 IEEE 2nd Eurasia | ANN | 91.26% | | | | 1 |
| | CNN | 83.50% | | | | 3 |
| Terrada, O., Cherradi, B., Hamida, S., Raihani, A., Moujahid, H., & Bouattane, O. (2020). Prediction of Patients with Heart Disease using Artificial Neural Network and Adaptive Boosting techniques. 2020 | ANN | 91.41% | 79.67% | 70.36% | 75.98% | 3 |

Table: Previous Result

  1. Evaluation

| Model Name | Preprocessing | No. of Inputs | No. of Hidden Layers | Neurons/Filters | Activation Function & Loss Function | Optimizer & Learning Rate | Epochs | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ANN (Paper Selected) | Scaling | 13 | 6 | 5 | Sigmoid, MSE | Adam, 0.0032 | 2000 | 85.71% |
| ANN (Proposed Model-1) | Scaling | 13 | 3 | 12 | ReLU, BCE | Adam, 0.01 | 200 | 87.91% |
| LSTM (Model-2) | Scaling | 13 | 4 | 100 | ReLU, BCE | Adam, 0.001 | 90 | 77.04% |
| 1D CNN (Proposed Model-3) | Scaling | 13 | 2 | 128 | ReLU, BCE | Adam, 0.01 | 15 | 86.81% |
| ANN (Proposed Model-4) | Feature Selection with Scaling | 10 | 3 | 100 | ReLU, BCE | Adam, 0.01 | 125 | 91.21% |
| ANN (Proposed Model-5), Best Result | Scaling | 13 | 1 | 300 | ReLU, BCE with LogitLoss | SGD, 0.01 | 80 | 96.72% |

Table: Our Result

  1. Discussion

Our experiments yielded good results: we obtained better results than the papers mentioned above. We found that scaling and encoding generally perform well for KNN and Logistic Regression. In the future, further hyperparameter tuning may improve our results, and other boosting algorithms could be used to increase the accuracy of the models.
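Scaling matters for KNN and Logistic Regression because KNN compares raw distances between samples and Logistic Regression is typically fit with gradient-based solvers, so features with large numeric ranges (such as `chol`) can dominate unscaled inputs. The NumPy sketch below shows the two preprocessing steps in their simplest form: standardisation using training-set statistics, and one-hot encoding of a categorical column. The function names, the demo values, and the category codes 0 to 3 for `cp` are assumptions for illustration, not the project's code.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale features to zero mean and unit variance using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X_train - mean) / std, (X_test - mean) / std

def one_hot(column, categories):
    """Encode a categorical column as one indicator column per category."""
    column = np.asarray(column)
    return (column[:, None] == np.asarray(categories)[None, :]).astype(float)

# Hypothetical numeric features (for example trestbps and chol).
X_train = np.array([[120.0, 200.0], [140.0, 240.0], [160.0, 280.0]])
X_test = np.array([[140.0, 240.0]])
train_scaled, test_scaled = standardize(X_train, X_test)

# Hypothetical chest-pain codes; the 4 category values are assumed to be 0..3.
cp_encoded = one_hot([0, 2, 3, 1], categories=[0, 1, 2, 3])
```

Fitting the scaler on the training split only, as done here, keeps information from the test split out of the preprocessing step.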

About

Based on the UCI dataset and the Cardiovascular dataset, this project evaluates several machine learning and deep learning models. After applying various data preprocessing approaches, model performance improved by 10% on average compared with previous work.
