GitHub - astro05/Cardio_Disease_Detection: Based on UCI dataset and Cardiovascular dataset this project observed different machine learning models and deep learning models. After applying various data preprocessing approaches model performance was improved on average 10% from previous works.

Heart Disease Prediction using Machine Learning & Deep Neural Network Models

Introduction

Cardiac disorders, more narrowly known as “heart diseases”, are the leading cause of death worldwide. Heart disease has become a cause of increasing concern for this country, with patients enduring several sorts of related illnesses. Some of these illnesses can be fatal if they are diagnosed too late.

In this project, we build predictive models of heart disease that can be used for early detection. Our focus is to find the pre-processing techniques best suited to specific models, to improve existing models, and to create combined predictions from two or more datasets. We implement models from previous work and aim to increase their accuracy.

Methodology

Fig: Proposed Methodology for this work

Model Selection

Based on our study of prior papers, we selected Artificial Neural Network (ANN), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) models for training on our data. Our focus is to implement these models and improve on the accuracy reported in previous work.

Model 4 (ANN)

Input data shape: 13 (input data is scaled)

Hidden layers: 1

Neurons: 300

Activation: ReLU (hidden layer), Sigmoid (output layer)

Loss: Binary Cross Entropy with LogitLoss

Model 5 (ANN)

Input data shape: 13 (input data is scaled)

Hidden layers: 1

Neurons: 300

Activation: ReLU (hidden layer), Sigmoid (output layer)

Loss: Binary Cross Entropy with LogitLoss

Output shape: 1
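The structure listed for Model 5 (13 scaled inputs, one hidden layer of 300 ReLU neurons, one output) can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code: the weight initialisation, the helper names, and the sample input are assumptions. It also assumes that "Binary Cross Entropy with LogitLoss" means a loss that takes the raw output logit and applies the sigmoid internally, as PyTorch's `BCEWithLogitsLoss` does; under that reading the sigmoid is only needed separately when a probability is reported.

```python
import math
import random


def relu(x):
    return x if x > 0 else 0.0


def sigmoid(z):
    # Stable for large |z|: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One hidden ReLU layer followed by a single raw output logit."""
    hidden = [
        relu(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def bce_with_logits(logit, target):
    """Binary cross entropy computed directly on a logit (target is 0 or 1)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# Hypothetical random weights: 13 inputs -> 300 hidden neurons -> 1 output.
random.seed(0)
N_INPUTS, N_HIDDEN = 13, 300
w_hidden = [[random.uniform(-0.1, 0.1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [random.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
b_out = 0.0

# Hypothetical scaled patient record (13 standardized feature values).
features = [random.uniform(-1.0, 1.0) for _ in range(N_INPUTS)]

logit = forward(features, w_hidden, b_hidden, w_out, b_out)
probability = sigmoid(logit)          # value reported as the prediction
loss = bce_with_logits(logit, 1)      # loss if the true label is 1
```

Feeding an already-sigmoided output into a logit-based loss would apply the sigmoid twice, so the sketch keeps the logit and the probability as separate values.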

Experiments

This dataset was taken from the UCI Machine Learning Repository. The heart disease dataset is made up of 75 raw features, of which 13 were published. These features are vital in the diagnosis of heart disease. The 13 features considered in this work are listed below:

Dataset Collection

UCI Dataset:

SI	Attributes	Description
1.	Age	age in years
2.	Sex	1 = male; 0 = female
3.	cp	chest pain type (4 values)
4.	trestbps	resting blood pressure (in mm Hg on admission to the hospital)
5.	chol	serum cholesterol in mg/dl
6.	fbs	(fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7.	restecg	resting electrocardiographic results
8.	Thalach	maximum heart rate achieved
9.	Exang	exercise induced angina (1 = yes; 0 = no)
10.	Oldpeak	ST depression induced by exercise relative to rest
11.	Slope	slope of the peak exercise ST segment
12.	Ca	Count of major vessels (value 0-3) coloured by fluoroscopy.
13.	Thal	3 = normal; 6 = fixed defect; 7 = reversible defect

The dataset has 13 features and 303 rows. No NULL or duplicate values were found. It contains 164 (54.3%) heart disease patients (target = 1) and 138 (45.7%) non heart disease patients (target = 0); these counts sum to 302, and the percentages are computed over that total. Fig.1 shows that the dataset is roughly balanced. Of the patients, 31.79% are female and 68.21% are male. The average age is 53. Fig.2 shows the patients affected by cardiac disease in different age ranges. Fig.3 shows that the affected rate is higher among male patients than among female patients.

Fig.1: Heart disease(1) and Non heart disease(0)

Fig.2: Age vs Cardio Disease.

Fig.3: Affected patient based on sex.
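The class shares quoted above can be checked with a few lines of Python. This is a small sketch; the function name and the dictionary layout are assumptions, and only the two class counts come from the text.

```python
def class_shares(counts):
    """Return each class's share of the total, in percent."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Counts reported for the UCI dataset: target = 1 and target = 0.
uci_counts = {1: 164, 0: 138}
uci_total = sum(uci_counts.values())
uci_shares = class_shares(uci_counts)

for label, share in uci_shares.items():
    print(f"target = {label}: {uci_counts[label]} of {uci_total} ({share:.1f}%)")
```

A split this close to even is why accuracy is still informative here, although precision and recall remain useful for a screening task.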

Cardiovascular_Dataset:

SI	Attributes	Description
1.	age	Objective Feature. int (days)
2.	height	Objective Feature. int (cm)
3.	weight	Objective Feature. float (kg)
4.	Gender	Objective Feature. Categorical code(1 - women, 2 - men)
5.	ap_hi	Systolic blood pressure. Examination Feature. int
6.	ap_lo	Diastolic blood pressure. Examination Feature. int
7.	cholesterol	Cholesterol. Examination Feature(1: normal, 2: above normal, 3: well above normal)
8.	gluc	Glucose. Examination Feature ( 1: normal, 2: above normal, 3: well above normal)
9.	smoke	Smoking. Subjective Feature. binary
10.	alco	Alcohol intake. Subjective Feature. binary
11.	active	Physical activity. Subjective Feature. binary

The data set had 11 features and 70000 rows. There are 3 types of input features:

Objective: factual information.

Examination: results of medical examination.

Subjective: information given by the patient.

No NULL or duplicate values were found in the dataset. It contains 34979 (49.97%) heart disease patients (cardio = 1) and 35021 (50.03%) non heart disease patients (cardio = 0). Fig.5 shows that the dataset is balanced. Of the patients, 34.96% are female and 65.04% are male. The average age is 55. Fig.6 shows the patients affected by cardiac disease in different age ranges. Fig.7 shows that the affected rate is higher among male patients than among female patients.

Fig.5: Heart disease(1) and Non heart disease(0)

Fig.6: Age vs Cardio Disease.

Fig.7: Affected patient based on sex.
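Because this dataset stores age in days and uses numeric codes for several attributes, a record usually has to be decoded before figures such as an average age in years can be read off. The sketch below shows one way to do that with the codes from the attribute table; the 365.25 days-per-year divisor, the function name, and the sample record are assumptions.

```python
# Code tables taken from the attribute descriptions above.
GENDER_LABELS = {1: "women", 2: "men"}
LEVEL_LABELS = {1: "normal", 2: "above normal", 3: "well above normal"}

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def decode_record(record):
    """Convert age from days to years and map category codes to labels."""
    return {
        "age_years": record["age"] / DAYS_PER_YEAR,
        "Gender": GENDER_LABELS[record["Gender"]],
        "cholesterol": LEVEL_LABELS[record["cholesterol"]],
        "gluc": LEVEL_LABELS[record["gluc"]],
    }


# Hypothetical record, not a row from the dataset.
sample = {"age": 20089, "Gender": 2, "cholesterol": 1, "gluc": 3}
decoded = decode_record(sample)
print(decoded)
```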

Data Pre-Processing

To increase performance and stability, the data needs to be pre-processed. The SelectKBest method selects the k features with the highest scores. Applying f_classif, the features chol (2.002%), fbs (0.2160%), and trestbps (6.55%) have low scores, so they are dropped. The data then has 10 features and 210 rows.

Fig.4: Feature Score (Least Important Selected)
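scikit-learn's `SelectKBest` with `f_classif` ranks each feature by its one-way ANOVA F-score against the class label and keeps the top k. The sketch below reimplements that score in plain Python to show what is being ranked. It is an illustration under assumptions, not the notebook's code: the helper names and the toy data are hypothetical, and the percentages quoted above may come from a different normalisation of the scores.

```python
from statistics import fmean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature against class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    grand_mean = fmean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    # Assumes at least two classes and ss_within > 0.
    return (ss_between / df_between) / (ss_within / df_within)


def select_k_best(columns, labels, k):
    """Return the names of the k highest-scoring features and all scores."""
    scores = {name: anova_f_score(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data (hypothetical): one feature separates the classes, one does not.
toy_columns = {
    "informative": [1, 2, 3, 7, 8, 9],
    "noisy": [1, 5, 9, 2, 5, 8],
}
toy_labels = [0, 0, 0, 1, 1, 1]

kept, scores = select_k_best(toy_columns, toy_labels, k=1)
print(kept, scores)
```

A feature whose class means coincide scores zero, which is the situation the low-scoring features above come closest to.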

Performance Metrics

For performance metrics, we use Accuracy, Precision, Recall, and F1 Score to evaluate the models.

The first paper we set out to improve on was "Neural network diagnosis of heart disease" (2015). We expected to reach 85% accuracy after implementing the structure it describes. Having improved on that model and exceeded the accuracy reported in the paper, we selected two more papers with better performance metrics. Accuracy alone is not a good performance metric for heart disease prediction.

Our Result:

Logistic Regression(Best Result): Accuracy 91.20%

Decision Tree : Accuracy 84.62%

KNN : Accuracy 79.00%

SVM: Accuracy 86.81%

Random Forest: Accuracy 86.81%

Perceptron : Accuracy 83.54%

Gradient Boosting: Accuracy 86.81%

Confusion Matrix

	Predicted Positive	Predicted Negative
Actual Positive	35	0
Actual Negative	3	23

Result Comparison from Previous Studies

Paper	Model	Accuracy	Precision	Recall	F1-Score
Olaniyi, E. O., Oyedotun, O. K., Helwan, A., & Adnan, K. (2015). Neural network diagnosis of heart disease.	Decision Tree Naive Bayes	45.67% 84.35% 82.31%
Tasnim, F., & Habiba, S. U. (2021). A Comparative Study on Heart Disease Prediction Using Data Mining Techniques and Feature Selection. 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques	KNN	88%	87%	86%
	SVM	82%	83%	81%
	Logistic Regression	80%	82%	81%
	Gradient Boosting	83%	82%	80%
	Random Forest	91.17%	91%	90%
Terrada, O., Cherradi, B., Hamida, S., Raihani, A., Moujahid, H., & Bouattane, O. (2020). Prediction of Patients with Heart Disease using Artificial Neural Network and Adaptive Boosting techniques. 2020	AdaBoost	72.22%	69.57%	66.67%	68.09%

Table: Previous Result

UCI Data Set Result

Model Name	Preprocessing	Result (Accuracy)
Logistic Regression	All Feature, Scaling, FClassif, Chi-square, One Hot Encoding	87.91%, 90.11%, 85.71%, 90.11%
Decision Tree	All Feature, Scaling, FClassif, Chi-square, One Hot Encoding	71.43%, 75.82%, 79.12%, 80.22%, 70.33%
KNN	All Feature, Scaling, FClassif, Chi-square, One Hot Encoding	68.13%, 84.62%, 68.13%, 84.62%
SVM	All Feature, Scaling, FClassif, Chi-square, One Hot Encoding	86.81%, 87.91%, 86.81%
Random Forest	All Feature, Scaling, FClassif, Chi-square, One Hot Encoding	86.81%, 83.52%, 84.62%, 83.52%, 87.91%
Perceptron	All Feature, Scaling, FClassif, Chi-square, One Hot Encoding	67.03%, 76.92%, 57.14%, 68.13%, 83.52%
Gradient Boosting	All Feature, Scaling, FClassif, Chi-square, One Hot Encoding	86.81%, 85.71%, 82.42%, 84.62%

Cardiovascular Data Set Result

Model Name	Preprocessing	Result (Accuracy)
Logistic Regression	All Feature, Scaling, FClassif, One Hot Encoding	69.42%, 49.89%, 71.86%, 49.89%
Decision Tree	All Feature, Scaling, FClassif, One Hot Encoding	63.31%, 63.19%, 63.50%, 63.22%
KNN	All Feature, Scaling, FClassif, One Hot Encoding	63.79%, 50.51%, 68.93%, 50.38%
SVM	All Feature, Scaling, FClassif, One Hot Encoding	71.66%, 64.40%, 72.30%, 63.70%
Random Forest	All Feature, Scaling, FClassif, One Hot Encoding	71.96%, 71.98%, 70.48%, 71.74%

Table: Our Result

Fig. UCI Dataset Result Comparison

Fig. Cardio_vascular Dataset Result Comparison

Performance Metrics

For performance metrics, we use Accuracy, Precision, Recall, and F1 Score to evaluate the models.

The first paper we set out to improve on was "Neural network diagnosis of heart disease" (2015). We expected to reach 85% accuracy after implementing the structure it describes. Having improved on that model and exceeded the accuracy reported in the paper, we selected two more papers with better performance metrics. Accuracy alone is not a good performance metric for heart disease prediction.

Our Result:

ANN(Paper Structure) : Accuracy 85.71%

ANN(Model 4) : Accuracy 91.08%

ANN(Best Result) : Accuracy 95.08%

Confusion Matrix

	Predicted Positive	Predicted Negative
Actual Positive	35	0
Actual Negative	3	23
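The four metrics named above follow directly from the cells of this confusion matrix. The sketch below computes them, reading the rows as actual classes and the columns as predicted classes, as the table labels state; the function name and the returned dictionary layout are assumptions.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Cells of the confusion matrix above.
metrics = classification_metrics(tp=35, fn=0, fp=3, tn=23)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these counts the accuracy is 58/61, about 95.08%, which agrees with the ANN best-result figure listed above. Recall is 100% because no actual positive was missed, while the three false positives lower the precision.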

Result Comparison

Paper	Model	Accuracy	Precision	Recall	F1-Score	No. of Hidden Layers
Olaniyi, E. O., Oyedotun, O. K., Helwan, A., & Adnan, K. (2015). Neural network diagnosis of heart disease.	ANN	85%				6
Lin, C.-H., Yang, P.-K., Lin, Y.-C., & Fu, P.-K. (2020). On Machine Learning Models for Heart Disease Diagnosis. 2020 IEEE 2nd Eurasia	ANN	91.26%				1
	CNN	83.50%				3
Terrada, O., Cherradi, B., Hamida, S., Raihani, A., Moujahid, H., & Bouattane, O. (2020). Prediction of Patients with Heart Disease using Artificial Neural Network and Adaptive Boosting techniques. 2020	ANN	91.41%	79.67%	70.36%	75.98%	3

Table: Previous Result

Evaluation

Model Name	Preprocessing	No. of Inputs	No. of Hidden Layers	Neurons/Filters	Activation Function & Loss Function	Optimizer & Learning Rate	Epochs	Accuracy
ANN (Paper Selected)	Scaling	13	6	5	Sigmoid MSE	Adam 0.0032	2000	85.71%
ANN (Proposed Model-1)	Scaling	13	3	12	Relu BCE	Adam 0.01	200	87.91%
LSTM (Model-2)	Scaling	13	4	100	Relu BCE	Adam 0.001	90	77.04%
1D CNN (Proposed Model-3)	Scaling	13	2	128	Relu BCE	Adam 0.01	15	86.81%
ANN (Proposed Model-4)	Feature Selection with Scaling	10	3	100	Relu BCE	Adam 0.01	125	91.21%
ANN (Proposed Model-5) Best Result	Scaling	13	1	300	Relu BCE with LogitLoss	SGD 0.01	80	96.72%

Table: Our Result

Discussion

Our experiments yielded good results. We obtained better results than the papers mentioned above. We found that scaling and encoding generally perform well for KNN and Logistic Regression. In future, more hyperparameter tuning may improve our results. Other boosting algorithms could also be used to increase the accuracy of the models.
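The two pre-processing steps singled out above, scaling and one hot encoding, can be sketched in plain Python as follows. This is an illustrative sketch rather than the notebook's code: the helper names and the sample values are hypothetical, and standardization (zero mean, unit variance) is assumed as the scaling method.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Rescale a numeric column to zero mean and unit variance."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation; assumes std > 0
    return [(v - mean) / std for v in column]


def one_hot(column):
    """Encode a categorical column as one 0/1 indicator per category."""
    categories = sorted(set(column))
    return [[1 if v == c else 0 for c in categories] for v in column]


# Hypothetical values for a numeric feature (chol) and a categorical one (cp).
chol = [233, 250, 204, 236]
cp = [3, 2, 1, 1]

scaled_chol = standardize(chol)
encoded_cp = one_hot(cp)
print(scaled_chol)
print(encoded_cp)
```

Distance-based models such as KNN are sensitive to feature ranges, so a column measured in the hundreds would otherwise dominate one measured in single digits. One hot encoding avoids implying an order between category codes.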
