Heart disease prediction using ensemble of k-nearest neighbour, random forest and logistic regression method

Coronary heart disease has been ranked as the leading cause of death in Malaysia. According to data published by the WHO in 2018, deaths caused by this disease reached 34,766, or 24.69% of total deaths, placing Malaysia 64th in the world. Medical...

Bibliographic Details
Main Authors: Mohd Syafiq Asyraf, Suhaimi, Nor Azuana, Ramli, Noryanti, Muhammad
Format: Article
Language: English
Published: AIP Publishing 2024
Subjects: QA75 Electronic computers. Computer science
RC Internal medicine
Online Access: http://umpir.ump.edu.my/id/eprint/41052/1/syafiqicoaims.pdf
http://umpir.ump.edu.my/id/eprint/41052/7/Heart%20disease%20prediction%20using%20ensemble%20of%20k-nearest_ABST.pdf
author Mohd Syafiq Asyraf, Suhaimi
Nor Azuana, Ramli
Noryanti, Muhammad
collection UMP
description Coronary heart disease is ranked as the leading cause of death in Malaysia. According to data published by the WHO in 2018, deaths caused by this disease reached 34,766, or 24.69% of total deaths, placing Malaysia 64th in the world. Medical researchers around the world believe that multiple factors contribute to this disease, including health problems, unhealthy personal habits, genetics, and family history. Predicting heart disease is not easy, since it requires a broad range of expertise from many disciplines. Recently, machine learning has been applied as one method for predicting heart disease. To test the accuracy of different machine learning methods, this study uses data extracted from the machine learning repository. The proposed predictive model was developed using an ensemble method. The ensemble technique used was stacking, where logistic regression was used as the meta-level classifier while Random Forest and k-nearest neighbour were applied as the base-level classifiers. The results show that the proposed method outperforms the single methods, with 82.42% accuracy. Although the accuracy and RMSE of the ensemble method are similar to those of Random Forest, the proposed method is still the best method since it achieves 0.903 for the area under the ROC curve and 0.843 for the F1 score. In future work, the proposed predictive model will be applied to smartwatch datasets.
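The stacking arrangement named in the description (Random Forest and k-nearest neighbour as base-level classifiers, logistic regression as the meta-level classifier) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the use of scikit-learn, the synthetic data from `make_classification`, the train/test split, every hyperparameter, the feature scaling ahead of k-nearest neighbour, and the way RMSE is computed from predicted labels are all assumptions, since the record does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score,
                             mean_squared_error, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the record does not name the dataset,
# so the sample and feature counts here are arbitrary.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Base-level classifiers: Random Forest and k-nearest neighbour.
# Scaling before k-NN is an added assumption (k-NN is distance based).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(),
                          KNeighborsClassifier(n_neighbors=5))),
]

# Meta-level classifier: logistic regression, trained on cross-validated
# predictions of the base learners (cv=5 is an assumption).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)              # hard 0/1 labels
y_prob = stack.predict_proba(X_test)[:, 1]  # probability of class 1

# The four measures named in the description. RMSE is computed here on the
# predicted labels, which is one possible reading of the abstract.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
    "rmse": mean_squared_error(y_test, y_pred) ** 0.5,
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The values printed by this sketch come from synthetic data and are not comparable to the figures reported in the description.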
first_indexed 2024-09-25T03:48:47Z
format Article
id UMPir41052
institution Universiti Malaysia Pahang
language English
last_indexed 2024-09-25T03:48:47Z
publishDate 2024
publisher AIP Publishing
record_format dspace
spelling UMPir41052 2024-04-24T04:20:56Z http://umpir.ump.edu.my/id/eprint/41052/
AIP Publishing 2024-03-07 Article PeerReviewed
pdf en http://umpir.ump.edu.my/id/eprint/41052/1/syafiqicoaims.pdf
pdf en http://umpir.ump.edu.my/id/eprint/41052/7/Heart%20disease%20prediction%20using%20ensemble%20of%20k-nearest_ABST.pdf
Mohd Syafiq Asyraf, Suhaimi and Nor Azuana, Ramli and Noryanti, Muhammad (2024) Heart disease prediction using ensemble of k-nearest neighbour, random forest and logistic regression method. AIP Conference Proceedings, 2895 (1). pp. 1-10. ISSN 0094-243X. (Published) https://doi.org/10.1063/5.0192203
title Heart disease prediction using ensemble of k-nearest neighbour, random forest and logistic regression method
topic QA75 Electronic computers. Computer science
RC Internal medicine