Machine learning approach for predicting cardiovascular disease in Bangladesh: evidence from a cross-sectional study in 2023
Abstract Background Cardiovascular disorders (CVDs) are the leading cause of death worldwide. Lower- and middle-income countries (LMICs), such as Bangladesh, are also affected by several types of CVDs, such as heart failure and stroke. The leading cause of death in Bangladesh has recently switched f...
Main Authors: | Sorif Hossain, Mohammad Kamrul Hasan, Mohammad Omar Faruk, Nelufa Aktar, Riyadh Hossain, Kabir Hossain |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2024-04-01 |
Series: | BMC Cardiovascular Disorders |
Subjects: | Cardiovascular disease; Machine learning; Random forest; Feature selection; Bangladesh |
Online Access: | https://doi.org/10.1186/s12872-024-03883-2 |
_version_ | 1797199665389633536 |
---|---|
author | Sorif Hossain; Mohammad Kamrul Hasan; Mohammad Omar Faruk; Nelufa Aktar; Riyadh Hossain; Kabir Hossain |
author_sort | Sorif Hossain |
collection | DOAJ |
description | Abstract. Background: Cardiovascular disorders (CVDs) are the leading cause of death worldwide. Lower- and middle-income countries (LMICs), such as Bangladesh, are also affected by several types of CVDs, such as heart failure and stroke. The leading cause of death in Bangladesh has recently switched from severe infections and parasitic illnesses to CVDs. Materials and methods: The study dataset comprised a random sample of 391 CVD patients' medical records collected between August 2022 and April 2023 using simple random sampling. Moreover, 260 data points were collected from individuals with no CVD problems for comparison purposes. Crosstabs and chi-square tests were used to determine the association between CVD and the explanatory variables. Logistic regression, Naïve Bayes classifier, Decision Tree, AdaBoost classifier, Random Forest, Bagging Tree, and Ensemble learning classifiers were used to predict CVD. The performance evaluations encompassed accuracy, sensitivity, specificity, and area under the receiver operator characteristic (AU-ROC) curve. Results: Random Forest had the highest precision among the five techniques considered. The precision rates for the mentioned classifiers are as follows: Logistic Regression (93.67%), Naïve Bayes (94.87%), Decision Tree (96.1%), AdaBoost (94.94%), Random Forest (96.15%), and Bagging Tree (94.87%). The Random Forest classifier maintains the highest balance between correct and incorrect predictions. With 98.04% accuracy, the Random Forest classifier achieved the best precision (96.15%), robust recall (100%), and high F1 score (97.7%). In contrast, the Logistic Regression model achieved the lowest accuracy of 95.42%. Remarkably, the Random Forest classifier achieved the highest AUC value (0.989). Conclusion: This research mainly focused on identifying factors that are critical in impacting patients with CVD and predicting CVD risk. It is strongly advised that the Random Forest technique be implemented in a system for predicting cardiac diseases. This research may change clinical practice by providing doctors with a new instrument to determine a patient’s CVD prognosis. |
first_indexed | 2024-04-24T07:19:22Z |
format | Article |
id | doaj.art-3a6f904858424ce789ec1170195738b6 |
institution | Directory Open Access Journal |
issn | 1471-2261 |
language | English |
last_indexed | 2024-04-24T07:19:22Z |
publishDate | 2024-04-01 |
publisher | BMC |
record_format | Article |
series | BMC Cardiovascular Disorders |
spelling | doaj.art-3a6f904858424ce789ec1170195738b6; 2024-04-21T11:08:00Z; eng; BMC; BMC Cardiovascular Disorders; 1471-2261; 2024-04-01; 241128; 10.1186/s12872-024-03883-2; Machine learning approach for predicting cardiovascular disease in Bangladesh: evidence from a cross-sectional study in 2023; Sorif Hossain (Department of Statistics, Noakhali Science and Technology University); Mohammad Kamrul Hasan (Department of Information and Communication Engineering, Noakhali Science and Technology University); Mohammad Omar Faruk (Department of Statistics, Noakhali Science and Technology University); Nelufa Aktar (Department of Statistics, Noakhali Science and Technology University); Riyadh Hossain (Department of Statistics, Noakhali Science and Technology University); Kabir Hossain (Department of Statistics, Noakhali Science and Technology University); https://doi.org/10.1186/s12872-024-03883-2; Cardiovascular disease; Machine learning; Random forest; Feature selection; Bangladesh |
title | Machine learning approach for predicting cardiovascular disease in Bangladesh: evidence from a cross-sectional study in 2023 |
topic | Cardiovascular disease; Machine learning; Random forest; Feature selection; Bangladesh |
url | https://doi.org/10.1186/s12872-024-03883-2 |
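
The description field in this record compares classifiers by accuracy, precision, recall (sensitivity), specificity, and F1 score. As a minimal sketch of how such figures are derived from a binary confusion matrix, the Python snippet below computes them from four counts. The `ConfusionCounts` class, the `classification_metrics` function, and the example counts are hypothetical illustrations; they are not the study's code or data and are not intended to reproduce the reported percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # CVD cases predicted as CVD
    fp: int  # non-CVD cases predicted as CVD
    fn: int  # CVD cases predicted as non-CVD
    tn: int  # non-CVD cases predicted as non-CVD


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard evaluation metrics from confusion-matrix counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # also called sensitivity
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        # F1 is the harmonic mean of precision and recall.
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    example = ConfusionCounts(tp=40, fp=10, fn=20, tn=30)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which gives a quick consistency check when reading a set of reported metrics.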