Machine learning approach for predicting cardiovascular disease in Bangladesh: evidence from a cross-sectional study in 2023


Bibliographic Details
Main Authors: Sorif Hossain, Mohammad Kamrul Hasan, Mohammad Omar Faruk, Nelufa Aktar, Riyadh Hossain, Kabir Hossain
Format: Article
Language: English
Published: BMC 2024-04-01
Series: BMC Cardiovascular Disorders
Subjects: Cardiovascular disease, Machine learning, Random forest, Feature selection, Bangladesh
Online Access: https://doi.org/10.1186/s12872-024-03883-2
collection DOAJ
description Abstract
Background: Cardiovascular disorders (CVDs) are the leading cause of death worldwide. Low- and middle-income countries (LMICs), such as Bangladesh, are also affected by several types of CVDs, including heart failure and stroke. In Bangladesh, CVDs have recently replaced severe infections and parasitic illnesses as the leading cause of death.
Materials and methods: The dataset comprised the medical records of 391 CVD patients, selected by simple random sampling and collected between August 2022 and April 2023. A further 260 records were collected from individuals without CVD for comparison. Cross-tabulations and chi-square tests were used to assess the association between CVD and the explanatory variables. Logistic Regression, Naïve Bayes, Decision Tree, AdaBoost, Random Forest, Bagging Tree, and ensemble learning classifiers were used to predict CVD. Performance was evaluated using accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (AU-ROC) curve.
Results: Random Forest had the highest precision among the classifiers compared. Precision was as follows: Logistic Regression (93.67%), Naïve Bayes (94.87%), Decision Tree (96.1%), AdaBoost (94.94%), Random Forest (96.15%), and Bagging Tree (94.87%). The Random Forest classifier showed the best balance between correct and incorrect predictions: with 98.04% accuracy, it achieved the highest precision (96.15%), 100% recall, and an F1 score of 97.7%. Logistic Regression had the lowest accuracy (95.42%). Random Forest also achieved the highest AUC (0.989).
Conclusion: This research focused on identifying the factors that most affect patients with CVD and on predicting CVD risk. The authors strongly advise implementing the Random Forest technique in systems for predicting cardiac disease. The findings may change clinical practice by giving doctors a new instrument for determining a patient's CVD prognosis.
first_indexed 2024-04-24T07:19:22Z
format Article
id doaj.art-3a6f904858424ce789ec1170195738b6
institution Directory of Open Access Journals
issn 1471-2261
language English
last_indexed 2024-04-24T07:19:22Z
publishDate 2024-04-01
publisher BMC
record_format Article
series BMC Cardiovascular Disorders
spelling doaj.art-3a6f904858424ce789ec1170195738b6 2024-04-21T11:08:00Z eng BMC BMC Cardiovascular Disorders 1471-2261 2024-04-01 241128 10.1186/s12872-024-03883-2
Sorif Hossain (Department of Statistics, Noakhali Science and Technology University)
Mohammad Kamrul Hasan (Department of Information and Communication Engineering, Noakhali Science and Technology University)
Mohammad Omar Faruk (Department of Statistics, Noakhali Science and Technology University)
Nelufa Aktar (Department of Statistics, Noakhali Science and Technology University)
Riyadh Hossain (Department of Statistics, Noakhali Science and Technology University)
Kabir Hossain (Department of Statistics, Noakhali Science and Technology University)
title Machine learning approach for predicting cardiovascular disease in Bangladesh: evidence from a cross-sectional study in 2023
topic Cardiovascular disease
Machine learning
Random forest
Feature selection
Bangladesh
url https://doi.org/10.1186/s12872-024-03883-2