A comparative analysis of machine learning classifiers for stroke prediction: A predictive analytics approach

Bibliographic Details
Main Authors: Nitish Biswas, Khandaker Mohammad Mohi Uddin, Sarreha Tasmin Rikta, Samrat Kumar Dey
Format: Article
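Language: English
Published: Elsevier 2022-11-01
Series: Healthcare Analytics
Subjects: Machine learning; Stroke; Support vector machine; Random forest; Random over-sampling; Hyperparameter tuning
Online Access: http://www.sciencedirect.com/science/article/pii/S2772442522000569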
author Nitish Biswas
Khandaker Mohammad Mohi Uddin
Sarreha Tasmin Rikta
Samrat Kumar Dey
collection DOAJ
description Stroke is the third leading cause of death in the world. It is a dangerous health disorder caused by an interruption of blood flow to the brain, resulting in severe illness, disability, or death. Accurate prediction of stroke is necessary for early treatment and for reducing mortality. This study proposes a machine learning approach to diagnose stroke more accurately from imbalanced data. The Random Over Sampling (ROS) technique is used to balance the data. Eleven classifiers are analyzed, including Support Vector Machine, Random Forest, K-Nearest Neighbor, Decision Tree, Naïve Bayes, Voting Classifier, AdaBoost, Gradient Boosting, Multi-Layer Perceptron, and Nearest Centroid. Ten classifiers exceed 90% accuracy before the data are balanced, and four exceed 96% accuracy after balancing with the oversampling method. Hyperparameter tuning and cross-validation are performed for each model to improve the results. Accuracy, F1-measure, precision, and recall are used to measure model performance. The results show that the Support Vector Machine has the highest accuracy of 99.99%, with recall, precision, and F1-measure of 99.99% each. Random Forest achieves the second-highest accuracy of 99.87%, with a 0.001% error. In addition, a user-friendly web app and a user-friendly mobile app are built on the most accurate model.
first_indexed 2024-04-11T06:22:36Z
format Article
id doaj.art-16c4625bc39d4da09306fd46600d91f9
institution Directory Open Access Journal
issn 2772-4425
language English
last_indexed 2024-04-11T06:22:36Z
publishDate 2022-11-01
publisher Elsevier
record_format Article
series Healthcare Analytics
spelling doaj.art-16c4625bc39d4da09306fd46600d91f9 2022-12-22T04:40:29Z eng Elsevier Healthcare Analytics 2772-4425 2022-11-01 2 100116
A comparative analysis of machine learning classifiers for stroke prediction: A predictive analytics approach
Nitish Biswas (0): Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh
Khandaker Mohammad Mohi Uddin (1): Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh; Corresponding author.
Sarreha Tasmin Rikta (2): Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh
Samrat Kumar Dey (3): School of Science and Technology, Bangladesh Open University, Gazipur 1705, Bangladesh
http://www.sciencedirect.com/science/article/pii/S2772442522000569
Machine learning; Stroke; Support vector machine; Random forest; Random over-sampling; Hyperparameter tuning
title A comparative analysis of machine learning classifiers for stroke prediction: A predictive analytics approach
topic Machine learning
Stroke
Support vector machine
Random forest
Random over-sampling
Hyperparameter tuning
url http://www.sciencedirect.com/science/article/pii/S2772442522000569
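
The description field mentions random over-sampling to balance the classes and four evaluation measures (accuracy, precision, recall, F1-measure). The following is a minimal sketch of those two ideas using only the Python standard library. It is not the authors' code: the function names, the toy data, and the choice to duplicate minority rows by sampling with replacement are illustrative assumptions, and the record does not say at which stage of the pipeline the oversampling was applied.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy, made-up data: 8 negative rows and 2 positive rows.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)

# Made-up predictions, only to exercise the metric function.
metrics = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

In practice a library implementation such as `RandomOverSampler` from imbalanced-learn would normally replace the hand-written helper. Because oversampled rows are exact duplicates, it is common to oversample only the training portion of each split so that copies of a row do not appear on both sides of the evaluation.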