A comparative analysis of machine learning classifiers for stroke prediction: A predictive analytics approach

Stroke is the third leading cause of death in the world. It is a dangerous health disorder caused by the interruption of the blood flow to the brain, resulting in severe illness, disability, or death. An accurate prediction of stroke is necessary for the early stage of treatment and overcoming the m...

Bibliographic Details
Main Authors:	Nitish Biswas, Khandaker Mohammad Mohi Uddin, Sarreha Tasmin Rikta, Samrat Kumar Dey
Format:	Article
Language:	English
Published:	Elsevier 2022-11-01
Series:	Healthcare Analytics
Subjects:	Machine learning; Stroke; Support vector machine; Random forest; Random over-sampling; Hyperparameter tuning
Online Access:	http://www.sciencedirect.com/science/article/pii/S2772442522000569

author	Nitish Biswas, Khandaker Mohammad Mohi Uddin, Sarreha Tasmin Rikta, Samrat Kumar Dey
author_sort	Nitish Biswas
collection	DOAJ
description	Stroke is the third leading cause of death in the world. It is a dangerous health disorder caused by the interruption of the blood flow to the brain, resulting in severe illness, disability, or death. An accurate prediction of stroke is necessary for the early stage of treatment and overcoming the mortality rate. This study proposes a machine learning approach to diagnose stroke more accurately from imbalanced data. The Random Over Sampling (ROS) technique is used in this work to balance the data. Eleven classifiers, including Support Vector Machine, Random Forest, K-nearest Neighbor, Decision Tree, Naïve Bayes, Voting Classifier, AdaBoost, Gradient Boosting, Multi-Layer Perceptron, and Nearest Centroid, are analyzed in this study. Ten classifiers achieve more than 90% accuracy before the data are balanced, and four classifiers achieve more than 96% accuracy after balancing with the oversampling method. Hyperparameter tuning and cross-validation are performed for each model to enhance the results. Accuracy, F1-measure, precision, and recall are used to measure the performance of the machine learning models. The results show that the Support Vector Machine has the highest accuracy of 99.99%, with a recall of 99.99%, a precision of 99.99%, and an F1-measure of 99.99%. Random Forest achieves the second-highest accuracy of 99.87%, with a 0.001% error. In addition, a user-friendly web app and a user-friendly mobile app are built based on the most accurate model.
first_indexed	2024-04-11T06:22:36Z
format	Article
id	doaj.art-16c4625bc39d4da09306fd46600d91f9
institution	Directory Open Access Journal
issn	2772-4425
language	English
last_indexed	2024-04-11T06:22:36Z
publishDate	2022-11-01
publisher	Elsevier
record_format	Article
series	Healthcare Analytics
spelling	doaj.art-16c4625bc39d4da09306fd46600d91f9 2022-12-22T04:40:29Z eng Elsevier Healthcare Analytics 2772-4425 2022-11-01 2 100116
	A comparative analysis of machine learning classifiers for stroke prediction: A predictive analytics approach
	Nitish Biswas (0), Khandaker Mohammad Mohi Uddin (1), Sarreha Tasmin Rikta (2), Samrat Kumar Dey (3)
	0: Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh
	1: Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh; Corresponding author.
	2: Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh
	3: School of Science and Technology, Bangladesh Open University, Gazipur 1705, Bangladesh
	http://www.sciencedirect.com/science/article/pii/S2772442522000569
	Machine learning; Stroke; Support vector machine; Random forest; Random over-sampling; Hyperparameter tuning
title	A comparative analysis of machine learning classifiers for stroke prediction: A predictive analytics approach
topic	Machine learning; Stroke; Support vector machine; Random forest; Random over-sampling; Hyperparameter tuning
url	http://www.sciencedirect.com/science/article/pii/S2772442522000569

Similar Items