A comparative analysis of machine learning classifiers for stroke prediction: A predictive analytics approach
Stroke is the third leading cause of death in the world. It is a dangerous health disorder caused by the interruption of the blood flow to the brain, resulting in severe illness, disability, or death. An accurate prediction of stroke is necessary for the early stage of treatment and overcoming the m...
Main Authors: | Nitish Biswas, Khandaker Mohammad Mohi Uddin, Sarreha Tasmin Rikta, Samrat Kumar Dey |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2022-11-01 |
Series: | Healthcare Analytics |
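Subjects: | Machine learning; Stroke; Support vector machine; Random forest; Random over-sampling; Hyperparameter tuning |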
Online Access: | http://www.sciencedirect.com/science/article/pii/S2772442522000569 |
_version_ | 1811178690618327040 |
---|---|
author | Nitish Biswas; Khandaker Mohammad Mohi Uddin; Sarreha Tasmin Rikta; Samrat Kumar Dey |
author_facet | Nitish Biswas; Khandaker Mohammad Mohi Uddin; Sarreha Tasmin Rikta; Samrat Kumar Dey |
author_sort | Nitish Biswas |
collection | DOAJ |
description | Stroke is the third leading cause of death in the world. It is a dangerous health disorder caused by the interruption of blood flow to the brain, resulting in severe illness, disability, or death. Accurate prediction of stroke is necessary for early-stage treatment and for reducing the mortality rate. This study proposes a machine learning approach to diagnose stroke more accurately from imbalanced data. The Random Over Sampling (ROS) technique is used to balance the data. Eleven classifiers, including Support Vector Machine, Random Forest, K-nearest Neighbor, Decision Tree, Naïve Bayes, Voting Classifier, AdaBoost, Gradient Boosting, Multi-Layer Perceptron, and Nearest Centroid, are analyzed. Ten classifiers exceed 90% accuracy before the data are balanced, and four classifiers exceed 96% accuracy after balancing with the oversampling method. Hyperparameter tuning and cross-validation are performed for each model to improve the results. Accuracy, F1-measure, precision, and recall are used to measure the performance of the machine learning models. The results show that the Support Vector Machine has the highest accuracy of 99.99%, with a recall of 99.99%, a precision of 99.99%, and an F1-measure of 99.99%. Random Forest achieves the second-highest accuracy of 99.87%, with a 0.001% error. In addition, a user-friendly web app and a user-friendly mobile app are built on the most accurate model. |
first_indexed | 2024-04-11T06:22:36Z |
format | Article |
id | doaj.art-16c4625bc39d4da09306fd46600d91f9 |
institution | Directory Open Access Journal |
issn | 2772-4425 |
language | English |
last_indexed | 2024-04-11T06:22:36Z |
publishDate | 2022-11-01 |
publisher | Elsevier |
record_format | Article |
series | Healthcare Analytics |
spelling | doaj.art-16c4625bc39d4da09306fd46600d91f9; 2022-12-22T04:40:29Z; eng; Elsevier; Healthcare Analytics; ISSN 2772-4425; 2022-11-01; volume 2, article 100116; A comparative analysis of machine learning classifiers for stroke prediction: A predictive analytics approach; Nitish Biswas (Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh); Khandaker Mohammad Mohi Uddin (Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh; corresponding author); Sarreha Tasmin Rikta (Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh); Samrat Kumar Dey (School of Science and Technology, Bangladesh Open University, Gazipur 1705, Bangladesh); http://www.sciencedirect.com/science/article/pii/S2772442522000569; Machine learning; Stroke; Support vector machine; Random forest; Random over-sampling; Hyperparameter tuning |
title | A comparative analysis of machine learning classifiers for stroke prediction: A predictive analytics approach |
topic | Machine learning; Stroke; Support vector machine; Random forest; Random over-sampling; Hyperparameter tuning |
url | http://www.sciencedirect.com/science/article/pii/S2772442522000569 |
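
The description in this record mentions balancing imbalanced data with Random Over Sampling and scoring models by accuracy, precision, recall, and F1-measure. The following is a minimal sketch of those two ideas in plain Python, not the authors' implementation: the function names `random_over_sample` and `binary_metrics`, the toy rows, and the labels are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy, made-up data: 8 negative rows (label 0) and 2 positive rows (label 1).
toy_rows = [[age, flag] for age, flag in
            [(34, 0), (41, 0), (29, 0), (52, 0), (47, 0),
             (38, 0), (60, 1), (45, 0), (71, 1), (68, 1)]]
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_over_sample(toy_rows, toy_labels)
class_counts = Counter(balanced_labels)

# Made-up predictions, used only to exercise the metric formulas.
scores = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In this sketch the minority class is grown purely by duplicating existing rows, so no new information is created. A common precaution, assumed here rather than taken from the record, is to oversample only the training split so that duplicated rows cannot appear in both training and evaluation data.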