Unlocking stroke prediction: Harnessing projection-based statistical feature extraction with ML algorithms

Non-communicable diseases, such as cardiovascular disease, cancer, chronic respiratory diseases, and diabetes, are responsible for approximately 71% of all deaths worldwide. Stroke, a cerebrovascular disorder, is one of the leading contributors to this burden and ranks among the top three causes of death. Ear...

Bibliographic Details
Main Authors: Saad Sahriar, Sanjida Akther, Jannatul Mauya, Ruhul Amin, Md Shahajada Mia, Sabba Ruhi, Md Shamim Reza
Format: Article
Language: English
Published: Elsevier 2024-03-01
Series: Heliyon
Subjects: Stroke; Risk prediction; Machine learning; PCA; FA; Medical diagnosis
Online Access: http://www.sciencedirect.com/science/article/pii/S240584402403442X
author Saad Sahriar
Sanjida Akther
Jannatul Mauya
Ruhul Amin
Md Shahajada Mia
Sabba Ruhi
Md Shamim Reza
collection DOAJ
description Non-communicable diseases, such as cardiovascular disease, cancer, chronic respiratory diseases, and diabetes, are responsible for approximately 71% of all deaths worldwide. Stroke, a cerebrovascular disorder, is one of the leading contributors to this burden and ranks among the top three causes of death. Early recognition of symptoms can encourage a balanced lifestyle and provide essential information for stroke prediction. Machine learning (ML) is a key tool for helping physicians identify stroke patients and risk factors. However, ML algorithms struggle to detect risk factors when variables are measured on different scales and carry different distributional assumptions, and they struggle with complexity when the risk factors are high-dimensional. In this study, rigorous statistical tests are used to identify risk factors, and two approaches, PCA-FA (Integration of Principal Components and Factors) and FPCA (Factor Based PCA), are proposed for projecting the data onto feature representations that improve learning algorithm performance. The study dataset contains 5110 patient records with clinical, lifestyle, and genetic attributes, allowing for a comprehensive analysis of potential risk factors associated with stroke. At the 0.05 significance level (p-value < 0.05), chi-square and independent-samples t-tests identified age, heart_disease, hypertension, work_type, ever_married, bmi, and smoking_status as risk factors for stroke. Among the predictive models built with the proposed feature extraction techniques, the random forest approach gives the best results when combined with the PCA-FA method, reaching a best accuracy of 92.55% and an AUC score of 98.15%. Compared with existing work, prediction accuracy increases by between 2.19% and 19.03%. Additionally, a stacking ensemble-based classification algorithm makes the prediction results more robust and reproducible. We also developed a web-based application, based on the findings of this study, that doctors could use as an additional tool when diagnosing stroke risk.
first_indexed 2024-04-24T23:14:10Z
format Article
id doaj.art-b86ff4e50bad4ebf9de626487b889f11
institution Directory Open Access Journal
issn 2405-8440
language English
last_indexed 2024-04-24T23:14:10Z
publishDate 2024-03-01
publisher Elsevier
record_format Article
series Heliyon
spelling doaj.art-b86ff4e50bad4ebf9de626487b889f11
2024-03-17T07:58:05Z
eng
Elsevier
Heliyon
2405-8440
2024-03-01
Volume 10, issue 5, article e27411
Unlocking stroke prediction: Harnessing projection-based statistical feature extraction with ML algorithms
Saad Sahriar (Deep Statistical Learning and Research Lab, Department of Statistics, Pabna University of Science & Technology, Pabna, 6600, Bangladesh)
Sanjida Akther (Deep Statistical Learning and Research Lab, Department of Statistics, Pabna University of Science & Technology, Pabna, 6600, Bangladesh)
Jannatul Mauya (Deep Statistical Learning and Research Lab, Department of Statistics, Pabna University of Science & Technology, Pabna, 6600, Bangladesh)
Ruhul Amin (Deep Statistical Learning and Research Lab, Department of Statistics, Pabna University of Science & Technology, Pabna, 6600, Bangladesh)
Md Shahajada Mia (Department of Statistics, Pabna University of Science & Technology, Pabna, 6600, Bangladesh)
Sabba Ruhi (Department of Statistics, Pabna University of Science & Technology, Pabna, 6600, Bangladesh)
Md Shamim Reza (Deep Statistical Learning and Research Lab, Department of Statistics, Pabna University of Science & Technology, Pabna, 6600, Bangladesh; Department of Statistics, Pabna University of Science & Technology, Pabna, 6600, Bangladesh; corresponding author)
http://www.sciencedirect.com/science/article/pii/S240584402403442X
Stroke
Risk prediction
Machine learning
PCA
FA
Medical diagnosis
title Unlocking stroke prediction: Harnessing projection-based statistical feature extraction with ML algorithms
topic Stroke
Risk prediction
Machine learning
PCA
FA
Medical diagnosis
url http://www.sciencedirect.com/science/article/pii/S240584402403442X
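
The description in this record says that chi-square tests at the 0.05 level were used to flag categorical attributes such as hypertension as stroke risk factors. The following is a minimal sketch of that kind of test for a single binary attribute, not the authors' code. The function name `chi_square_2x2` and the counts in `table` are hypothetical and made up for illustration; they are not taken from the study dataset. The sketch assumes a plain Pearson chi-square test on a 2x2 table without continuity correction.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    `table` is [[a, b], [c, d]] with rows = attribute present/absent
    and columns = stroke yes/no. No continuity correction is applied.
    Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Expected count under independence of attribute and outcome.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts (illustration only, NOT from the study):
# rows: hypertension yes / no; columns: stroke yes / no.
table = [[60, 440], [40, 1460]]
statistic, p_value = chi_square_2x2(table)
is_risk_factor = p_value < 0.05  # same 0.05 threshold as in the description
print(f"chi2 = {statistic:.3f}, p = {p_value:.3g}, flagged = {is_risk_factor}")
```

For attributes with more than two categories (for example work_type or smoking_status), the degrees of freedom exceed 1 and the closed-form p-value above no longer applies; a statistics library would normally be used instead.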