Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models

Bibliographic Details
Main Authors: Junjie Liu, Yiyang Sun, Jing Ma, Jiachen Tu, Yuhui Deng, Ping He, Rongshan Li, Fengyun Hu, Huaxiong Huang, Xiaoshuang Zhou, Shixin Xu
Format: Article
Language: English
Published: Elsevier 2021-01-01
Series: Informatics in Medicine Unlocked
Subjects: Stroke; Machine learning; Risk factor ranking; SHAP value
Online Access: http://www.sciencedirect.com/science/article/pii/S2352914821001933
collection DOAJ
description Background: Stroke has been the leading cause of death in China in recent years. It is a major cause of long-term physical and cognitive impairment, which places great pressure on the national public health system. Given the size of the country, evaluating the risk of stroke is important for its prevention and treatment in China.
Methods: A data set of 2000 hospitalized stroke patients in 2018 and 27583 residents from 2017 to 2020 is analyzed in this study. With the cleaned data, three models of stroke risk levels are built using machine learning methods. The importance of the “8+2” factors from the China National Stroke Prevention Project (CSPP) is evaluated via decision tree and random forest models. The importance of more detailed features and their SHAP (SHapley Additive exPlanations) values are evaluated and ranked via a random forest model. Furthermore, a logistic regression model is applied to estimate the probability of stroke at each risk level.
Results: Among the “8+2” risk factors, the decision tree model identifies the top three factors, by importance value, as Hypertension (0.4995), Physical Inactivity (0.08486) and Diabetes Mellitus (0.07889), while the random forest model identifies Hypertension (0.3966), Hyperlipidemia (0.1229) and Physical Inactivity (0.1146). Beyond the “8+2” factors, the importance of lifestyle, demographic and medical measurement features is evaluated via a random forest model. The top five features are Systolic Blood Pressure (SBP) (0.3670), Diastolic Blood Pressure (DBP) (0.1541), Physical Inactivity (0.0904), Body Mass Index (BMI) (0.0721) and Fasting Blood Glucose (FBG) (0.0531). SHAP values show that DBP, Physical Inactivity, SBP, BMI, Smoking, FBG, and Triglyceride (TG) are positively correlated with the risk of stroke, whereas High-density Lipoprotein (HDL) is negatively correlated with it. Combined with the data of the 2000 hospitalized stroke patients, the logistic regression model shows that the average probabilities of stroke (with 95% confidence intervals) are 7.20%±0.55% for low-risk patients, 19.02%±0.94% for medium-risk patients and 83.89%±0.97% for high-risk patients.
Conclusion: Based on census data from Shanxi Province, we investigate stroke risk factors and their ranking. Hypertension, Physical Inactivity, and Overweight rank as the top three stroke risk factors in Shanxi. The probability of stroke is also estimated through our interpretable machine learning methods.
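The SHAP values mentioned in the abstract attribute a model's prediction to individual features. As a rough illustration of the idea only, the sketch below computes exact Shapley values for a tiny logistic risk model in plain Python. Everything in it is an assumption made for the example: the weights, the bias, the single reference "baseline" profile, and the feature names `sbp`, `hdl` and `inactive` are hypothetical and are not taken from the study, which used a random forest model on real data.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical toy model, NOT the fitted model from the study.
WEIGHTS = {"sbp": 0.04, "hdl": -1.2, "inactive": 0.8}
BIAS = -6.0
# Hypothetical reference profile used when a feature is "absent".
BASELINE = {"sbp": 120.0, "hdl": 1.4, "inactive": 0.0}


def risk(x):
    """Logistic risk score in (0, 1) for a feature dictionary."""
    z = BIAS + sum(WEIGHTS[name] * x[name] for name in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(x, baseline=BASELINE):
    """Exact Shapley value of each feature for the prediction risk(x).

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return risk(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        phi[name] = total
    return phi


patient = {"sbp": 165.0, "hdl": 0.9, "inactive": 1.0}
phi = shapley_values(patient)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the patient's risk and the baseline risk, and a feature's sign shows whether it pushes the prediction up or down. In this toy model an HDL value above the baseline yields a negative contribution, which mirrors the direction reported for HDL in the abstract. Practical SHAP tooling averages over a background data set rather than a single reference profile, so this is a simplification.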
first_indexed 2024-12-20T18:31:57Z
format Article
id doaj.art-e4fd23dd8f6c44098677d668d1625609
institution Directory Open Access Journal
issn 2352-9148
language English
last_indexed 2024-12-20T18:31:57Z
publishDate 2021-01-01
publisher Elsevier
record_format Article
series Informatics in Medicine Unlocked
spelling doaj.art-e4fd23dd8f6c44098677d668d1625609; 2022-12-21T19:30:01Z; eng; Elsevier; Informatics in Medicine Unlocked; 2352-9148; 2021-01-01; 26; 100712; Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models
Junjie Liu: BNU-HKBU United International College, Zhuhai, China
Yiyang Sun: Duke Kunshan University, 8 Duke Ave, Kunshan, Jiangsu, China
Jing Ma: Laboratory of Mathematics and Complex Systems (Ministry of Education), School of Mathematical Sciences, Beijing Normal University, Beijing 100875, China; Research Center for Mathematics, Beijing Normal University at Zhuhai, 519087, China
Jiachen Tu: BNU-HKBU United International College, Zhuhai, China
Yuhui Deng: BNU-HKBU United International College, Zhuhai, China
Ping He: BNU-HKBU United International College, Zhuhai, China
Rongshan Li: Department of Nephrology, Shanxi Provincial People’s Hospital, Taiyuan, Shanxi, China
Fengyun Hu: Department of Neurology, Shanxi Provincial People’s Hospital, Taiyuan, Shanxi, China
Huaxiong Huang: Research Center for Mathematics, Beijing Normal University at Zhuhai, 519087, China; Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada; BNU-HKBU United International College, Zhuhai, China (corresponding author at: Research Center for Mathematics, Beijing Normal University at Zhuhai, 519087, China)
Xiaoshuang Zhou: Department of Nephrology, Shanxi Provincial People’s Hospital, Taiyuan, Shanxi, China (corresponding author)
Shixin Xu: Duke Kunshan University, 8 Duke Ave, Kunshan, Jiangsu, China (corresponding author)
title Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models
topic Stroke
Machine learning
Risk factor ranking
SHAP value
url http://www.sciencedirect.com/science/article/pii/S2352914821001933