Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models
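Main Authors: | Junjie Liu, Yiyang Sun, Jing Ma, Jiachen Tu, Yuhui Deng, Ping He, Rongshan Li, Fengyun Hu, Huaxiong Huang, Xiaoshuang Zhou, Shixin Xu |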
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2021-01-01 |
Series: | Informatics in Medicine Unlocked |
Subjects: | Stroke; Machine learning; Risk factor ranking; SHAP value |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352914821001933 |
_version_ | 1818985253284347904 |
---|---|
author | Junjie Liu, Yiyang Sun, Jing Ma, Jiachen Tu, Yuhui Deng, Ping He, Rongshan Li, Fengyun Hu, Huaxiong Huang, Xiaoshuang Zhou, Shixin Xu |
collection | DOAJ |
description | Background: In China, stroke has been the leading cause of death in recent years. It is a major cause of long-term physical and cognitive impairment and places great pressure on the national public health system. Because China is a large country, evaluating the risk of stroke is important for its prevention and treatment. Methods: This study analyzes a data set of 2,000 patients hospitalized for stroke in 2018 and 27,583 residents from 2017 to 2020. Using the cleaned data, three machine learning models of stroke risk level are built. The importance of the “8+2” risk factors defined by the China National Stroke Prevention Project (CSPP) is evaluated with decision tree and random forest models. The importance of more detailed features, together with their SHAP (SHapley Additive exPlanations) values, is evaluated and ranked with a random forest model. Finally, a logistic regression model is applied to estimate the probability of stroke at each risk level. Results: Among the “8+2” risk factors, the decision tree model ranks Hypertension (importance 0.4995), Physical Inactivity (0.08486), and Diabetes Mellitus (0.07889) as the top three, while the random forest model ranks Hypertension (0.3966), Hyperlipidemia (0.1229), and Physical Inactivity (0.1146) as the top three. Beyond the “8+2” factors, the importance of lifestyle, demographic, and medical-measurement features is evaluated with the random forest model; the top five features are Systolic Blood Pressure (SBP) (0.3670), Diastolic Blood Pressure (DBP) (0.1541), Physical Inactivity (0.0904), Body Mass Index (BMI) (0.0721), and Fasting Blood Glucose (FBG) (0.0531). SHAP values show that DBP, Physical Inactivity, SBP, BMI, Smoking, FBG, and Triglyceride (TG) are positively correlated with stroke risk, whereas High-density Lipoprotein (HDL) is negatively correlated with it. Combined with the data on the 2,000 hospitalized stroke patients, the logistic regression model gives average stroke probabilities of 7.20% ± 0.55% for low-risk patients, 19.02% ± 0.94% for medium-risk patients, and 83.89% ± 0.97% for high-risk patients (± denotes the 95% confidence interval). Conclusion: Based on census data from Shanxi Province, we investigate stroke risk factors and their ranking. Hypertension, Physical Inactivity, and Overweight rank as the top three stroke risk factors in Shanxi. The probability of stroke is also estimated with our interpretable machine learning methods. |
first_indexed | 2024-12-20T18:31:57Z |
format | Article |
id | doaj.art-e4fd23dd8f6c44098677d668d1625609 |
institution | Directory of Open Access Journals |
issn | 2352-9148 |
language | English |
last_indexed | 2024-12-20T18:31:57Z |
publishDate | 2021-01-01 |
publisher | Elsevier |
record_format | Article |
series | Informatics in Medicine Unlocked |
spelling | doaj.art-e4fd23dd8f6c44098677d668d1625609 (2022-12-21T19:30:01Z); eng; Elsevier; Informatics in Medicine Unlocked; ISSN 2352-9148; 2021-01-01; vol. 26, article 100712; Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models. Affiliations: Junjie Liu (BNU-HKBU United International College, Zhuhai, China); Yiyang Sun (Duke Kunshan University, 8 Duke Ave, Kunshan, Jiangsu, China); Jing Ma (Laboratory of Mathematics and Complex Systems (Ministry of Education), School of Mathematical Sciences, Beijing Normal University, Beijing 100875, China / Research Center for Mathematics, Beijing Normal University at Zhuhai, 519087, China); Jiachen Tu (BNU-HKBU United International College, Zhuhai, China); Yuhui Deng (BNU-HKBU United International College, Zhuhai, China); Ping He (BNU-HKBU United International College, Zhuhai, China); Rongshan Li (Department of Nephrology, Shanxi Provincial People’s Hospital, Taiyuan, Shanxi, China); Fengyun Hu (Department of Neurology, Shanxi Provincial People’s Hospital, Taiyuan, Shanxi, China); Huaxiong Huang (Research Center for Mathematics, Beijing Normal University at Zhuhai, 519087, China / Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada / BNU-HKBU United International College, Zhuhai, China; corresponding author at Research Center for Mathematics, Beijing Normal University at Zhuhai, 519087, China); Xiaoshuang Zhou (Department of Nephrology, Shanxi Provincial People’s Hospital, Taiyuan, Shanxi, China; corresponding author); Shixin Xu (Duke Kunshan University, 8 Duke Ave, Kunshan, Jiangsu, China; corresponding author). http://www.sciencedirect.com/science/article/pii/S2352914821001933. Keywords: Stroke; Machine learning; Risk factor ranking; SHAP value |
title | Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models |
topic | Stroke; Machine learning; Risk factor ranking; SHAP value |
url | http://www.sciencedirect.com/science/article/pii/S2352914821001933 |
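The abstract reports the average stroke probability for each risk level as a mean plus or minus the half-width of a 95% confidence interval, but it does not say how those intervals were computed. The sketch below is a minimal, standard-library-only illustration under stated assumptions: a plain logistic regression fitted by batch gradient descent, followed by a normal-approximation interval (1.96 × standard error of the mean) over the predicted probabilities in each risk level. The features (`sbp_z`, `inactive`), the risk-level labels, the data-generating coefficients, and every function name are hypothetical, and the data are synthetic. This is not the authors' pipeline, and it leaves out the decision-tree, random-forest, and SHAP rankings described in the abstract.

```python
import math
import random
from statistics import mean, stdev

LEVELS = ("low", "medium", "high")  # hypothetical risk-level labels


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Predicted P(stroke = 1 | x) under a logistic model with weights w, bias b."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit w and b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def mean_with_ci95(values):
    """Mean and half-width of a normal-approximation 95% CI (1.96 * SE)."""
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return mean(values), half_width


def summarize_by_level(probs, levels):
    """Group predicted probabilities by risk level: {level: (mean, half_width)}."""
    groups = {}
    for p, level in zip(probs, levels):
        groups.setdefault(level, []).append(p)
    return {level: mean_with_ci95(ps) for level, ps in groups.items()}


def demo(seed=0, n=400):
    """Build synthetic data, fit the model, and summarize by risk level."""
    rng = random.Random(seed)
    shift = {"low": -1.0, "medium": 0.0, "high": 1.5}      # invented SBP offsets
    p_inactive = {"low": 0.2, "medium": 0.4, "high": 0.7}  # invented inactivity rates
    X, y, levels = [], [], []
    for _ in range(n):
        level = rng.choice(LEVELS)
        sbp_z = shift[level] + rng.gauss(0.0, 0.5)         # standardized SBP (synthetic)
        inactive = 1.0 if rng.random() < p_inactive[level] else 0.0
        true_logit = -1.5 + 1.8 * sbp_z + 0.8 * inactive    # invented ground truth
        X.append([sbp_z, inactive])
        y.append(1 if rng.random() < sigmoid(true_logit) else 0)
        levels.append(level)
    w, b = fit_logistic(X, y)
    probs = [predict_proba(w, b, x) for x in X]
    return summarize_by_level(probs, levels)


if __name__ == "__main__":
    summary = demo()
    for level in LEVELS:
        m, half = summary[level]
        print(f"{level}: {100 * m:.2f}% ± {100 * half:.2f}%")
```

Running the file prints one line per level in the abstract's "mean ± half-width" style; the numbers depend only on the invented data and say nothing about the Shanxi cohort. A normal approximation is used here because it is the simplest interval for a mean; a bootstrap over patients would be a reasonable alternative when group sizes are small.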