Predicting the Risk of Incident Type 2 Diabetes Mellitus in Chinese Elderly Using Machine Learning Techniques
Early identification of individuals at high risk of diabetes is crucial for implementing early intervention strategies. However, algorithms specific to elderly Chinese adults are lacking. The aim of this study is to build effective prediction models based on machine learning (ML) for the risk of typ...
Main Authors: | Qing Liu, Miao Zhang, Yifeng He, Lei Zhang, Jingui Zou, Yaqiong Yan, Yan Guo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-05-01 |
Series: | Journal of Personalized Medicine |
Subjects: | type 2 diabetes mellitus (T2DM); machine learning; prediction model; Chinese elderly |
Online Access: | https://www.mdpi.com/2075-4426/12/6/905 |
_version_ | 1797485536661733376 |
---|---|
author | Qing Liu; Miao Zhang; Yifeng He; Lei Zhang; Jingui Zou; Yaqiong Yan; Yan Guo |
collection | DOAJ |
description | Early identification of individuals at high risk of diabetes is crucial for implementing early intervention strategies. However, algorithms specific to elderly Chinese adults are lacking. The aim of this study is to build effective prediction models based on machine learning (ML) for the risk of type 2 diabetes mellitus (T2DM) in Chinese elderly. A retrospective cohort study was conducted using the health screening data of adults older than 65 years in Wuhan, China from 2018 to 2020. After strict data filtering, 127,031 records from eligible participants were used. Overall, 8298 participants were diagnosed with incident T2DM during the 2-year follow-up (2019–2020). The dataset was randomly split into a training set (n = 101,625) and a test set (n = 25,406). We developed prediction models based on four ML algorithms: logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). Using LASSO regression, 21 prediction features were selected. Random under-sampling (RUS) was applied to address the class imbalance, and Shapley Additive Explanations (SHAP) were used to calculate and visualize feature importance. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The XGBoost model achieved the best performance (AUC = 0.7805, sensitivity = 0.6452, specificity = 0.7577, accuracy = 0.7503). Fasting plasma glucose (FPG), education, exercise, gender, and waist circumference (WC) were the top five predictors. This study showed that the XGBoost model can be applied to screen individuals at high risk of T2DM at an early phase, which has strong potential for intelligent prevention and control of diabetes. The key features could also be useful for developing targeted diabetes prevention interventions. |
first_indexed | 2024-03-09T23:20:13Z |
format | Article |
id | doaj.art-20e1ffa500c744e6ab6858db3cd7bf92 |
institution | Directory Open Access Journal |
issn | 2075-4426 |
language | English |
last_indexed | 2024-03-09T23:20:13Z |
publishDate | 2022-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Journal of Personalized Medicine |
spelling | doaj.art-20e1ffa500c744e6ab6858db3cd7bf92; 2023-11-23T17:27:24Z; eng; MDPI AG; Journal of Personalized Medicine; ISSN 2075-4426; 2022-05-01; vol. 12, no. 6, art. 905; DOI 10.3390/jpm12060905; Predicting the Risk of Incident Type 2 Diabetes Mellitus in Chinese Elderly Using Machine Learning Techniques; Qing Liu (0), Miao Zhang (1), Yifeng He (2), Lei Zhang (3), Jingui Zou (4), Yaqiong Yan (5), Yan Guo (6); affiliations: (0) Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China; (1) Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China; (2) School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; (3) School of Mathematics and Statistics, Wuhan University, Wuhan 430070, China; (4) School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; (5) Wuhan Center for Disease Control and Prevention, Wuhan 430015, China; (6) Wuhan Center for Disease Control and Prevention, Wuhan 430015, China; https://www.mdpi.com/2075-4426/12/6/905; keywords: type 2 diabetes mellitus (T2DM), machine learning, prediction model, Chinese elderly |
title | Predicting the Risk of Incident Type 2 Diabetes Mellitus in Chinese Elderly Using Machine Learning Techniques |
topic | type 2 diabetes mellitus (T2DM); machine learning; prediction model; Chinese elderly |
url | https://www.mdpi.com/2075-4426/12/6/905 |
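
The description field reports a random train/test split, random under-sampling (RUS) for class imbalance, and sensitivity, specificity, and accuracy as evaluation metrics. The following is a minimal, standard-library Python sketch of those two steps plus a consistency check on the reported figures. It is not the authors' code: the function names (`random_under_sample`, `screening_metrics`, `accuracy_from_rates`) are hypothetical, RUS is assumed to mean balancing the classes 1:1 (the record does not state the ratio), and the check assumes the test set has the same T2DM incidence as the full cohort, which a random split would only approximate.

```python
import random


def random_under_sample(labels, seed=0):
    """Return shuffled row indices: every minority-class row plus an
    equal-sized random draw (without replacement) from the majority class.
    The 1:1 ratio is an assumption; the record does not state it."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((positives, negatives), key=len)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return kept


def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels.
    Assumes both classes occur in y_true (no zero-division guard)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy is the prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Figures quoted in the description field of this record.
N_TOTAL, N_TRAIN, N_TEST, N_CASES = 127_031, 101_625, 25_406, 8_298

incidence = N_CASES / N_TOTAL  # share of participants with incident T2DM
# Assumption: the test set shares the cohort-wide incidence.
implied_accuracy = accuracy_from_rates(0.6452, 0.7577, incidence)

print(f"train + test = {N_TRAIN + N_TEST}")
print(f"incidence = {incidence:.4f}")
print(f"implied accuracy = {implied_accuracy:.4f}")
```

Under the stated assumption, the implied accuracy comes out at about 0.750, in line with the reported 0.7503. Because incident cases are only about 6.5% of the cohort, accuracy is dominated by specificity, which is one reason the record also reports sensitivity and AUC.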