Predicting the Risk of Incident Type 2 Diabetes Mellitus in Chinese Elderly Using Machine Learning Techniques

Early identification of individuals at high risk of diabetes is crucial for implementing early intervention strategies. However, algorithms specific to elderly Chinese adults are lacking. The aim of this study is to build effective prediction models based on machine learning (ML) for the risk of typ...


Bibliographic Details
Main Authors: Qing Liu, Miao Zhang, Yifeng He, Lei Zhang, Jingui Zou, Yaqiong Yan, Yan Guo
Format: Article
Language: English
Published: MDPI AG 2022-05-01
Series: Journal of Personalized Medicine
Subjects: type 2 diabetes mellitus (T2DM); machine learning; prediction model; Chinese elderly
Online Access: https://www.mdpi.com/2075-4426/12/6/905
author Qing Liu
Miao Zhang
Yifeng He
Lei Zhang
Jingui Zou
Yaqiong Yan
Yan Guo
collection DOAJ
description Early identification of individuals at high risk of diabetes is crucial for implementing early intervention strategies. However, algorithms specific to elderly Chinese adults are lacking. This study aimed to build effective machine learning (ML) prediction models for the risk of type 2 diabetes mellitus (T2DM) in elderly Chinese adults. A retrospective cohort study was conducted using the health screening data of adults older than 65 years in Wuhan, China from 2018 to 2020. After strict data filtering, 127,031 records from eligible participants were used. Overall, 8298 participants were diagnosed with incident T2DM during the 2-year follow-up (2019–2020). The dataset was randomly split into a training set (n = 101,625) and a test set (n = 25,406). We developed prediction models based on four ML algorithms: logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). Using LASSO regression, 21 prediction features were selected. Random under-sampling (RUS) was applied to address the class imbalance, and Shapley Additive Explanations (SHAP) were used to calculate and visualize feature importance. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The XGBoost model achieved the best performance (AUC = 0.7805, sensitivity = 0.6452, specificity = 0.7577, accuracy = 0.7503). Fasting plasma glucose (FPG), education, exercise, gender, and waist circumference (WC) were the five most important predictors. This study showed that the XGBoost model can be applied to screen individuals at high risk of T2DM in the early phase, which has strong potential for intelligent prevention and control of diabetes. The key features could also be useful for developing targeted diabetes prevention interventions.
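As a minimal sketch of two steps named in the description above, the Python below illustrates random under-sampling and the reported screening metrics. It is not the authors' code: the function names are hypothetical, the 1:1 class ratio after under-sampling is an assumption (the abstract gives no ratio), and the final consistency check assumes the test set has the same T2DM incidence as the full cohort (8298 of 127,031), which a random split only approximates.

```python
import random


def random_under_sample(labels, seed=0):
    """Return sorted row indices that keep every minority-class row and an
    equally sized random subset of majority-class rows (assumed 1:1 ratio)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept_majority = rng.sample(majority, len(minority))
    return sorted(minority + kept_majority)


def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels (1 = incident T2DM).
    Assumes y_true contains at least one case and one non-case."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy is the prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Figures taken from the abstract; the prevalence is the whole-cohort incidence
# (assumed, not reported, to hold for the test set as well).
prevalence = 8298 / 127031
implied_accuracy = accuracy_from_rates(0.6452, 0.7577, prevalence)
```

Under those assumptions, `implied_accuracy` comes out at about 0.750, in line with the reported accuracy of 0.7503. Because non-cases make up roughly 93% of the cohort, accuracy is dominated by specificity, which is why sensitivity and AUC are reported alongside it.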
first_indexed 2024-03-09T23:20:13Z
format Article
id doaj.art-20e1ffa500c744e6ab6858db3cd7bf92
institution Directory Open Access Journal
issn 2075-4426
language English
last_indexed 2024-03-09T23:20:13Z
publishDate 2022-05-01
publisher MDPI AG
record_format Article
series Journal of Personalized Medicine
spelling doaj.art-20e1ffa500c744e6ab6858db3cd7bf92 2023-11-23T17:27:24Z eng MDPI AG Journal of Personalized Medicine 2075-4426 2022-05-01 12(6), 905 DOI 10.3390/jpm12060905
Qing Liu (Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China)
Miao Zhang (Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China)
Yifeng He (School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China)
Lei Zhang (School of Mathematics and Statistics, Wuhan University, Wuhan 430070, China)
Jingui Zou (School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China)
Yaqiong Yan (Wuhan Center for Disease Control and Prevention, Wuhan 430015, China)
Yan Guo (Wuhan Center for Disease Control and Prevention, Wuhan 430015, China)
title Predicting the Risk of Incident Type 2 Diabetes Mellitus in Chinese Elderly Using Machine Learning Techniques
topic type 2 diabetes mellitus (T2DM)
machine learning
prediction model
Chinese elderly
url https://www.mdpi.com/2075-4426/12/6/905