Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness

The early prediction of diabetes can facilitate interventions to prevent or delay it. This study proposes a diabetes prediction model based on machine learning (ML) to encourage individuals at risk of diabetes to employ healthy interventions. A total of 38,379 subjects were included. We trained the...


Bibliographic Details
Main Authors: Juyoung Shin, Joonyub Lee, Taehoon Ko, Kanghyuck Lee, Yera Choi, Hun-Sung Kim
Format: Article
Language: English
Published: MDPI AG 2022-11-01
Series: Journal of Personalized Medicine
Subjects: diabetes prediction model, diabetes prevention, type 2 diabetes, XGBoost Survival Embedding
Online Access: https://www.mdpi.com/2075-4426/12/11/1899
author Juyoung Shin
Joonyub Lee
Taehoon Ko
Kanghyuck Lee
Yera Choi
Hun-Sung Kim
collection DOAJ
description The early prediction of diabetes can facilitate interventions to prevent or delay it. This study proposes a diabetes prediction model based on machine learning (ML) to encourage individuals at risk of diabetes to employ healthy interventions. A total of 38,379 subjects were included. We trained the model on 80% of the subjects and verified its predictive performance on the remaining 20%. Furthermore, the performances of several algorithms were compared, including logistic regression, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), Cox regression, and XGBoost Survival Embedding (XGBSE). The area under the receiver operating characteristic curve (AUROC) of the XGBoost model was the largest, followed by those of the decision tree, logistic regression, and random forest models. For the survival analysis, XGBSE yielded an AUROC exceeding 0.9 for the 2- to 9-year predictions and a C-index of 0.934, while the Cox regression achieved a C-index of 0.921. After lowering the threshold from 0.5 to 0.25, the sensitivity increased from 0.011 to 0.236 for the 2-year prediction model and from 0.607 to 0.994 for the 9-year prediction model, while the specificity showed negligible changes. We developed a high-performance diabetes prediction model that applied the XGBSE algorithm with threshold adjustment. We plan to use this prediction model in real clinical practice for diabetes prevention after simplifying and validating it externally.
first_indexed 2024-03-09T18:13:22Z
format Article
id doaj.art-0052be4d10ee4fa7999d880cf640e05c
institution Directory Open Access Journal
issn 2075-4426
language English
last_indexed 2024-03-09T18:13:22Z
publishDate 2022-11-01
publisher MDPI AG
record_format Article
series Journal of Personalized Medicine
spelling doaj.art-0052be4d10ee4fa7999d880cf640e05c
2023-11-24T08:54:08Z
eng
MDPI AG
Journal of Personalized Medicine
2075-4426
2022-11-01
Volume 12, Issue 11, Article 1899
DOI: 10.3390/jpm12111899
Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness
Juyoung Shin (0), Joonyub Lee (1), Taehoon Ko (2), Kanghyuck Lee (3), Yera Choi (4), Hun-Sung Kim (5)
(0) Health Promotion Center, Seoul St. Mary’s Hospital, Seoul 06591, Korea
(1) Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
(2) Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
(3) Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
(4) NAVER CLOVA AI Lab, Seongnam 13561, Korea
(5) Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
https://www.mdpi.com/2075-4426/12/11/1899
diabetes prediction model
diabetes prevention
type 2 diabetes
XGBoost Survival Embedding
title Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness
topic diabetes prediction model
diabetes prevention
type 2 diabetes
XGBoost Survival Embedding
url https://www.mdpi.com/2075-4426/12/11/1899
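
Note on threshold adjustment: the description reports that lowering the decision threshold from 0.5 to 0.25 raised sensitivity. The following is a minimal sketch of that general idea, not the authors' code. The function name, the labels, and the risk scores are hypothetical toy values chosen for illustration; they are not data from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], risk: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when risk >= threshold counts as positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, risk):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical toy data: 1 = developed diabetes, 0 = did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical predicted risks from some model (illustrative values only).
risk = [0.70, 0.40, 0.30, 0.10, 0.20, 0.15, 0.05, 0.30, 0.10, 0.02]

for threshold in (0.5, 0.25):
    sens, spec = sensitivity_specificity(y_true, risk, threshold)
    print(f"threshold={threshold:.2f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

A lower threshold flags more subjects as at risk, so sensitivity can only stay the same or rise and specificity can only stay the same or fall; how large each change is depends on the score distribution of the data at hand.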