Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness
The early prediction of diabetes can facilitate interventions to prevent or delay it. This study proposes a diabetes prediction model based on machine learning (ML) to encourage individuals at risk of diabetes to employ healthy interventions. A total of 38,379 subjects were included. We trained the...
Main Authors: | Juyoung Shin, Joonyub Lee, Taehoon Ko, Kanghyuck Lee, Yera Choi, Hun-Sung Kim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | Journal of Personalized Medicine |
Subjects: | diabetes prediction model; diabetes prevention; type 2 diabetes; XGBoost Survival Embedding |
Online Access: | https://www.mdpi.com/2075-4426/12/11/1899 |
_version_ | 1797464872605188096 |
---|---|
author | Juyoung Shin; Joonyub Lee; Taehoon Ko; Kanghyuck Lee; Yera Choi; Hun-Sung Kim |
author_facet | Juyoung Shin; Joonyub Lee; Taehoon Ko; Kanghyuck Lee; Yera Choi; Hun-Sung Kim |
author_sort | Juyoung Shin |
collection | DOAJ |
description | The early prediction of diabetes can facilitate interventions to prevent or delay it. This study proposes a diabetes prediction model based on machine learning (ML) to encourage individuals at risk of diabetes to employ healthy interventions. A total of 38,379 subjects were included. We trained the model on 80% of the subjects and verified its predictive performance on the remaining 20%. Furthermore, the performances of several algorithms were compared, including logistic regression, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), Cox regression, and XGBoost Survival Embedding (XGBSE). The area under the receiver operating characteristic curve (AUROC) of the XGBoost model was the largest, followed by those of the decision tree, logistic regression, and random forest models. For the survival analysis, XGBSE yielded an AUROC exceeding 0.9 for the 2- to 9-year predictions and a C-index of 0.934, while the Cox regression achieved a C-index of 0.921. After lowering the threshold from 0.5 to 0.25, the sensitivity increased from 0.011 to 0.236 for the 2-year prediction model and from 0.607 to 0.994 for the 9-year prediction model, while the specificity showed negligible changes. We developed a high-performance diabetes prediction model that applied the XGBSE algorithm with threshold adjustment. We plan to use this prediction model in real clinical practice for diabetes prevention after simplifying and validating it externally. |
first_indexed | 2024-03-09T18:13:22Z |
format | Article |
id | doaj.art-0052be4d10ee4fa7999d880cf640e05c |
institution | Directory Open Access Journal |
issn | 2075-4426 |
language | English |
last_indexed | 2024-03-09T18:13:22Z |
publishDate | 2022-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Journal of Personalized Medicine |
spelling | doaj.art-0052be4d10ee4fa7999d880cf640e05c; 2023-11-24T08:54:08Z; eng; MDPI AG; Journal of Personalized Medicine; 2075-4426; 2022-11-01; vol. 12, no. 11, art. 1899; 10.3390/jpm12111899; Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness; Juyoung Shin [0], Joonyub Lee [1], Taehoon Ko [2], Kanghyuck Lee [3], Yera Choi [4], Hun-Sung Kim [5]; [0] Health Promotion Center, Seoul St. Mary’s Hospital, Seoul 06591, Korea; [1] Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea; [2] Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea; [3] Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea; [4] NAVER CLOVA AI Lab, Seongnam 13561, Korea; [5] Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea; https://www.mdpi.com/2075-4426/12/11/1899; diabetes prediction model; diabetes prevention; type 2 diabetes; XGBoost Survival Embedding |
title | Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness |
title_full | Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness |
title_fullStr | Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness |
title_full_unstemmed | Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness |
title_short | Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness |
title_sort | improving machine learning diabetes prediction models for the utmost clinical effectiveness |
topic | diabetes prediction model; diabetes prevention; type 2 diabetes; XGBoost Survival Embedding |
url | https://www.mdpi.com/2075-4426/12/11/1899 |
work_keys_str_mv | AT juyoungshin improvingmachinelearningdiabetespredictionmodelsfortheutmostclinicaleffectiveness AT joonyublee improvingmachinelearningdiabetespredictionmodelsfortheutmostclinicaleffectiveness AT taehoonko improvingmachinelearningdiabetespredictionmodelsfortheutmostclinicaleffectiveness AT kanghyucklee improvingmachinelearningdiabetespredictionmodelsfortheutmostclinicaleffectiveness AT yerachoi improvingmachinelearningdiabetespredictionmodelsfortheutmostclinicaleffectiveness AT hunsungkim improvingmachinelearningdiabetespredictionmodelsfortheutmostclinicaleffectiveness |
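
The abstract reports that lowering the classification threshold from 0.5 to 0.25 raised sensitivity while specificity changed little. The sketch below illustrates that general mechanism only. It is not the authors' code: the function name `sensitivity_specificity` and the toy labels and probabilities are hypothetical, chosen purely for illustration, and the numbers it produces are unrelated to the study's results.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Return (sensitivity, specificity) when a case is called positive
    if its predicted probability is >= threshold."""
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data (NOT from the study): 1 = developed diabetes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.80, 0.45, 0.30, 0.10, 0.20, 0.15, 0.05, 0.40, 0.02, 0.10]

for threshold in (0.5, 0.25):
    sens, spec = sensitivity_specificity(y_true, y_prob, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this toy example the lower threshold catches more true cases at the cost of one additional false positive. How favorable that trade-off is in practice depends on the model's score distribution, which is why the abstract reports it per prediction horizon.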