Machine learning models to predict in-hospital mortality in septic patients with diabetes

Background: Sepsis is a leading cause of morbidity and mortality in hospitalized patients. To date, there are no well-established longitudinal networks linking molecular mechanisms to clinical phenotypes in sepsis. Adding to the problem, about one in five patients presents with diabetes. For this...

Bibliographic Details
Main Authors: Jing Qi, Jingchao Lei, Nanyi Li, Dan Huang, Huaizheng Liu, Kefu Zhou, Zheren Dai, Chuanzheng Sun
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-11-01
Series: Frontiers in Endocrinology
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fendo.2022.1034251/full
author Jing Qi
Jingchao Lei
Nanyi Li
Dan Huang
Huaizheng Liu
Kefu Zhou
Zheren Dai
Chuanzheng Sun
author_sort Jing Qi
collection DOAJ
description Background: Sepsis is a leading cause of morbidity and mortality in hospitalized patients. To date, there are no well-established longitudinal networks linking molecular mechanisms to clinical phenotypes in sepsis. Adding to the problem, about one in five patients presents with diabetes. For this subgroup, management is difficult and prognosis is hard to evaluate.
Methods: From three databases, a total of 7,001 patients were enrolled on the basis of the Sepsis-3 criteria and a diabetes diagnosis. Input variables were hand-picked on the basis of correlation analysis, leaving 53 variables. A total of 5,727 records were collected from the Medical Information Mart for Intensive Care database and randomly split into a training set and an internal validation set at a ratio of 7:3. Logistic regression with lasso regularization, Bayes logistic regression, decision tree, random forest, and XGBoost models were then built on the training set and tested on the internal validation set. Data from the eICU Collaborative Research Database (n = 815) and the dtChina critical care database (n = 459) served as external validation sets to test model performance.
Results: In the internal validation set, the accuracy values of logistic regression with lasso regularization, Bayes logistic regression, decision tree, random forest, and XGBoost were 0.878, 0.883, 0.865, 0.883, and 0.882, respectively. In external validation set 1, the corresponding accuracy values were 0.879, 0.877, 0.865, 0.886, and 0.875. In external validation set 2, they were 0.715, 0.745, 0.763, 0.760, and 0.699.
Conclusion: The top three models for the internal validation set were Bayes logistic regression, random forest, and XGBoost, whereas the top three for external validation set 1 were random forest, logistic regression with lasso regularization, and Bayes logistic regression. The top three models for external validation set 2 were decision tree, random forest, and Bayes logistic regression. The random forest model performed well on the training set and the three validation sets. The most important features were age, albumin, and lactate.
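The ranking stated in the conclusion can be checked directly against the accuracy values reported in the results. The following is a minimal sketch in Python, not the authors' code: the dictionary layout, the shortened model labels, and the helper name `top_models` are assumptions introduced here for illustration, and only the numbers come from the abstract.

```python
# Accuracy values as reported in the abstract; the dictionary keys are
# illustrative labels chosen for this sketch, not names from the paper.
REPORTED_ACCURACY = {
    "internal validation": {
        "lasso logistic regression": 0.878,
        "Bayes logistic regression": 0.883,
        "decision tree": 0.865,
        "random forest": 0.883,
        "XGBoost": 0.882,
    },
    "external validation 1": {
        "lasso logistic regression": 0.879,
        "Bayes logistic regression": 0.877,
        "decision tree": 0.865,
        "random forest": 0.886,
        "XGBoost": 0.875,
    },
    "external validation 2": {
        "lasso logistic regression": 0.715,
        "Bayes logistic regression": 0.745,
        "decision tree": 0.763,
        "random forest": 0.760,
        "XGBoost": 0.699,
    },
}


def top_models(scores, k=3):
    """Return the k model names with the highest accuracy.

    sorted() is stable, so models with equal accuracy keep the order
    in which they appear in the dictionary.
    """
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    for dataset, scores in REPORTED_ACCURACY.items():
        print(dataset, "->", ", ".join(top_models(scores)))
```

Bayes logistic regression and random forest tie at 0.883 on the internal validation set, so their relative order in that ranking depends only on the listing order used here.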
first_indexed 2024-04-11T08:05:57Z
format Article
id doaj.art-6e5d0c2f27a5469a87817a1cbd09fa8e
institution Directory Open Access Journal
issn 1664-2392
language English
last_indexed 2024-04-11T08:05:57Z
publishDate 2022-11-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Endocrinology
title Machine learning models to predict in-hospital mortality in septic patients with diabetes
topic machine learning
sepsis
diabetes
in-hospital mortality
risk factors
url https://www.frontiersin.org/articles/10.3389/fendo.2022.1034251/full