A descriptive study of random forest algorithm for predicting COVID-19 patients outcome

Background The outbreak of coronavirus disease 2019 (COVID-19) that occurred in Wuhan, China, has become a global public health threat. It is necessary to identify indicators that can be used as optimal predictors for clinical outcomes of COVID-19 patients. Methods The clinical information from 126...

Bibliographic Details
Main Authors: Jie Wang, Heping Yu, Qingquan Hua, Shuili Jing, Zhifen Liu, Xiang Peng, Cheng’an Cao, Yongwen Luo
Format: Article
Language: English
Published: PeerJ Inc. 2020-09-01
Series: PeerJ
Online Access: https://peerj.com/articles/9945.pdf
collection DOAJ
description Background The outbreak of coronavirus disease 2019 (COVID-19) that began in Wuhan, China, has become a global public health threat. Indicators that can serve as optimal predictors of clinical outcomes in COVID-19 patients need to be identified.
Methods Clinical information was collected from 126 patients diagnosed with COVID-19 at Wuhan Fourth Hospital. Clinical characteristics, laboratory findings, treatments and clinical outcomes were analyzed for patients who were hospitalized for treatment from 1 February to 15 March 2020 and subsequently died or were discharged. A random forest (RF) algorithm was used to predict the prognoses of COVID-19 patients and to identify the optimal predictors of patients' clinical prognoses.
Results Seven of the 126 patients were excluded because their endpoints were lost. Of the remaining 119 patients, 103 were discharged alive and 16 died in the hospital. The synthetic minority over-sampling technique (SMOTE) was used to correct the imbalance between the two outcome groups. Recursive feature elimination (RFE) was used to select the optimal feature subset for analysis. Eleven clinical parameters (Myo, CD8, age, LDH, LMR, CD45, Th/Ts, dyspnea, NLR, D-Dimer and CK) were chosen, with an AUC of approximately 0.9905. An RF model was built on this best subset to predict the prognoses of COVID-19 patients, and its area under the ROC curve (AUC) on the test data was 100%. Moreover, two optimal clinical risk predictors, lactate dehydrogenase (LDH) and myoglobin (Myo), were selected based on the Gini index. Univariable logistic analysis revealed a substantial increase in the risk of in-hospital mortality when Myo was higher than 80 ng/ml (OR = 7.54, 95% CI [3.42–16.63]) and when LDH was higher than 500 U/L (OR = 4.90, 95% CI [2.13–11.25]).
Conclusion We applied an RF algorithm to predict the mortality of COVID-19 patients with high accuracy and identified LDH higher than 500 U/L and Myo higher than 80 ng/ml to be potential risk factors for the prognoses of COVID-19 patients in the early stage of the disease.
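The odds ratios quoted in the Results come from univariable logistic analysis of a thresholded marker (for example, Myo above or below 80 ng/ml). For a single binary predictor, the logistic-regression odds ratio equals the cross-product ratio of the 2x2 table, and its Wald interval matches the standard log-scale (Woolf) interval. The following is a minimal Python sketch of that calculation, not the authors' code. The function name `odds_ratio_ci` and the counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(exposed_died, exposed_survived,
                  unexposed_died, unexposed_survived, z=1.96):
    """Return (odds ratio, lower, upper) for a 2x2 table.

    "Exposed" means the marker is above the chosen threshold.
    The interval is the Wald (Woolf) interval computed on the log scale;
    z=1.96 gives an approximate 95% confidence interval.
    """
    a, b = exposed_died, exposed_survived
    c, d = unexposed_died, unexposed_survived
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (NOT the study's data).
or_, lo, hi = odds_ratio_ci(10, 5, 6, 30)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}-{hi:.2f}]")
```

An interval that excludes 1, as both intervals reported in the abstract do, indicates that the association between the thresholded marker and in-hospital mortality is statistically significant at roughly the 5% level.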
first_indexed 2024-03-09T06:39:01Z
id doaj.art-30ec0583675a4824998d0492dfbc9002
institution Directory Open Access Journal
issn 2167-8359
last_indexed 2024-03-09T06:39:01Z
spelling doaj.art-30ec0583675a4824998d0492dfbc9002; 2023-12-03T10:53:48Z; eng; PeerJ Inc.; PeerJ; ISSN 2167-8359; 2020-09-01; vol. 8, e9945; DOI 10.7717/peerj.9945
Jie Wang (0): Department of Otolaryngology-Head and Neck Surgery, Renmin Hospital of Wuhan University, Wuhan, Hubei, China
Heping Yu (1): Department of Nail and Breast Surgery, Wuhan Fourth Hospital, Wuhan, Hubei, China
Qingquan Hua (2): Department of Otolaryngology-Head and Neck Surgery, Renmin Hospital of Wuhan University, Wuhan, Hubei, China
Shuili Jing (3): Department of Otolaryngology-Head and Neck Surgery, Renmin Hospital of Wuhan University, Wuhan, Hubei, China
Zhifen Liu (4): Department of Nephrology, Wuhan Fourth Hospital, Wuhan, Hubei, China
Xiang Peng (5): Department of Neurosurgery, Wuhan Fourth Hospital, Wuhan, Hubei, China
Cheng’an Cao (6): Department of Neurosurgery, Wuhan Fourth Hospital, Wuhan, Hubei, China
Yongwen Luo (7): Department of Urology, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
topic COVID-19
Patient outcome
Descriptive study
Random forest algorithm