Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data

Abstract Background In the early stages of the COVID-19 pandemic our institution was interested in forecasting how long surgical patients receiving elective procedures would spend in the hospital. Initial examination of our models indicated that, due to the skewed nature of the length of stay, accur...


Bibliographic Details
Main Authors:	Zhenhui Xu, Congwen Zhao, Charles D. Scales, Ricardo Henao, Benjamin A. Goldstein
Format:	Article
Language:	English
Published:	BMC 2022-04-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Electronic health records, Machine learning, Clinical decision support, Surgical outcomes
Online Access:	https://doi.org/10.1186/s12911-022-01855-0

_version_	1811338795904139264
author	Zhenhui Xu Congwen Zhao Charles D. Scales Ricardo Henao Benjamin A. Goldstein
author_facet	Zhenhui Xu Congwen Zhao Charles D. Scales Ricardo Henao Benjamin A. Goldstein
author_sort	Zhenhui Xu
collection	DOAJ
description	Abstract Background In the early stages of the COVID-19 pandemic our institution was interested in forecasting how long surgical patients receiving elective procedures would spend in the hospital. Initial examination of our models indicated that, due to the skewed nature of the length of stay, accurate prediction was challenging and we instead opted for a simpler classification model. In this work we perform a deeper examination of predicting in-hospital length of stay. Methods We used electronic health record data on length of stay from 42,209 elective surgeries. We compare different loss functions (mean squared error, mean absolute error, mean relative error), algorithms (LASSO, Random Forests, multilayer perceptron) and data transformations (log and truncation). We also assess the performance of a two-stage hybrid classification-regression approach. Results Our results show that while it is possible to accurately predict short lengths of stay, predicting longer lengths of stay is extremely challenging. As such, we opt for a two-stage model whose first stage classifies patients into long versus short lengths of stay and whose second stage fits a regressor among those predicted to have a short length of stay. Discussion The results indicate both the challenges and the considerations necessary for applying machine-learning methods to skewed outcomes. Conclusions Two-stage models allow those developing clinical decision support tools to explicitly acknowledge where they can and cannot make accurate predictions.
first_indexed	2024-04-13T18:16:51Z
format	Article
id	doaj.art-30ddaaf267fb48a383780783ddf0c5be
institution	Directory Open Access Journal
issn	1472-6947
language	English
last_indexed	2024-04-13T18:16:51Z
publishDate	2022-04-01
publisher	BMC
record_format	Article
series	BMC Medical Informatics and Decision Making
spelling	doaj.art-30ddaaf267fb48a383780783ddf0c5be; 2022-12-22T02:35:39Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2022-04-01; 221112; 10.1186/s12911-022-01855-0; Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data; Zhenhui Xu (0), Congwen Zhao (1), Charles D. Scales (2), Ricardo Henao (3), Benjamin A. Goldstein (4); (0) Department of Biostatistics and Bioinformatics, Duke University; (1) Department of Biostatistics and Bioinformatics, Duke University; (2) Duke Clinical Research Institute, Duke University; (3) Department of Biostatistics and Bioinformatics, Duke University; (4) Department of Biostatistics and Bioinformatics, Duke University; https://doi.org/10.1186/s12911-022-01855-0; Electronic health records, Machine learning, Clinical decision support, Surgical outcomes
title	Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data
title_sort	predicting in hospital length of stay a two stage modeling approach to account for highly skewed data
topic	Electronic health records, Machine learning, Clinical decision support, Surgical outcomes
url	https://doi.org/10.1186/s12911-022-01855-0

Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data
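
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single hypothetical feature `x`, the threshold-based classifier, the one-variable least-squares regressor, the 7-day cutoff, and the toy data are all assumptions chosen to keep the example small. Stage 1 flags patients as long or short stay; stage 2 fits a regressor only among patients predicted to have a short stay, and no point estimate is returned for patients flagged as long stay.

```python
def fit_stump(x, is_long):
    """Stage 1 (assumed classifier): choose the threshold t so that
    'x > t' best matches the long-stay labels on the training data."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(x)):
        hits = sum((xi > t) == li for xi, li in zip(x, is_long))
        acc = hits / len(x)
        if acc > best_acc:  # keep the first threshold with the best accuracy
            best_t, best_acc = t, acc
    return best_t


def fit_line(x, y):
    """Stage 2 (assumed regressor): ordinary least squares for y = a + b * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def fit_two_stage(x, los, cutoff):
    """Fit the classifier on all patients, then fit the regressor only
    among patients the classifier predicts to have a short stay."""
    threshold = fit_stump(x, [days > cutoff for days in los])
    short = [(xi, days) for xi, days in zip(x, los) if xi <= threshold]
    if not short:
        raise ValueError("no patients were predicted to have a short stay")
    a, b = fit_line([s[0] for s in short], [s[1] for s in short])
    return threshold, a, b


def predict(model, xi):
    """Return predicted days for predicted-short patients, or None when
    the patient is flagged as a long stay (no point estimate is made)."""
    threshold, a, b = model
    if xi > threshold:
        return None
    return a + b * xi


# Toy, made-up data: a hypothetical risk score and a skewed length of stay in days.
x = [1, 2, 3, 4, 5, 6, 9, 10]
los = [1, 2, 3, 4, 5, 6, 30, 45]
model = fit_two_stage(x, los, cutoff=7)  # the 7-day cutoff is an assumption

print(predict(model, 4))  # predicted-short patient: a number of days
print(predict(model, 9))  # predicted-long patient: None
```

Returning `None` for predicted-long patients is one way to make explicit where the model declines to give a numeric prediction; a real system would likely use richer features and the algorithms named in the abstract instead of these stand-ins.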

Similar Items