The prediction of hospital length of stay using unstructured data

Abstract Objective This study aimed to assess the performance improvement for machine learning-based hospital length of stay (LOS) predictions when clinical signs written in text are accounted for and compared to the traditional approach of solely considering structured information such as age, gend...


Bibliographic Details
Main Authors:	Jan Chrusciel, François Girardon, Lucien Roquette, David Laplanche, Antoine Duclos, Stéphane Sanchez
Format:	Article
Language:	English
Published:	BMC 2021-12-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Emergency department; Length of stay; Data mining; Health services research
Online Access:	https://doi.org/10.1186/s12911-021-01722-4

author	Jan Chrusciel François Girardon Lucien Roquette David Laplanche Antoine Duclos Stéphane Sanchez
collection	DOAJ
description	Abstract. Objective: This study aimed to assess the performance improvement for machine learning-based hospital length of stay (LOS) predictions when clinical signs written in text are accounted for, compared to the traditional approach of solely considering structured information such as age, gender and major ICD diagnosis. Methods: This observational retrospective cohort study analyzed patient stays with admission between 1 January and 24 September 2019. For each stay, the patient was admitted through the Emergency Department (ED) and stayed for more than two days in the subsequent service. LOS was predicted using two random forest models. The first included unstructured text extracted from electronic health records (EHRs). A word-embedding algorithm based on UMLS terminology, with exact matching restricted to patient-centric affirmation sentences, was used to assess the EHR data. The second model was primarily based on structured data in the form of diagnoses coded from the International Classification of Diseases, 10th Revision (ICD-10) and triage codes (CCMU/GEMSA classifications). Variables common to both models were: age, gender, zip/postal code, LOS in the ED, recent visit flag, assigned patient ward after the ED stay and short-term ED activity. Models were trained on 80% of the data and performance was evaluated by accuracy on the remaining 20% test data. Results: The model using unstructured data had an accuracy of 75.0%, compared to 74.1% for the model containing structured data. The two models produced a similar prediction in 86.6% of cases. In a secondary analysis restricted to intensive care patients, the accuracy of both models was also similar (76.3% vs 75.0%). Conclusions: LOS prediction using unstructured data had similar accuracy to using structured data and can be considered of use to accurately model LOS.
first_indexed	2024-12-21T01:01:11Z
format	Article
id	doaj.art-842777e50b814280ab9771feeb090a79
institution	Directory Open Access Journal
issn	1472-6947
language	English
last_indexed	2024-12-21T01:01:11Z
publishDate	2021-12-01
publisher	BMC
record_format	Article
series	BMC Medical Informatics and Decision Making
author_affiliation	Jan Chrusciel (Pôle Territorial Santé Publique et Performance, Centre Hospitalier de Troyes); François Girardon (Research and Consulting, CODOC SAS); Lucien Roquette (Research and Consulting, CODOC SAS); David Laplanche (Pôle Territorial Santé Publique et Performance, Centre Hospitalier de Troyes); Antoine Duclos (Research on Healthcare Performance Lab, INSERM U1290 RESHAPE, Université Claude Bernard Lyon 1); Stéphane Sanchez (Pôle Territorial Santé Publique et Performance, Centre Hospitalier de Troyes)
title	The prediction of hospital length of stay using unstructured data
topic	Emergency department; Length of stay; Data mining; Health services research
url	https://doi.org/10.1186/s12911-021-01722-4
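The evaluation protocol summarized in the abstract (an 80%/20% train/test split, accuracy on the test set, and the share of cases in which the two models give the same prediction) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the random seed, and the toy labels "short" and "long" are assumptions made for illustration, and the record does not state how the LOS classes were defined.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and split them into (train, test) lists.

    The 0.2 default mirrors the 80%/20% split named in the abstract;
    the seed is an arbitrary choice for reproducibility.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions equal to the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("label sequences must not be empty")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def agreement_rate(pred_a: Sequence[str], pred_b: Sequence[str]) -> float:
    """Fraction of cases in which two models output the same class."""
    return accuracy(pred_a, pred_b)


# Hypothetical toy data: true LOS classes and the outputs of two models.
y_true = ["short", "long", "long", "short", "long"]
pred_text_model = ["short", "long", "short", "short", "long"]
pred_structured_model = ["short", "long", "long", "long", "long"]

print("accuracy (text model):      ", accuracy(y_true, pred_text_model))
print("accuracy (structured model):", accuracy(y_true, pred_structured_model))
print("agreement between models:   ",
      agreement_rate(pred_text_model, pred_structured_model))
```

In the toy data the two models reach the same accuracy while disagreeing on individual cases, which is why an agreement rate is a useful complement to accuracy when comparing models.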


Similar Items