TransformEHR: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records

Bibliographic Details
Main Authors: Zhichao Yang, Avijit Mitra, Weisong Liu, Dan Berlowitz, Hong Yu
Format: Article
Language: English
Published: Nature Portfolio 2023-11-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-023-43715-z
author Zhichao Yang
Avijit Mitra
Weisong Liu
Dan Berlowitz
Hong Yu
collection DOAJ
description Deep learning transformer-based models using longitudinal electronic health records (EHRs) have shown great success in predicting clinical diseases and outcomes. Pretraining on a large dataset can help such models map the input space better and boost their performance on relevant tasks through finetuning with limited data. In this study, we present TransformEHR, a generative transformer-based encoder-decoder model that is pretrained using a new pretraining objective: predicting all diseases and outcomes of a patient at a future visit from previous visits. TransformEHR’s encoder-decoder framework, paired with the novel pretraining objective, helps it achieve new state-of-the-art performance on multiple clinical prediction tasks. Compared with the previous model, TransformEHR improves area under the precision–recall curve by 2% (p < 0.001) for pancreatic cancer onset and by 24% (p = 0.007) for intentional self-harm in patients with post-traumatic stress disorder. The high performance in predicting intentional self-harm shows the potential of TransformEHR in building effective clinical intervention systems. TransformEHR is also generalizable and can be easily finetuned for clinical prediction tasks with limited data.
first_indexed 2024-03-09T05:37:16Z
format Article
id doaj.art-e86f3f2a50684c45880702f0abadcb09
institution Directory Open Access Journal
issn 2041-1723
language English
last_indexed 2024-03-09T05:37:16Z
publishDate 2023-11-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj.art-e86f3f2a50684c45880702f0abadcb09 2023-12-03T12:28:03Z eng Nature Portfolio Nature Communications 2041-1723 2023-11-01 141110 10.1038/s41467-023-43715-z
Zhichao Yang (College of Information and Computer Science, University of Massachusetts Amherst)
Avijit Mitra (College of Information and Computer Science, University of Massachusetts Amherst)
Weisong Liu (School of Computer & Information Sciences, University of Massachusetts Lowell)
Dan Berlowitz (Center for Healthcare Organization and Implementation Research, VA Bedford Health Care System)
Hong Yu (College of Information and Computer Science, University of Massachusetts Amherst)
title TransformEHR: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records
url https://doi.org/10.1038/s41467-023-43715-z