Inferring multimodal latent topics from electronic health records

Bibliographic Details
Main Authors: Li, Yue, Nair, Pratheeksha, Lu, Xing Han, Wen, Zhi, Wang, Yuening, Dehaghi, Amir Ardalan Kalantari, Miao, Yan, Liu, Weiqi, Ordog, Tamas, Biernacka, Joanna M, Ryu, Euijung, Olson, Janet E, Frye, Mark A, Liu, Aihua, Guo, Liming, Marelli, Ariane, Ahuja, Yuri, Davila-Velderrain, Jose, Kellis, Manolis
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: Springer Science and Business Media LLC 2021
Online Access: https://hdl.handle.net/1721.1/136021
collection MIT
description © 2020, The Author(s). Electronic health records (EHR) are rich heterogeneous collections of patient health information, whose broad adoption provides clinicians and researchers unprecedented opportunities for health informatics, disease-risk prediction, actionable clinical recommendations, and precision medicine. However, EHRs present several modeling challenges, including highly sparse data matrices, noisy irregular clinical notes, arbitrary biases in billing code assignment, diagnosis-driven lab tests, and heterogeneous data types. To address these challenges, we present MixEHR, a multi-view Bayesian topic model. We demonstrate MixEHR on MIMIC-III, Mayo Clinic Bipolar Disorder, and Quebec Congenital Heart Disease EHR datasets. Qualitatively, MixEHR disease topics reveal meaningful combinations of clinical features across heterogeneous data types. Quantitatively, we observe superior prediction accuracy of diagnostic codes and lab test imputations compared to state-of-the-art methods. We leverage the inferred patient topic mixtures to classify target diseases and predict mortality of patients in critical conditions. In all comparisons, MixEHR confers competitive performance and reveals meaningful disease-related topics.
first_indexed 2024-09-23T14:04:04Z
format Article
id mit-1721.1/136021
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T14:04:04Z
publishDate 2021
publisher Springer Science and Business Media LLC
record_format dspace
spelling mit-1721.1/136021 2023-12-22T18:46:25Z
2021-10-27T20:30:26Z 2021-10-27T20:30:26Z 2020 2021-01-05T19:26:58Z
Article http://purl.org/eprint/type/JournalArticle
https://hdl.handle.net/1721.1/136021
en
10.1038/S41467-020-16378-3
Nature Communications
Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
application/pdf
Springer Science and Business Media LLC
Nature
title Inferring multimodal latent topics from electronic health records
url https://hdl.handle.net/1721.1/136021