MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models

CIKM ’24, October 21–25, 2024, Boise, ID, USA

Bibliographic Details
Main Authors: Thao, Phan Nguyen Minh; Dao, Cong-Tinh; Wu, Chenwei; Wang, Jian-Zhe; Liu, Shun; Ding, Jun-En; Restrepo, David; Liu, Feng; Hung, Fang-Ming; Peng, Wen-Chih
Other Authors: Massachusetts Institute of Technology. Institute for Medical Engineering & Science
Format: Article
Language: English
Published: ACM, Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024
Online Access: https://hdl.handle.net/1721.1/157546
Abstract: Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features like lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and support clinical decision-making. However, most EHR predictive models do not reflect these procedures, as they either focus on a single modality or overlook the inter-modality interactions/redundancy. In this work, we propose MEDFuse, a Multimodal EHR Data Fusion framework that incorporates masked lab-test modeling and large language models (LLMs) to effectively integrate structured and unstructured medical data. MEDFuse leverages multimodal embeddings extracted from two sources: LLMs fine-tuned on free clinical text and masked tabular transformers trained on structured lab test results. We design a disentangled transformer module, optimized by a mutual information loss, to 1) decouple modality-specific and modality-shared information and 2) extract useful joint representation from the noise and redundancy present in clinical notes. Through comprehensive validation on the public MIMIC-III dataset and the in-house FEMH dataset, MEDFuse demonstrates great potential in advancing clinical predictions, achieving over 90% F1 score in the 10-disease multi-label classification task.
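The abstract reports an F1 score on a 10-disease multi-label classification task but does not say which averaging scheme was used. As a reading aid only, the following minimal sketch computes both micro- and macro-averaged F1 for binary multi-label predictions. The function names and the toy labels are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for 0/1 label matrices of shape [samples][labels]."""
    n_labels = len(y_true[0])
    per_label_f1 = []
    total_tp = total_fp = total_fn = 0
    for j in range(n_labels):
        # Count true positives, false positives, and false negatives for label j.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        per_label_f1.append(f1_from_counts(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    micro = f1_from_counts(total_tp, total_fp, total_fn)  # pool counts over labels
    macro = sum(per_label_f1) / n_labels                  # unweighted mean over labels
    return micro, macro


# Toy example with 3 patients and 3 labels (illustrative data only).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
micro_f1, macro_f1 = multilabel_f1(y_true, y_pred)
print(f"micro-F1 = {micro_f1:.3f}, macro-F1 = {macro_f1:.3f}")
```

Micro-averaging pools counts across all labels and so favors frequent diseases, while macro-averaging weights every disease equally; the two can differ substantially when labels are imbalanced.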
Date: 2024-10-21
Type: Article (http://purl.org/eprint/type/ConferencePaper)
ISBN: 979-8-4007-0436-9
DOI: https://doi.org/10.1145/3627673.3679962
Citation: Thao, Phan Nguyen Minh, Dao, Cong-Tinh, Wu, Chenwei, Wang, Jian-Zhe, Liu, Shun et al. 2024. "MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models."
Publisher: Association for Computing Machinery
Rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Rights Holder: The author(s)
File Format: application/pdf