Time-sensitive clinical concept embeddings learned from large electronic health records

Abstract. Background: Learning distributional representations of clinical concepts (e.g., diseases, drugs, and labs) is an important research area of deep learning in the medical domain. However, many existing relevant methods do not consider temporal dependencies along the longitudinal sequence of a p...

Bibliographic Details
Main Authors: Yang Xiang, Jun Xu, Yuqi Si, Zhiheng Li, Laila Rasmy, Yujia Zhou, Firat Tiryaki, Fang Li, Yaoyun Zhang, Yonghui Wu, Xiaoqian Jiang, Wenjin Jim Zheng, Degui Zhi, Cui Tao, Hua Xu
Format: Article
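Language: English
Published: BMC 2019-04-01
Series: BMC Medical Informatics and Decision Making
Subjects: Clinical concept embedding; Distributional representation; Time sensitive concept embedding; Electronic medical records; Concept similarity; Predictive modeling
Online Access: http://link.springer.com/article/10.1186/s12911-019-0766-3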
_version_ 1818291641913442304
author Yang Xiang
Jun Xu
Yuqi Si
Zhiheng Li
Laila Rasmy
Yujia Zhou
Firat Tiryaki
Fang Li
Yaoyun Zhang
Yonghui Wu
Xiaoqian Jiang
Wenjin Jim Zheng
Degui Zhi
Cui Tao
Hua Xu
author_sort Yang Xiang
collection DOAJ
description Abstract. Background: Learning distributional representations of clinical concepts (e.g., diseases, drugs, and labs) is an important research area of deep learning in the medical domain. However, many existing relevant methods do not consider temporal dependencies along the longitudinal sequence of a patient’s records, which may lead to incorrect selection of contexts. Methods: To address this issue, we extended three popular concept embedding learning methods, word2vec, positive pointwise mutual information (PPMI), and FastText, to consider time-sensitive information. We then trained them on a large electronic health records (EHR) database containing about 50 million patients to generate concept embeddings, and evaluated the embeddings with both intrinsic evaluations focusing on concept similarity measures and an extrinsic evaluation assessing their use in the task of predicting disease onset. Results: Our experiments show that embeddings learned from information within one visit (time window zero) improve performance on the concept similarity measure, and that the FastText algorithm usually performed better than the other two algorithms. For the predictive modeling task, the optimal result was achieved by word2vec embeddings with a 30-day sliding window. Conclusions: Considering time constraints is important in training clinical concept embeddings. We expect that such embeddings can benefit a series of downstream applications.
first_indexed 2024-12-13T02:47:18Z
format Article
id doaj.art-f44e156eeab3417ab168dbe42a247a7d
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-12-13T02:47:18Z
publishDate 2019-04-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-f44e156eeab3417ab168dbe42a247a7d
2022-12-22T00:02:10Z
eng
BMC
BMC Medical Informatics and Decision Making
ISSN 1472-6947
2019-04-01, volume 19, issue S2, pages 139-148
10.1186/s12911-019-0766-3
Time-sensitive clinical concept embeddings learned from large electronic health records
Yang Xiang (0), Jun Xu (1), Yuqi Si (2), Zhiheng Li (3), Laila Rasmy (4), Yujia Zhou (5), Firat Tiryaki (6), Fang Li (7), Yaoyun Zhang (8), Yonghui Wu (9), Xiaoqian Jiang (10), Wenjin Jim Zheng (11), Degui Zhi (12), Cui Tao (13), Hua Xu (14)
Authors 0-8 and 10-14: School of Biomedical Informatics, The University of Texas Health Science Center at Houston
Author 9: Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida
title Time-sensitive clinical concept embeddings learned from large electronic health records
topic Clinical concept embedding
Distributional representation
Time sensitive concept embedding
Electronic medical records
Concept similarity
Predictive modeling
url http://link.springer.com/article/10.1186/s12911-019-0766-3
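
The abstract above states that word2vec, PPMI, and FastText were extended to consider time-sensitive information, with window zero meaning one visit and 30 days being the best window for prediction, but this record gives no implementation details. The following is only an illustrative sketch of the general idea, not the authors' method: it assumes each patient's record is a list of hypothetical `(day, concept)` tuples, counts two concepts as context for each other when their days differ by at most `window_days`, and scores the resulting pairs with a standard symmetric PPMI. The function names and the data layout are assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def context_pairs(events, window_days):
    """Yield unordered concept pairs recorded at most window_days apart.

    events: list of (day, concept) tuples for ONE patient (assumed layout).
    window_days=0 keeps only concepts recorded on the same day.
    """
    ordered = sorted(events)  # sort by day so day_b >= day_a below
    for (day_a, a), (day_b, b) in combinations(ordered, 2):
        if day_b - day_a > window_days:
            continue  # outside the time window: not a context pair
        if a != b:
            yield tuple(sorted((a, b)))


def ppmi(pair_counts):
    """Positive pointwise mutual information for symmetric pair counts."""
    # Each unordered pair stands for two ordered co-occurrences.
    total = 2 * sum(pair_counts.values())
    marginal = Counter()
    for (a, b), n in pair_counts.items():
        marginal[a] += n
        marginal[b] += n
    scores = {}
    for (a, b), n in pair_counts.items():
        pmi = math.log(n * total / (marginal[a] * marginal[b]))
        scores[(a, b)] = max(0.0, pmi)  # clip negative values to zero
    return scores


if __name__ == "__main__":
    # Made-up example patient; the concept names are placeholders.
    patient = [(0, "diabetes"), (0, "metformin"), (40, "hypertension")]
    for window in (0, 30, 60):
        counts = Counter(context_pairs(patient, window))
        print(window, ppmi(counts))
```

With this toy patient, windows of 0 and 30 days keep only the same-day pair, while a 60-day window also links the later diagnosis to both earlier concepts, which is the kind of context-selection difference the abstract refers to.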