Time-sensitive clinical concept embeddings learned from large electronic health records
Abstract Background Learning distributional representation of clinical concepts (e.g., diseases, drugs, and labs) is an important research area of deep learning in the medical domain. However, many existing relevant methods do not consider temporal dependencies along the longitudinal sequence of a p...
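Main Authors: | Yang Xiang, Jun Xu, Yuqi Si, Zhiheng Li, Laila Rasmy, Yujia Zhou, Firat Tiryaki, Fang Li, Yaoyun Zhang, Yonghui Wu, Xiaoqian Jiang, Wenjin Jim Zheng, Degui Zhi, Cui Tao, Hua Xu |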
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-04-01 |
Series: | BMC Medical Informatics and Decision Making |
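Subjects: | Clinical concept embedding; Distributional representation; Time sensitive concept embedding; Electronic medical records; Concept similarity; Predictive modeling |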
Online Access: | http://link.springer.com/article/10.1186/s12911-019-0766-3 |
_version_ | 1818291641913442304 |
---|---|
author | Yang Xiang, Jun Xu, Yuqi Si, Zhiheng Li, Laila Rasmy, Yujia Zhou, Firat Tiryaki, Fang Li, Yaoyun Zhang, Yonghui Wu, Xiaoqian Jiang, Wenjin Jim Zheng, Degui Zhi, Cui Tao, Hua Xu |
author_facet | Yang Xiang, Jun Xu, Yuqi Si, Zhiheng Li, Laila Rasmy, Yujia Zhou, Firat Tiryaki, Fang Li, Yaoyun Zhang, Yonghui Wu, Xiaoqian Jiang, Wenjin Jim Zheng, Degui Zhi, Cui Tao, Hua Xu |
author_sort | Yang Xiang |
collection | DOAJ |
description | Abstract. Background: Learning distributional representations of clinical concepts (e.g., diseases, drugs, and labs) is an important research area of deep learning in the medical domain. However, many existing methods do not consider temporal dependencies along the longitudinal sequence of a patient’s records, which may lead to incorrect selection of contexts. Methods: To address this issue, we extended three popular concept embedding learning methods, word2vec, positive pointwise mutual information (PPMI), and FastText, to consider time-sensitive information. We then trained them on a large electronic health records (EHR) database containing about 50 million patients to generate concept embeddings, and evaluated the embeddings with intrinsic evaluations focusing on a concept similarity measure and an extrinsic evaluation assessing their use in predicting disease onset. Results: Our experiments show that embeddings learned from information within one visit (time window zero) improve performance on the concept similarity measure, and that the FastText algorithm usually performed better than the other two algorithms. For the predictive modeling task, the best result was achieved by word2vec embeddings with a 30-day sliding window. Conclusions: Considering time constraints is important in training clinical concept embeddings. We expect such embeddings to benefit a range of downstream applications. |
first_indexed | 2024-12-13T02:47:18Z |
format | Article |
id | doaj.art-f44e156eeab3417ab168dbe42a247a7d |
institution | Directory Open Access Journal |
issn | 1472-6947 |
language | English |
last_indexed | 2024-12-13T02:47:18Z |
publishDate | 2019-04-01 |
publisher | BMC |
record_format | Article |
series | BMC Medical Informatics and Decision Making |
spelling | doaj.art-f44e156eeab3417ab168dbe42a247a7d; 2022-12-22T00:02:10Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2019-04-01; 19; S2; 139; 148; 10.1186/s12911-019-0766-3; Time-sensitive clinical concept embeddings learned from large electronic health records; Yang Xiang (0), Jun Xu (1), Yuqi Si (2), Zhiheng Li (3), Laila Rasmy (4), Yujia Zhou (5), Firat Tiryaki (6), Fang Li (7), Yaoyun Zhang (8), Yonghui Wu (9), Xiaoqian Jiang (10), Wenjin Jim Zheng (11), Degui Zhi (12), Cui Tao (13), Hua Xu (14); affiliations: (0-8, 10-14) School of Biomedical Informatics, The University of Texas Health Science Center at Houston; (9) Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida; http://link.springer.com/article/10.1186/s12911-019-0766-3; Clinical concept embedding; Distributional representation; Time sensitive concept embedding; Electronic medical records; Concept similarity; Predictive modeling |
title | Time-sensitive clinical concept embeddings learned from large electronic health records |
topic | Clinical concept embedding; Distributional representation; Time sensitive concept embedding; Electronic medical records; Concept similarity; Predictive modeling |
url | http://link.springer.com/article/10.1186/s12911-019-0766-3 |