Discovering disease–disease associations using electronic health records in The Guideline Advantage (TGA) dataset

Abstract Certain diseases have strong comorbidity and co-occurrence with others. Understanding disease–disease associations can potentially increase awareness among healthcare providers of co-occurring conditions and facilitate earlier diagnosis, prevention and treatment of patients. In this study,...

Bibliographic Details
Main Authors: Aixia Guo, Yosef M. Khan, James R. Langabeer, Randi E. Foraker
Format: Article
Language: English
Published: Nature Portfolio 2021-10-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-021-00345-z
author Aixia Guo
Yosef M. Khan
James R. Langabeer
Randi E. Foraker
collection DOAJ
description Abstract Certain diseases have strong comorbidity and co-occurrence with others. Understanding disease–disease associations can potentially increase awareness among healthcare providers of co-occurring conditions and facilitate earlier diagnosis, prevention and treatment of patients. In this study, we used the large longitudinal electronic health record dataset of The Guideline Advantage (TGA), drawn from 70 outpatient clinics across the United States, to investigate potential disease–disease associations. Specifically, the 50 most prevalent disease diagnoses were manually identified from 165,732 unique patients. To investigate the co-occurrence or dependency associations among the 50 diseases, the categorical disease terms were first mapped into numerical vectors based on disease co-occurrence frequency in individual patients using the Word2Vec approach. Novel disease association clusters were then identified using correlation and clustering analyses in the numerical space. Moreover, the distribution of the time delay (Δt) between pairs of strongly associated diseases (correlation coefficients ≥ 0.5) was calculated to show the dependency among the diseases. The results can indicate the risk of disease comorbidity and complications, and facilitate disease prevention and optimal treatment decision-making.
first_indexed 2024-12-17T12:01:30Z
format Article
id doaj.art-01d193938ce24f79bf4641b741d2fe72
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-12-17T12:01:30Z
publishDate 2021-10-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-01d193938ce24f79bf4641b741d2fe72 2022-12-21T21:49:49Z eng Nature Portfolio Scientific Reports 2045-2322 2021-10-01 111110 10.1038/s41598-021-00345-z
Discovering disease–disease associations using electronic health records in The Guideline Advantage (TGA) dataset
Aixia Guo (0), Yosef M. Khan (1), James R. Langabeer (2), Randi E. Foraker (3)
(0) Institute for Informatics (I2), Washington University School of Medicine
(1) Health Informatics and Analytics, Centers for Health Metrics and Evaluation, American Heart Association
(2) School of Biomedical Informatics, Health Science Center at Houston, The University of Texas
(3) Institute for Informatics (I2), Washington University School of Medicine
title Discovering disease–disease associations using electronic health records in The Guideline Advantage (TGA) dataset
url https://doi.org/10.1038/s41598-021-00345-z