Mapping of Alzheimer’s disease related data elements and the NIH Common Data Elements

Abstract Background Alzheimer’s Disease (AD) is a devastating disease that destroys memory and other cognitive functions. There has been an increasing research effort to prevent and treat AD. In the US, two major data sharing resources for AD research are the National Alzheimer’s Coordinating Center...

Bibliographic Details
Main Authors: Xubing Hao, Rashmie Abeysinghe, Fengbo Zheng, Paul E. Schulz, The Alzheimer’s Disease Neuroimaging Initiative, Licong Cui
Format: Article
Language: English
Published: BMC 2024-04-01
Series: BMC Medical Informatics and Decision Making
Subjects: Alzheimer’s disease; Data element mapping; Semantic interoperability
Online Access: https://doi.org/10.1186/s12911-024-02500-8
author Xubing Hao
Rashmie Abeysinghe
Fengbo Zheng
Paul E. Schulz
The Alzheimer’s Disease Neuroimaging Initiative
Licong Cui
author_sort Xubing Hao
collection DOAJ
description Abstract. Background: Alzheimer’s Disease (AD) is a devastating disease that destroys memory and other cognitive functions. There has been an increasing research effort to prevent and treat AD. In the US, two major data sharing resources for AD research are the National Alzheimer’s Coordinating Center (NACC) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Additionally, the National Institutes of Health (NIH) Common Data Elements (CDE) Repository has been developed to facilitate data sharing and improve the interoperability among data sets in various disease research areas. Method: To better understand how AD-related data elements in these resources are interoperable with each other, we leverage different representation models to map data elements from different resources: NACC to ADNI, NACC to NIH CDE, and ADNI to NIH CDE. We explore bag-of-words based and word embeddings based models (Word2Vec and BioWordVec) to perform the data element mappings in these resources. Results: The data dictionaries downloaded on November 23, 2021 contain 1,195 data elements in NACC, 13,918 in ADNI, and 27,213 in the NIH CDE Repository. Data element preprocessing reduced the numbers of NACC and ADNI data elements for mapping to 1,099 and 7,584, respectively. Manual evaluation of the mapping results showed that the bag-of-words based approach achieved the best precision, while the BioWordVec based approach attained the best recall. In total, the three approaches mapped 175 out of 1,099 (15.92%) NACC data elements to ADNI; 107 out of 1,099 (9.74%) NACC data elements to NIH CDE; and 171 out of 7,584 (2.25%) ADNI data elements to NIH CDE. Conclusions: The bag-of-words based and word embeddings based approaches showed promise in mapping AD-related data elements between different resources. Although the mapping approaches need further improvement, our results indicate that there is a critical need to standardize CDEs across these valuable AD research resources in order to maximize the discoveries regarding AD pathophysiology, diagnosis, and treatment that can be gleaned from them.
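The bag-of-words mapping idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which fields of a data element are compared, how text is tokenized, or what similarity cutoff is used, so the function names (`bag_of_words`, `cosine`, `map_elements`, `coverage`), the lowercase alphanumeric tokenizer, and the 0.8 threshold below are all assumptions made for illustration. The `coverage` helper simply reproduces the percentage arithmetic reported in the Results.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphanumeric tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def map_elements(source, target, threshold=0.8):
    """Map each source label to its most similar target label.

    A mapping is kept only if the best score reaches `threshold`
    (a hypothetical cutoff, not taken from the paper).
    """
    target_bags = [(label, bag_of_words(label)) for label in target]
    mappings = {}
    for s in source:
        s_bag = bag_of_words(s)
        best, best_score = None, 0.0
        for label, t_bag in target_bags:
            score = cosine(s_bag, t_bag)
            if score > best_score:
                best, best_score = label, score
        if best is not None and best_score >= threshold:
            mappings[s] = (best, best_score)
    return mappings


def coverage(mapped, total):
    """Percentage of data elements mapped, rounded to two decimals."""
    return round(100.0 * mapped / total, 2)
```

With made-up labels, `map_elements(["Years of education"], ["Age of subject", "Education years"])` pairs the source label with "Education years", since the two share both content tokens. A word-embedding variant would replace the token counts with averaged word vectors while keeping the same cosine comparison.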
first_indexed 2024-04-24T07:15:01Z
format Article
id doaj.art-3e93843e387e4f0db28115495e60a313
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-04-24T07:15:01Z
publishDate 2024-04-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-3e93843e387e4f0db28115495e60a313
2024-04-21T11:21:06Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2024-04-01
24, S3, 1-12
10.1186/s12911-024-02500-8
Mapping of Alzheimer’s disease related data elements and the NIH Common Data Elements
Xubing Hao (McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston)
Rashmie Abeysinghe (Department of Neurology, McGovern School of Medicine, University of Texas Health Science Center at Houston)
Fengbo Zheng (McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston)
Paul E. Schulz (Department of Neurology, McGovern School of Medicine, University of Texas Health Science Center at Houston)
The Alzheimer’s Disease Neuroimaging Initiative
Licong Cui (McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston)
title Mapping of Alzheimer’s disease related data elements and the NIH Common Data Elements
topic Alzheimer’s disease
Data element mapping
Semantic interoperability
url https://doi.org/10.1186/s12911-024-02500-8