Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner
Abstract Background The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard that provides robust data point definitions. Another standard is the Biomedical Research Integrated Domain Grou...
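Main Authors: | Robinette Renner, Shengyu Li, Yulong Huang, Ada Chaeli van der Zijp-Tan, Shaobo Tan, Dongqi Li, Mohan Vamsi Kasukurthi, Ryan Benton, Glen M. Borchert, Jingshan Huang, Guoqian Jiang |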
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-12-01 |
Series: | BMC Medical Informatics and Decision Making |
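Subjects: | Common data element, Artificial neural network, Schema mapping, Biomedical research integrated domain group (BRIDG) model |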
Online Access: | https://doi.org/10.1186/s12911-019-0979-5 |
_version_ | 1819295554933358592 |
---|---|
author | Robinette Renner; Shengyu Li; Yulong Huang; Ada Chaeli van der Zijp-Tan; Shaobo Tan; Dongqi Li; Mohan Vamsi Kasukurthi; Ryan Benton; Glen M. Borchert; Jingshan Huang; Guoqian Jiang |
author_sort | Robinette Renner |
collection | DOAJ |
description | Abstract. Background: The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard that provides robust data point definitions. Another standard is the Biomedical Research Integrated Domain Group (BRIDG) model, which is a domain analysis model that provides a contextual framework for biomedical and clinical research data. Mapping the CDEs to the BRIDG model is important; in particular, it can facilitate mapping the CDEs to other standards. Unfortunately, manual mapping, which is the current method for creating the CDE mappings, is error-prone and time-consuming; this creates a significant barrier for researchers who utilize CDEs. Methods: In this work, we developed a semi-automated algorithm to map CDEs to likely BRIDG classes. First, we extended and improved our previously developed artificial neural network (ANN) alignment algorithm. We then used a collection of 1284 CDEs with robust mappings to BRIDG classes as the gold standard to train the network and obtain appropriate weights for six CDE attributes. Afterward, we calculated the similarity between a CDE and each BRIDG class. Finally, the algorithm produces a list of candidate BRIDG classes to which the CDE of interest may belong. Results: For CDEs semantically similar to those used in training, a match rate of over 90% was achieved. For those partially similar, a match rate of 80% was obtained, and for those with drastically different semantics, a match rate of up to 70% was achieved. Discussion: Our semi-automated mapping process reduces the burden on domain experts. The weights of all six attributes are significant. Experimental results indicate that the availability of training data is more important than the semantic similarity of the testing data to the training data. We address the overfitting problem by selecting CDEs randomly and adjusting the ratio of training and verification samples. Conclusions: Experimental results on real-world use cases have proven the effectiveness and efficiency of our proposed methodology in mapping CDEs to BRIDG classes, both for CDEs seen before and for new, unseen CDEs. In addition, it reduces the mapping burden and improves the mapping quality. |
first_indexed | 2024-12-24T04:44:04Z |
format | Article |
id | doaj.art-b0cae20075254985b749fafbf8d5997a |
institution | Directory Open Access Journal |
issn | 1472-6947 |
language | English |
last_indexed | 2024-12-24T04:44:04Z |
publishDate | 2019-12-01 |
publisher | BMC |
record_format | Article |
series | BMC Medical Informatics and Decision Making |
spelling | doaj.art-b0cae20075254985b749fafbf8d5997a; 2022-12-21T17:14:44Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2019-12-01; 19; S7; 1-13; 10.1186/s12911-019-0979-5; Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner; Robinette Renner (University of Minnesota); Shengyu Li (School of Computing, University of South Alabama); Yulong Huang (College of Allied Health Professions, University of South Alabama); Ada Chaeli van der Zijp-Tan (College of Allied Health Professions, University of South Alabama); Shaobo Tan (School of Computing, University of South Alabama); Dongqi Li (School of Computing, University of South Alabama); Mohan Vamsi Kasukurthi (School of Computing, University of South Alabama); Ryan Benton (School of Computing, University of South Alabama); Glen M. Borchert (College of Medicine, University of South Alabama); Jingshan Huang (School of Computing, University of South Alabama); Guoqian Jiang (Mayo Clinic); https://doi.org/10.1186/s12911-019-0979-5; Common data element; Artificial neural network; Schema mapping; Biomedical research integrated domain group (BRIDG) model |
title | Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner |
topic | Common data element Artificial neural network Schema mapping Biomedical research integrated domain group (BRIDG) model |
url | https://doi.org/10.1186/s12911-019-0979-5 |
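
The Methods summary in the description field (weights for six CDE attributes, a similarity score between a CDE and each BRIDG class, and a ranked list of candidate classes) can be illustrated with a minimal Python sketch. This is not the authors' ANN implementation. The attribute names `attr_1` to `attr_6`, the token-overlap (Jaccard) similarity, and the function names are all assumptions made for illustration, because the record names neither the six attributes nor the similarity measure. In the paper the weights are learned from the 1284 gold-standard mappings; here they are simply passed in.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute names: the abstract says six attributes are
# weighted but does not name them.
ATTRIBUTES = ["attr_1", "attr_2", "attr_3", "attr_4", "attr_5", "attr_6"]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a stand-in for the real measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def similarity(cde: Dict[str, str],
               bridg_class: Dict[str, str],
               weights: Dict[str, float]) -> float:
    """Weighted sum of per-attribute similarities between a CDE and a class."""
    return sum(
        weights[name] * jaccard(cde.get(name, ""), bridg_class.get(name, ""))
        for name in ATTRIBUTES
    )


def rank_candidates(cde: Dict[str, str],
                    classes: Dict[str, Dict[str, str]],
                    weights: Dict[str, float],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (class name, score) pairs, highest score first."""
    scored = [
        (class_name, similarity(cde, attrs, weights))
        for class_name, attrs in classes.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A caller would supply one weight per attribute (for example, uniform weights of 1/6 as a placeholder for learned values) and pass the returned candidate list to a domain expert for review, which is what makes the process semi-automated rather than fully automated.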