Use of Metadata-Driven Approaches for Data Harmonization in the Medical Domain: Scoping Review

Background: Multisite clinical studies are increasingly using real-world data to gain real-world evidence. However, due to the heterogeneity of source data, it is difficult to analyze such data in a unified way across clinics. Therefore, the implementation of Extract-Transform-...

Bibliographic Details
Main Authors: Yuan Peng, Franziska Bathelt, Richard Gebler, Robert Gött, Andreas Heidenreich, Elisa Henke, Dennis Kadioglu, Stephan Lorenz, Abishaa Vengadeswaran, Martin Sedlmayr
Format: Article
Language: English
Published: JMIR Publications 2024-02-01
Series: JMIR Medical Informatics
Online Access: https://medinform.jmir.org/2024/1/e52967
author Yuan Peng
Franziska Bathelt
Richard Gebler
Robert Gött
Andreas Heidenreich
Elisa Henke
Dennis Kadioglu
Stephan Lorenz
Abishaa Vengadeswaran
Martin Sedlmayr
collection DOAJ
description Background: Multisite clinical studies are increasingly using real-world data to gain real-world evidence. However, due to the heterogeneity of source data, it is difficult to analyze such data in a unified way across clinics. Therefore, Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT) processes for harmonizing local health data are necessary to guarantee data quality for research. However, the development of such processes is time-consuming and unsustainable. A promising way to ease this is the generalization of ETL/ELT processes.
Objective: In this work, we investigate existing possibilities for the development of generic ETL/ELT processes. In particular, we focus on approaches that keep development complexity low by using descriptive metadata and structural metadata.
Methods: We conducted a literature review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We used 4 publication databases (ie, PubMed, IEEE Xplore, Web of Science, and BioMed Central) to search for relevant publications from 2012 to 2022. The PRISMA flow was then visualized using an R-based tool (Evidence Synthesis Hackathon). All relevant contents of the publications were extracted into a spreadsheet for further analysis and visualization.
Results: Following the PRISMA guidelines, we included 33 publications in this literature review. All included publications were categorized into 7 different focus groups (ie, medicine, data warehouse, big data, industry, geoinformatics, archaeology, and military). Based on the extracted data, ontology-based and rule-based approaches were the 2 most used approaches across the thematic categories. Different approaches and tools were chosen to achieve different purposes within the use cases.
Conclusions: Our literature review shows that using metadata-driven (MDD) approaches to develop an ETL/ELT process can serve different purposes in different thematic categories. The results show that it is promising to implement an ETL/ELT process by applying an MDD approach to automate the data transformation from Fast Healthcare Interoperability Resources to the Observational Medical Outcomes Partnership Common Data Model. However, determining an appropriate MDD approach and tool to implement such an ETL/ELT process remains a challenge. This is due to the lack of comprehensive insight into the characteristics of the MDD approaches presented in this study. Therefore, our next step is to evaluate the MDD approaches presented in this study and to determine the most appropriate MDD approaches and the way to integrate them into the ETL/ELT process. This could verify whether MDD approaches can generalize the ETL process for harmonizing medical data.
first_indexed 2024-03-08T01:16:27Z
format Article
id doaj.art-a9c70af8fac2436ca59d3bb1f167eaca
institution Directory Open Access Journal
issn 2291-9694
language English
last_indexed 2024-03-08T01:16:27Z
publishDate 2024-02-01
publisher JMIR Publications
record_format Article
series JMIR Medical Informatics
spelling doaj.art-a9c70af8fac2436ca59d3bb1f167eaca; 2024-02-14T14:45:32Z; eng; JMIR Publications; JMIR Medical Informatics; 2291-9694; 2024-02-01; 12; e52967; 10.2196/52967; Use of Metadata-Driven Approaches for Data Harmonization in the Medical Domain: Scoping Review
Yuan Peng https://orcid.org/0000-0002-6163-9532
Franziska Bathelt https://orcid.org/0000-0002-4139-5489
Richard Gebler https://orcid.org/0009-0004-1543-9769
Robert Gött https://orcid.org/0000-0001-9985-8311
Andreas Heidenreich https://orcid.org/0000-0002-2650-4464
Elisa Henke https://orcid.org/0000-0002-5002-2676
Dennis Kadioglu https://orcid.org/0000-0002-1561-4924
Stephan Lorenz https://orcid.org/0000-0002-9152-1826
Abishaa Vengadeswaran https://orcid.org/0009-0005-1077-1698
Martin Sedlmayr https://orcid.org/0000-0002-9888-8460
title Use of Metadata-Driven Approaches for Data Harmonization in the Medical Domain: Scoping Review
url https://medinform.jmir.org/2024/1/e52967