Quality assessment of real-world data repositories across the data life cycle: a literature review

Bibliographic Details
Main Authors: Liaw, S-T, Guo, JGN, Ansari, S, Jonnagaddala, J, Godinho, MA, Borelli, AJ, de Lusignan, S, Capurro, D, Liyanage, H, Bhattal, N, Bennett, V, Chan, J, Kahn, MG
Format: Journal article
Language: English
Published: Oxford University Press 2021
author Liaw, S-T
Guo, JGN
Ansari, S
Jonnagaddala, J
Godinho, MA
Borelli, AJ
de Lusignan, S
Capurro, D
Liyanage, H
Bhattal, N
Bennett, V
Chan, J
Kahn, MG
author_sort Liaw, S-T
collection OXFORD
description Objective: Data quality (DQ) must be consistently defined in context. The attributes, metadata, and context of longitudinal real-world data (RWD) have not been formalized for quality improvement across the data production and curation life cycle. We sought to complete a literature review on DQ assessment frameworks, indicators and tools for research, public health, service, and quality improvement across the data life cycle.
Materials and Methods: The review followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Databases from health, physical and social sciences were used: Cinahl, Embase, Scopus, ProQuest, Emcare, PsycINFO, Compendex, and Inspec. Embase was used instead of PubMed (an interface to search MEDLINE) because it includes all MeSH (Medical Subject Headings) terms used and journals in MEDLINE as well as additional unique journals and conference abstracts. A combined data life cycle and quality framework guided the search of published and gray literature for DQ frameworks, indicators, and tools. At least 2 authors independently identified articles for inclusion and extracted and categorized DQ concepts and constructs. All authors discussed findings iteratively until consensus was reached.
Results: The 120 included articles yielded concepts related to contextual (data source, custodian, and user) and technical (interoperability) factors across the data life cycle. Contextual DQ subcategories included relevance, usability, accessibility, timeliness, and trust. Well-tested computable DQ indicators and assessment tools were also found.
Conclusions: A DQ assessment framework that covers intrinsic, technical, and contextual categories across the data life cycle enables assessment and management of RWD repositories to ensure fitness for purpose. Balancing security, privacy, and FAIR principles requires trust and reciprocity, transparent governance, and organizational cultures that value good documentation.
first_indexed 2024-03-07T07:59:31Z
format Journal article
id oxford-uuid:c7a62cf1-7945-42be-8399-4f9b3324ce36
institution University of Oxford
language English
last_indexed 2024-03-07T07:59:31Z
publishDate 2021
publisher Oxford University Press
record_format dspace
spelling oxford-uuid:c7a62cf1-7945-42be-8399-4f9b3324ce36
2023-09-12T08:16:18Z
Quality assessment of real-world data repositories across the data life cycle: a literature review
Journal article
http://purl.org/coar/resource_type/c_dcae04bc
uuid:c7a62cf1-7945-42be-8399-4f9b3324ce36
English
Symplectic Elements
Oxford University Press
2021
Liaw, S-T
Guo, JGN
Ansari, S
Jonnagaddala, J
Godinho, MA
Borelli, AJ
de Lusignan, S
Capurro, D
Liyanage, H
Bhattal, N
Bennett, V
Chan, J
Kahn, MG
title Quality assessment of real-world data repositories across the data life cycle: a literature review