Biases arising from using linked administrative data for research: A conceptual framework from registration to analysis.

Objectives Administrative data are primarily collected for operational processes and these processes can lead to sources of bias that may not be adequately considered by researchers. We provide a framework to help understand how biases might arise from using linked administrative data, and hopefull...

Bibliographic Details
Main Authors: Richard Shaw, Katie Harron, Julia Pescarini, Elzo Júnior, Andressa Siroky, Desmond Campbell, Ruth Dundas, Maria Yury Ichihara, Mauricio Barreto, Vittal Katikireddi
Format: Article
Language: English
Published: Swansea University 2022-08-01
Series: International Journal of Population Data Science
Subjects: Epidemiological biases; Linkage error; Data linkage; Record linkage; Conceptual framework
Online Access: https://ijpds.org/article/view/1800
author Richard Shaw
Katie Harron
Julia Pescarini
Elzo Júnior
Andressa Siroky
Desmond Campbell
Ruth Dundas
Maria Yury Ichihara
Mauricio Barreto
Vittal Katikireddi
author_sort Richard Shaw
collection DOAJ
description Objectives: Administrative data are primarily collected for operational processes, and these processes can lead to sources of bias that may not be adequately considered by researchers. We provide a framework to help understand how biases might arise from using linked administrative data, and hopefully aid future study designs. Approach: We developed the conceptual framework based on the team’s experiences with the 100 Million Brazilian Cohort (100MCohort), which contains records of more than 131 million people whose families applied for social assistance between 2001 and 2018, linked to other administrative data sources. We provide examples from the 100MCohort of where and how in the linkage process different forms of bias could arise. We make recommendations on how biases might be addressed using commonly available external data. Results: The conceptual framework covers the whole data generating process, from people and events occurring in the population through to deriving variables for analysis. The framework comprises three distinct stages: 1) Recording and registration of events in administrative systems such as Brazil’s Mortality Information System (SIM) and the Hospital Information System (SIH); 2) Linkage of different data sources, for example using exact matching via the Social Identification Number (NIS) in Brazil’s CadÚnico database or linkage algorithms; 3) Cleaning and coding data used both for analysis and linkage. The biases arising from linkage can be better understood by applying theory and making additional metadata available. Conclusion: Maximising the potential of administrative data for research requires a better understanding of how biases arise. This is best achieved by considering the entire data generating process, and better communication among all those involved in the data collection and linkage processes.
first_indexed 2024-03-09T07:39:33Z
format Article
id doaj.art-a264be30366e475e872f188421b33d36
institution Directory Open Access Journal
issn 2399-4908
language English
last_indexed 2024-03-09T07:39:33Z
publishDate 2022-08-01
publisher Swansea University
record_format Article
series International Journal of Population Data Science
spelling doaj.art-a264be30366e475e872f188421b33d36
2023-12-03T05:02:33Z
eng
Swansea University
International Journal of Population Data Science
2399-4908
2022-08-01
Volume 7, Issue 3
10.23889/ijpds.v7i3.1800
Biases arising from using linked administrative data for research: A conceptual framework from registration to analysis.
Richard Shaw (University of Glasgow)
Katie Harron (UCL Great Ormond Street Institute of Child Health)
Julia Pescarini (London School of Hygiene and Tropical Medicine)
Elzo Júnior (Centre for Data and Knowledge Integration for Health (CIDACS))
Andressa Siroky (Centre for Data and Knowledge Integration for Health (CIDACS))
Desmond Campbell (University of Glasgow)
Ruth Dundas (University of Glasgow)
Maria Yury Ichihara (Centre for Data and Knowledge Integration for Health (CIDACS))
Mauricio Barreto (Centre for Data and Knowledge Integration for Health (CIDACS))
Vittal Katikireddi (University of Glasgow)
Objectives Administrative data are primarily collected for operational processes and these processes can lead to sources of bias that may not be adequately considered by researchers. We provide a framework to help understand how biases might arise from using linked administrative data, and hopefully aid future study designs. Approach We developed the conceptual framework based on the team’s experiences with the 100 Million Brazilian Cohort (100MCohort) which contains records of more than 131 million people whose families applied for social assistance between 2001 and 2018, linked to other administrative data sources. We provide examples from the 100MCohort of where and how in the linkage process different forms of bias could arise. We make recommendations on how biases might be addressed using commonly available external data. Results The conceptual framework covers the whole data generating process from people and events occurring in the population through to deriving variables for analysis. The framework comprises three distinct stages: 1) Recording and registration of events in administrative systems such as Brazil’s Mortality Information System (SIM) and the Hospital Information System (SIH); 2) Linkage of different data sources, for example using exact matching via the Social Identification Number (NIS) in Brazil’s CadÚnico database or linkage algorithms; 3) Cleaning and coding data used both for analysis and linkage. The biases arising from linkage can be better understood by applying theory and making additional metadata available. Conclusion Maximising the potential of administrative data for research requires a better understanding of how biases arise. This is best achieved by considering the entire data generating process, and better communication among all those involved in the data collection and linkage processes.
https://ijpds.org/article/view/1800
Epidemiological biases
Linkage error
Data linkage
Record linkage
Conceptual framework
title Biases arising from using linked administrative data for research: A conceptual framework from registration to analysis.
topic Epidemiological biases
Linkage error
Data linkage
Record linkage
Conceptual framework
url https://ijpds.org/article/view/1800