Biases arising from using linked administrative data for research: A conceptual framework from registration to analysis.
Objectives Administrative data are primarily collected for operational processes and these processes can lead to sources of bias that may not be adequately considered by researchers. We provide a framework to help understand how biases might arise from using linked administrative data, and hopefull...
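Main Authors: | Richard Shaw, Katie Harron, Julia Pescarini, Elzo Júnior, Andressa Siroky, Desmond Campbell, Ruth Dundas, Maria Yury Ichihara, Mauricio Barreto, Vittal Katikireddi |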
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University, 2022-08-01 |
Series: | International Journal of Population Data Science |
Subjects: | Epidemiological biases; Linkage error; Data linkage; Record linkage; Conceptual framework |
Online Access: | https://ijpds.org/article/view/1800 |
_version_ | 1797422953806168064 |
---|---|
author | Richard Shaw, Katie Harron, Julia Pescarini, Elzo Júnior, Andressa Siroky, Desmond Campbell, Ruth Dundas, Maria Yury Ichihara, Mauricio Barreto, Vittal Katikireddi |
author_sort | Richard Shaw |
collection | DOAJ |
description |
Objectives
Administrative data are primarily collected for operational processes and these processes can lead to sources of bias that may not be adequately considered by researchers. We provide a framework to help understand how biases might arise from using linked administrative data, and hopefully aid future study designs.
Approach
We developed the conceptual framework based on the team’s experiences with the 100 Million Brazilian Cohort (100MCohort) which contains records of more than 131 million people whose families applied for social assistance between 2001 and 2018, linked to other administrative data sources. We provide examples from the 100MCohort of where and how in the linkage process different forms of bias could arise. We make recommendations on how biases might be addressed using commonly available external data.
Results
The conceptual framework covers the whole data generating process from people and events occurring in the population through to deriving variables for analysis. The framework comprises three distinct stages: 1) Recording and registration of events in administrative systems such as Brazil’s Mortality Information System (SIM) and the Hospital Information System (SIH); 2) Linkage of different data sources, for example using exact matching via the Social Identification Number (NIS) in Brazil’s CadÚnico database or linkage algorithms; 3) Cleaning and coding data used both for analysis and linkage. The biases arising from linkage can be better understood by applying theory and making additional metadata available.
Conclusion
Maximising the potential of administrative data for research requires a better understanding of how biases arise. This is best achieved by considering the entire data-generating process and by improving communication among all those involved in the data collection and linkage processes.
|
first_indexed | 2024-03-09T07:39:33Z |
format | Article |
id | doaj.art-a264be30366e475e872f188421b33d36 |
institution | Directory Open Access Journal |
issn | 2399-4908 |
language | English |
last_indexed | 2024-03-09T07:39:33Z |
publishDate | 2022-08-01 |
publisher | Swansea University |
record_format | Article |
series | International Journal of Population Data Science |
spelling | doaj.art-a264be30366e475e872f188421b33d36; 2023-12-03T05:02:33Z; eng; Swansea University; International Journal of Population Data Science; ISSN 2399-4908; 2022-08-01; vol. 7, no. 3; DOI 10.23889/ijpds.v7i3.1800; Biases arising from using linked administrative data for research: A conceptual framework from registration to analysis. Authors and affiliations: Richard Shaw (University of Glasgow); Katie Harron (UCL Great Ormond Street Institute of Child Health); Julia Pescarini (London School of Hygiene and Tropical Medicine); Elzo Júnior (Centre for Data and Knowledge Integration for Health (CIDACS)); Andressa Siroky (Centre for Data and Knowledge Integration for Health (CIDACS)); Desmond Campbell (University of Glasgow); Ruth Dundas (University of Glasgow); Maria Yury Ichihara (Centre for Data and Knowledge Integration for Health (CIDACS)); Mauricio Barreto (Centre for Data and Knowledge Integration for Health (CIDACS)); Vittal Katikireddi (University of Glasgow). https://ijpds.org/article/view/1800; Keywords: Epidemiological biases; Linkage error; Data linkage; Record linkage; Conceptual framework |
title | Biases arising from using linked administrative data for research: A conceptual framework from registration to analysis. |
topic | Epidemiological biases; Linkage error; Data linkage; Record linkage; Conceptual framework |
url | https://ijpds.org/article/view/1800 |