Survival analysis under imperfect record linkage using historic census data
Abstract. Background: Advancements in linking publicly available census records with vital and administrative records have enabled novel investigations in epidemiology and social history. However, in the absence of unique identifiers, the linkage of the records may be uncertain or only be successful f...
Main Authors: | Arielle K. Marks-Anglin, Frances K. Barg, Michelle Ross, Douglas J. Wiebe, Wei-Ting Hwang |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2024-03-01 |
Series: | BMC Medical Research Methodology |
Subjects: | Census data; Censoring; Missing data; Record linkage; Survival analysis |
Online Access: | https://doi.org/10.1186/s12874-024-02194-6 |
_version_ | 1827315918924939264 |
---|---|
author | Arielle K. Marks-Anglin; Frances K. Barg; Michelle Ross; Douglas J. Wiebe; Wei-Ting Hwang |
author_facet | Arielle K. Marks-Anglin; Frances K. Barg; Michelle Ross; Douglas J. Wiebe; Wei-Ting Hwang |
author_sort | Arielle K. Marks-Anglin |
collection | DOAJ |
description | Abstract. Background: Advancements in linking publicly available census records with vital and administrative records have enabled novel investigations in epidemiology and social history. However, in the absence of unique identifiers, the linkage of the records may be uncertain or may succeed for only a subset of the census cohort, resulting in missing data. For survival analysis, differential ascertainment of event times can impact inference on risk associations and median survival. Methods: We modify existing approaches that are commonly used to handle missing survival times to accommodate this imperfect linkage situation, including complete case analysis, censoring, weighting, and several multiple imputation methods. We then conduct simulation studies to compare the performance of the proposed approaches in estimating the association of a risk factor or exposure with survival, in terms of the hazard ratio (HR) and median survival times, in the presence of missing survival times. The effects of different missing data mechanisms and exposure-survival associations on their performance are also explored. The approaches are applied to a historic cohort of residents of Ambler, PA, established using the 1930 US census, from which only 2,440 out of 4,514 individuals (54%) had death records retrievable from publicly available data sources and death certificates. Using this cohort, we examine the effects of occupational and paraoccupational asbestos exposure on survival and disparities in mortality by race and gender. Results: We show that imputation based on conditional survival results in less bias and greater efficiency relative to a complete case analysis when estimating log-hazard ratios and median survival times. When the approaches are applied to the Ambler cohort, we find a significant association between occupational exposure and mortality, particularly among black individuals and males, but not between paraoccupational exposure and mortality. Discussion: This investigation illustrates the strengths and weaknesses of different imputation methods for missing survival times due to imperfect linkage of administrative or registry data. The performance of the methods may depend on the missingness process as well as on the parameter being estimated and the models of interest, and such factors should be considered when choosing methods to address missing event times. |
first_indexed | 2024-04-24T23:06:07Z |
format | Article |
id | doaj.art-954b1a4a51064e18b0007f8645923faf |
institution | Directory Open Access Journal |
issn | 1471-2288 |
language | English |
last_indexed | 2024-04-24T23:06:07Z |
publishDate | 2024-03-01 |
publisher | BMC |
record_format | Article |
series | BMC Medical Research Methodology |
spelling | doaj.art-954b1a4a51064e18b0007f8645923faf; 2024-03-17T12:29:47Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2024-03-01; vol. 24, no. 1, pp. 1-16; 10.1186/s12874-024-02194-6; Survival analysis under imperfect record linkage using historic census data; Arielle K. Marks-Anglin (0), Frances K. Barg (1), Michelle Ross (2), Douglas J. Wiebe (3), Wei-Ting Hwang (4); affiliation (0-4): Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania; https://doi.org/10.1186/s12874-024-02194-6; Census data; Censoring; Missing data; Record linkage; Survival analysis |
title | Survival analysis under imperfect record linkage using historic census data |
topic | Census data; Censoring; Missing data; Record linkage; Survival analysis |
url | https://doi.org/10.1186/s12874-024-02194-6 |
work_keys_str_mv | AT ariellekmarksanglin survivalanalysisunderimperfectrecordlinkageusinghistoriccensusdata AT franceskbarg survivalanalysisunderimperfectrecordlinkageusinghistoriccensusdata AT michelleross survivalanalysisunderimperfectrecordlinkageusinghistoriccensusdata AT douglasjwiebe survivalanalysisunderimperfectrecordlinkageusinghistoriccensusdata AT weitinghwang survivalanalysisunderimperfectrecordlinkageusinghistoriccensusdata |
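
The abstract's concern that differential ascertainment of event times can distort risk associations under a complete case analysis can be illustrated with a small simulation. The sketch below is not the authors' code or their simulation design. It assumes exponential survival times, a true hazard ratio of 2, and hypothetical linkage probabilities that depend on exposure and on the survival time itself. All function names and numeric settings are illustrative assumptions.

```python
import math
import random


def simulate_cohort(n, hazard_ratio, link_prob, seed=0):
    """Return (exposed, time, linked) tuples for a hypothetical census cohort.

    Assumption: exponential survival with baseline hazard 1 and hazard
    `hazard_ratio` among the exposed; every person eventually dies, but the
    death record is found ("linked") only with probability link_prob(...).
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        exposed = rng.random() < 0.5
        rate = hazard_ratio if exposed else 1.0
        time = rng.expovariate(rate)
        linked = rng.random() < link_prob(exposed, time)
        cohort.append((exposed, time, linked))
    return cohort


def log_hazard_ratio(records):
    """Exponential-model estimate: hazard = deaths / person-time per group."""
    deaths = {True: 0, False: 0}
    person_time = {True: 0.0, False: 0.0}
    for exposed, time, _linked in records:
        deaths[exposed] += 1
        person_time[exposed] += time
    hazard_exposed = deaths[True] / person_time[True]
    hazard_unexposed = deaths[False] / person_time[False]
    return math.log(hazard_exposed / hazard_unexposed)


def differential_link(exposed, time):
    """Hypothetical linkage rule: early deaths among the exposed are rarely linked."""
    if not exposed:
        return 0.7
    return 0.3 if time < 0.5 else 0.9


cohort = simulate_cohort(20000, 2.0, differential_link, seed=1)

# Estimate using every simulated death (unavailable in practice).
full = log_hazard_ratio(cohort)

# Complete case analysis: keep only individuals whose death record was linked.
complete_case = log_hazard_ratio([r for r in cohort if r[2]])

print(f"true log HR:          {math.log(2.0):.3f}")
print(f"full-data log HR:     {full:.3f}")
print(f"complete-case log HR: {complete_case:.3f}")
```

Under these assumed settings the complete-case estimate is pulled toward zero, because short survival times among the exposed are under-represented in the linked subset. If linkage depended on exposure alone and not on the survival time, the same complete-case estimator would remain approximately unbiased in this simple model.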