Empirical aspects of record linkage across multiple data sets using statistical linkage keys: the experience of the PIAC cohort study


Bibliographic Details
Main Authors: Gibson Diane, Anderson Phil, Karmel Rosemary, Peut Ann, Duckett Stephen, Wells Yvonne
Format: Article
Language: English
Published: BMC 2010-02-01
Series: BMC Health Services Research
Online Access:http://www.biomedcentral.com/1472-6963/10/41
ISSN: 1472-6963
Volume: 10 (article 41)
DOI: 10.1186/1472-6963-10-41
Collection: DOAJ (Directory of Open Access Journals)

Abstract

Background
In Australia, many community service program data collections developed over the last decade, including several for aged care programs, contain a statistical linkage key (SLK) to enable derivation of client-level data. In addition, a common SLK is now used in many collections to facilitate the statistical examination of cross-program use. In 2005, the Pathways in Aged Care (PIAC) cohort study was funded to create a linked aged care database using the common SLK to enable analysis of pathways through aged care services.

Linkage using an SLK is commonly deterministic. The purpose of this paper is to describe an extended deterministic record linkage strategy for situations where there is a general person identifier (e.g. an SLK) and several additional variables suitable for data linkage. This approach can allow for variation in client information recorded on different databases.

Methods
A stepwise deterministic record linkage algorithm was developed to link datasets using an SLK and several other variables. Three measures of likely match accuracy were used: the discriminating power of match key values, an estimated false match rate, and an estimated step-specific trade-off between true and false matches. The method was validated through examining link properties and clerical review of three samples of links.

Results
The deterministic algorithm resulted in up to an 11% increase in links compared with simple deterministic matching using an SLK. The links identified are of high quality: validation samples showed that less than 0.5% of links were false positives, and very few matches were made using non-unique match information (0.01%). There was a high degree of consistency in the characteristics of linked events.

Conclusions
The linkage strategy described in this paper has allowed the linking of multiple large aged care service datasets using a statistical linkage key while allowing for variation in its reporting. More widely, our deterministic algorithm, based on statistical properties of match keys, is a useful addition to the linker's toolkit. In particular, it may prove attractive when insufficient data are available for clerical review or follow-up, and the researcher has fewer options in relation to probabilistic linkage.
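The abstract describes stepwise deterministic linkage on an SLK plus additional variables only in general terms. The Python sketch below illustrates the general idea under several assumptions that do not come from the paper. First, the key layout follows the AIHW "SLK-581" convention as we understand it (letters 2, 3 and 5 of the family name, letters 2 and 3 of the given name, date of birth as DDMMYYYY, a one-character sex code, with '2' filling positions past the end of a short name and '9's standing in for a missing name); the abstract does not name the key format. Second, the field names `slk`, `dob`, `sex` and `postcode`, the step order, and the function names `slk581`, `match_key` and `stepwise_link` are hypothetical. Third, accepting a link only when a key value is unique among the unlinked records of both files is a simplified stand-in for the paper's three accuracy measures (discriminating power, estimated false match rate, step-specific trade-off), which are not reproduced here.

```python
from collections import Counter
from datetime import date
from typing import Optional

Record = dict[str, Optional[str]]  # hypothetical layout: field name -> value


def _name_part(name: Optional[str], positions: tuple[int, ...]) -> str:
    """Letters at 1-based positions of a name (assumed SLK-581-style rules)."""
    letters = "".join(c for c in (name or "").upper() if c.isalpha())
    if not letters:  # name missing altogether: fill every position with '9'
        return "9" * len(positions)
    # Positions past the end of a short name are filled with '2'.
    return "".join(letters[p - 1] if p <= len(letters) else "2" for p in positions)


def slk581(family: Optional[str], given: Optional[str], dob: date, sex_code: str) -> str:
    """14-character key: family-name letters 2,3,5 + given-name letters 2,3
    + DDMMYYYY date of birth + a sex code the caller has already coded."""
    return (_name_part(family, (2, 3, 5)) + _name_part(given, (2, 3))
            + dob.strftime("%d%m%Y") + sex_code)


def match_key(rec: Record, fields: tuple[str, ...]) -> Optional[tuple]:
    """Tuple of the chosen field values, or None if any value is missing."""
    values = tuple(rec.get(f) for f in fields)
    return None if any(v is None for v in values) else values


def stepwise_link(a: dict[str, Record], b: dict[str, Record],
                  steps: list[tuple[str, ...]]) -> list[tuple[str, str, int]]:
    """Try match keys in order; link a pair only when its key value occurs
    exactly once among the still-unlinked records of *both* files."""
    links: list[tuple[str, str, int]] = []
    todo_a, todo_b = dict(a), dict(b)
    for step, fields in enumerate(steps):
        keys_a = {rid: match_key(r, fields) for rid, r in todo_a.items()}
        keys_b = {rid: match_key(r, fields) for rid, r in todo_b.items()}
        n_a, n_b = Counter(keys_a.values()), Counter(keys_b.values())
        unique_b = {k: rid for rid, k in keys_b.items() if k is not None and n_b[k] == 1}
        new = [(rid, unique_b[k], step) for rid, k in keys_a.items()
               if k is not None and n_a[k] == 1 and k in unique_b]
        for rid_a, rid_b, _ in new:  # linked records leave the pool for later steps
            del todo_a[rid_a]
            del todo_b[rid_b]
        links.extend(new)
    return links


if __name__ == "__main__":
    # Toy data: b2 has a different postcode; b3's SLK was built from a variant family name.
    a = {"a1": {"slk": "MIHAR050119302", "dob": "05011930", "sex": "2", "postcode": "2600"},
         "a2": {"slk": "I22O2311219251", "dob": "31121925", "sex": "1", "postcode": "3000"},
         "a3": {"slk": "NELNN010119281", "dob": "01011928", "sex": "1", "postcode": "4000"}}
    b = {"b1": {"slk": "MIHAR050119302", "dob": "05011930", "sex": "2", "postcode": "2600"},
         "b2": {"slk": "I22O2311219251", "dob": "31121925", "sex": "1", "postcode": "3001"},
         "b3": {"slk": "EI2NN010119281", "dob": "01011928", "sex": "1", "postcode": "4000"}}
    steps = [("slk", "postcode"), ("slk",), ("dob", "sex", "postcode")]
    print(stepwise_link(a, b, steps))
    # [('a1', 'b1', 0), ('a2', 'b2', 1), ('a3', 'b3', 2)]
```

Ordering the steps from most to least stringent means each record is claimed by the strictest key that identifies it uniquely; looser keys (here, one that ignores the name-derived part of the SLK) only see the records left over, which is one way a deterministic strategy can tolerate variation in how client details are recorded. Refusing non-unique key values keeps ambiguous candidates unlinked rather than guessing between them.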