Regression analysis with linked data: problems and possible solutions

In this paper we have described and extended some recent proposals on a general Bayesian methodology for performing record linkage and making inference using the resulting matched units. In particular, we have framed the record linkage process into a formal statistical model which comprises both t...

Bibliographic Details
Main Authors: Andrea Tancredi, Brunero Liseo
Format: Article
Language: English
Published: University of Bologna 2015-03-01
Series: Statistica
Subjects: Bayesian regression, Hit-miss model, Metropolis-Hastings algorithm, Record linkage
Online Access: http://rivista-statistica.unibo.it/article/view/5821
collection DOAJ
description In this paper we describe and extend some recent proposals for a general Bayesian methodology for performing record linkage and making inference using the resulting matched units. In particular, we frame the record linkage process within a formal statistical model that comprises both the matching variables and the other variables included at the inferential stage. In this way, the researcher can account for the uncertainty of the matching process in inferential procedures based on probabilistically linked data and, at the same time, can propagate information back and forth between the working statistical model and the record linkage stage. We argue that this feedback effect is both essential for eliminating the potential biases that would otherwise characterize inference from the linked data and able to improve record linkage performance. The practical implementation of the procedure relies on standard Bayesian computational techniques, such as Markov chain Monte Carlo algorithms. Although the methodology is quite general, we restrict our analysis to the popular and important case of multiple linear regression for expository convenience.
institution Directory of Open Access Journals
issn 0390-590X
1973-2201
citation Andrea Tancredi and Brunero Liseo (both Università di Roma “La Sapienza”). Regression analysis with linked data: problems and possible solutions. Statistica 75(1), 19-35. University of Bologna, 2015-03-01. ISSN 0390-590X, 1973-2201. DOI: 10.6092/issn.1973-2201/5821.
topic Bayesian regression
Hit-miss model
Metropolis-Hastings algorithm
Record linkage