Secure Record Linkage of Large Health Data Sets: Evaluation of a Hybrid Cloud Model

Background: The linking of administrative data across agencies provides the capability to investigate many health and social issues with the potential to deliver significant public benefit. Despite its advantages, the use of cloud computing resources for linkage purposes is scarce, with the storage of...

Bibliographic Details
Main Authors: Brown, Adrian Paul; Randall, Sean M
Format: Article
Language: English
Published: JMIR Publications 2020-09-01
Series: JMIR Medical Informatics
Online Access: http://medinform.jmir.org/2020/9/e18920/
author Brown, Adrian Paul
Randall, Sean M
collection DOAJ
description Background: The linking of administrative data across agencies provides the capability to investigate many health and social issues with the potential to deliver significant public benefit. Despite its advantages, the use of cloud computing resources for linkage purposes is scarce, with the storage of identifiable information on cloud infrastructure assessed as high risk by data custodians. Objective: This study aims to present a model for record linkage that utilizes cloud computing capabilities while assuring custodians that identifiable data sets remain secure and local. Methods: A new hybrid cloud model was developed, including privacy-preserving record linkage techniques and container-based batch processing. An evaluation of this model was conducted with a prototype implementation using large synthetic data sets representative of administrative health data. Results: The cloud model keeps identifiers on premises and uses privacy-preserved identifiers to run all linkage computations on cloud infrastructure. Our prototype used a managed container cluster in Amazon Web Services to distribute the computation using existing linkage software. Although the cost of computation was relatively low, the use of existing software resulted in a processing overhead of 35.7% (149/417 min execution time). Conclusions: The results of our experimental evaluation show the operational feasibility of such a model and the exciting opportunities for advancing the analysis of linkage outputs.
first_indexed 2024-12-18T23:41:30Z
format Article
id doaj.art-5806cbfe89384d4ea4af8b2a353d9a74
institution Directory Open Access Journal
issn 2291-9694
language English
last_indexed 2024-12-18T23:41:30Z
publishDate 2020-09-01
publisher JMIR Publications
record_format Article
series JMIR Medical Informatics
spelling doaj.art-5806cbfe89384d4ea4af8b2a353d9a74 | 2022-12-21T20:47:21Z | eng | JMIR Publications | JMIR Medical Informatics | 2291-9694 | 2020-09-01 | 8 | 9 | e18920 | 10.2196/18920 | Secure Record Linkage of Large Health Data Sets: Evaluation of a Hybrid Cloud Model | Brown, Adrian Paul | Randall, Sean M | http://medinform.jmir.org/2020/9/e18920/
title Secure Record Linkage of Large Health Data Sets: Evaluation of a Hybrid Cloud Model
url http://medinform.jmir.org/2020/9/e18920/