Public Cloud: The Future of Record Linkage?

Bibliographic Details
Main Authors: Adrian Brown, Sean Randall, Anna Ferrante, James Boyd
Format: Article
Language: English
Published: Swansea University 2018-09-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/885
_version_ 1797430775607459840
author Adrian Brown
Sean Randall
Anna Ferrante
James Boyd
collection DOAJ
description Introduction: Businesses worldwide are increasingly adopting the storage, compute and analytical services provided by cloud computing. Yet few operational linkage units are keeping pace with this technological change; most use legacy systems that are approaching their limits as the size and range of datasets required for linkage rapidly increase.
Objectives and Approach: To meet the demands of linkage in the near future, new linkage solutions should consider the compute, storage and analytics services provided by public cloud infrastructure. We examined Platform as a Service (PaaS) offerings for use in developing a cost-effective cloud model for scalable, privacy-preserving record linkage (PPRL). PPRL techniques were adapted to maximise linkage quality and to automate as much of the process as possible. Finally, a prototype was created to demonstrate the capabilities and potential of the model.
Results: We present our cloud model for PPRL, a platform for record linkage that scales resources rapidly to meet demand, and report how our prototype performed on massive datasets.
Conclusion/Implications: Record linkage on relatively inexpensive cloud infrastructure represents a significant step towards providing an efficient and scalable record linkage service to researchers and government. Larger datasets, including national or cross-jurisdictional datasets, can be linked efficiently with little investment in private infrastructure and with improved turnaround times for researchers.
first_indexed 2024-03-09T09:33:24Z
format Article
id doaj.art-ea524b293ca3428f9fc8810aff6401d8
institution Directory Open Access Journal
issn 2399-4908
language English
last_indexed 2024-03-09T09:33:24Z
publishDate 2018-09-01
publisher Swansea University
record_format Article
series International Journal of Population Data Science
spelling doaj.art-ea524b293ca3428f9fc8810aff6401d8; 2023-12-02T03:10:21Z; eng; Swansea University; International Journal of Population Data Science; ISSN 2399-4908; 2018-09-01; vol. 3, no. 4; DOI 10.23889/ijpds.v3i4.885; article 885; Public Cloud: The Future of Record Linkage?; Adrian Brown, Sean Randall, Anna Ferrante, James Boyd (all Curtin University); https://ijpds.org/article/view/885
title Public Cloud: The Future of Record Linkage?
url https://ijpds.org/article/view/885