Evaluation of advanced techniques for multi-party privacy-preserving record linkage on real-world health databases
ABSTRACT Objective The linking of multiple (three or more) health databases is challenging because of the increasing sizes of databases, the number of parties among which they are to be linked, and privacy concerns related to the use of personal data such as names, addresses, or dates of birth. T...
Main Authors: | Thilina Ranbaduge, Dinusha Vatsalan, Sean Randall, Peter Christen |
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University, 2017-04-01 |
Series: | International Journal of Population Data Science |
Online Access: | https://ijpds.org/article/view/106 |
_version_ | 1797426866401837056 |
---|---|
author | Thilina Ranbaduge; Dinusha Vatsalan; Sean Randall; Peter Christen |
author_facet | Thilina Ranbaduge; Dinusha Vatsalan; Sean Randall; Peter Christen |
author_sort | Thilina Ranbaduge |
collection | DOAJ |
description | ABSTRACT
Objective
The linking of multiple (three or more) health databases is challenging because of the increasing sizes of databases, the number of parties among which they are to be linked, and privacy concerns related to the use of personal data such as names, addresses, or dates of birth. This entails a need to develop advanced scalable techniques for linking multiple databases while preserving the privacy of the individuals they contain. In this study we empirically evaluate several state-of-the-art multi-party privacy-preserving record linkage (MP-PPRL) techniques with large real-world health databases from Australia.
Approach
MP-PPRL is conducted such that no sensitive information about database records is revealed that could be used to infer knowledge about individuals or groups of individuals. The state-of-the-art methods in this evaluation use Bloom filters to encode personal identifying information. The empirical evaluation comprises different multi-party private blocking and matching techniques, each evaluated for different numbers of parties. Each database contains more than 700,000 records extracted from ten years of New South Wales (NSW) emergency presentation data. Each technique is evaluated with regard to scalability, quality, and privacy. Scalability and quality are measured using reduction ratio, pairs completeness, precision, recall, and F-measure. Privacy is measured using disclosure risk metrics based on the probability of suspicion, defined as the likelihood that a record in an encoded database matches one or more records in a publicly available database such as a telephone directory. Both MP-PPRL techniques that use a trusted linkage unit and those that do not are evaluated.
Results
Experimental results showed that MP-PPRL methods are practical for linking large-scale real-world data. Private blocking techniques achieved significantly higher privacy than standard hashing-based techniques (maximum disclosure risk of 0.0003 versus 1, respectively), at a small cost to linkage quality and efficiency. Similarly, private matching techniques incurred an acceptable reduction in linkage quality compared to standard non-private matching while providing high privacy protection.
Conclusion
The adoption of privacy-preserving linkage methods can significantly reduce the privacy risks associated with linking large health databases and enable the data linkage community to offer operational linkage services not previously possible. The evaluation results show that these state-of-the-art MP-PPRL techniques are scalable in terms of database sizes and number of parties, while providing significantly improved privacy with an associated trade-off in linkage quality compared to standard linkage techniques. |
first_indexed | 2024-03-09T08:36:03Z |
format | Article |
id | doaj.art-8358b02c512f4578b75a7110c6d2308f |
institution | Directory Open Access Journal |
issn | 2399-4908 |
language | English |
last_indexed | 2024-03-09T08:36:03Z |
publishDate | 2017-04-01 |
publisher | Swansea University |
record_format | Article |
series | International Journal of Population Data Science |
spelling | doaj.art-8358b02c512f4578b75a7110c6d2308f; 2023-12-02T18:10:45Z; eng; Swansea University; International Journal of Population Data Science; ISSN 2399-4908; 2017-04-01; vol. 1, no. 1; DOI 10.23889/ijpds.v1i1.106; article 106; Evaluation of advanced techniques for multi-party privacy-preserving record linkage on real-world health databases; Thilina Ranbaduge (Research School of Computer Science, The Australian National University); Dinusha Vatsalan (Research School of Computer Science, The Australian National University); Sean Randall (Centre for Data Linkage, Curtin University); Peter Christen (Research School of Computer Science, The Australian National University); https://ijpds.org/article/view/106 |
spellingShingle | Thilina Ranbaduge; Dinusha Vatsalan; Sean Randall; Peter Christen; Evaluation of advanced techniques for multi-party privacy-preserving record linkage on real-world health databases; International Journal of Population Data Science |
title | Evaluation of advanced techniques for multi-party privacy-preserving record linkage on real-world health databases |
title_sort | evaluation of advanced techniques for multi party privacy preserving record linkage on real world health databases |
url | https://ijpds.org/article/view/106 |
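
The abstract names Bloom filter encoding and five evaluation metrics without defining them. The sketch below is one possible illustration, not the authors' implementation. The metric functions follow the standard definitions of reduction ratio, pairs completeness, precision, recall, and F-measure, simplified to two databases. The Bloom filter part rests on assumptions not stated in the record: character bigrams, double hashing with SHA-1 and SHA-256, a filter length of 1000 bits, 10 hash functions, and Dice similarity for comparison. All function and parameter names are hypothetical.

```python
import hashlib


def qgrams(value, q=2):
    """Split a string into its set of overlapping character q-grams."""
    text = value.lower()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def bloom_encode(value, num_bits=1000, num_hashes=10):
    """Return the set of bit positions that would be set to 1 for `value`.

    Assumption: double hashing over SHA-1 and SHA-256; the abstract does
    not say which hashing scheme or parameters were used.
    """
    positions = set()
    for gram in qgrams(value):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(num_hashes):
            positions.add((h1 + i * h2) % num_bits)
    return positions


def dice_similarity(bits_a, bits_b):
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    total = len(bits_a) + len(bits_b)
    return 2 * len(bits_a & bits_b) / total if total else 0.0


def blocking_metrics(candidate_pairs, true_matches, total_pairs):
    """Reduction ratio and pairs completeness of a blocking step.

    reduction ratio    = 1 - |candidates| / |all possible pairs|
    pairs completeness = |true matches among candidates| / |true matches|
    """
    reduction_ratio = 1 - len(candidate_pairs) / total_pairs
    pairs_completeness = len(candidate_pairs & true_matches) / len(true_matches)
    return reduction_ratio, pairs_completeness


def matching_metrics(predicted_matches, true_matches):
    """Precision, recall, and F-measure of a set of predicted matches."""
    true_positives = len(predicted_matches & true_matches)
    precision = true_positives / len(predicted_matches) if predicted_matches else 0.0
    recall = true_positives / len(true_matches) if true_matches else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Toy data: record pairs are (id in database A, id in database B).
    truth = {(1, 1), (2, 2), (4, 4), (5, 5)}
    candidates = {(1, 1), (1, 2), (2, 2), (3, 3)}
    print(blocking_metrics(candidates, truth, total_pairs=16))
    print(matching_metrics(candidates, truth))
    print(dice_similarity(bloom_encode("smith"), bloom_encode("smyth")))
```

In a multi-party setting the pairs would generalize to tuples of records drawn from three or more databases; the two-database form is used here only to keep the definitions easy to check by hand.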