Securely measuring the overlap between private datasets with cryptosets.

Bibliographic Details
Main Authors: S Joshua Swamidass, Matthew Matlock, Leon Rozenblit
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4340911?pdf=render
author S Joshua Swamidass
Matthew Matlock
Leon Rozenblit
collection DOAJ
description Many scientific questions are best approached by pooling data, collected by different groups or across large collaborative networks, into a combined analysis. Unfortunately, some of the most interesting and powerful datasets, such as health records, genetic data, and drug discovery data, cannot be freely shared because they contain sensitive information. In many situations, knowing whether private datasets overlap determines whether it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which cannot be broken even with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with stable accuracy, and secure.
first_indexed 2024-04-13T03:45:07Z
format Article
id doaj.art-d1bd67193416413ea992d386439dac77
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-04-13T03:45:07Z
publishDate 2015-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-d1bd67193416413ea992d386439dac77 | 2022-12-22T03:04:03Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2015-01-01 | 10 | 2 | e0117898 | 10.1371/journal.pone.0117898 | Securely measuring the overlap between private datasets with cryptosets. | S Joshua Swamidass | Matthew Matlock | Leon Rozenblit | http://europepmc.org/articles/PMC4340911?pdf=render
title Securely measuring the overlap between private datasets with cryptosets.
url http://europepmc.org/articles/PMC4340911?pdf=render