On the Asymptotic Capacity of Information-Theoretic Privacy-Preserving Epidemiological Data Collection

The paradigm-shifting developments of cryptography and information theory have focused on the privacy of data-sharing systems, such as epidemiological studies, where agencies are collecting far more personal data than they need, causing intrusions on patients’ privacy. To study the capability of the...

Full description

Bibliographic Details
Main Authors: Jiale Cheng, Nan Liu, Wei Kang
Format: Article
Language:English
Published: MDPI AG 2023-04-01
Series:Entropy
Subjects:
Online Access:https://www.mdpi.com/1099-4300/25/4/625
_version_ 1797605592727027712
author Jiale Cheng
Nan Liu
Wei Kang
author_facet Jiale Cheng
Nan Liu
Wei Kang
author_sort Jiale Cheng
collection DOAJ
description The paradigm-shifting developments of cryptography and information theory have focused on the privacy of data-sharing systems, such as epidemiological studies, where agencies are collecting far more personal data than they need, causing intrusions on patients’ privacy. To study the capability of the data collection while protecting privacy from an information theory perspective, we formulate a new distributed multiparty computation problem called privacy-preserving epidemiological data collection. In our setting, a data collector requires a linear combination of <i>K</i> users’ data through a storage system consisting of <i>N</i> servers. Privacy needs to be protected when the users, servers, and data collector do not trust each other. For the users, any data are required to be protected from up to <i>E</i> colluding servers; for the servers, any more information than the desired linear combination cannot be leaked to the data collector; and for the data collector, any single server can not know anything about the coefficients of the linear combination. Our goal is to find the optimal collection rate, which is defined as the ratio of the size of the user’s message to the total size of downloads from <i>N</i> servers to the data collector. For achievability, we propose an asymptotic capacity-achieving scheme when <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>E</mi><mo><</mo><mi>N</mi><mo>−</mo><mn>1</mn></mrow></semantics></math></inline-formula>, by applying the cross-subspace alignment method to our construction; for the converse, we proved an upper bound of the asymptotic rate for all achievable schemes when <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>E</mi><mo><</mo><mi>N</mi><mo>−</mo><mn>1</mn></mrow></semantics></math></inline-formula>. Additionally, we show that a positive asymptotic capacity is not possible when <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>E</mi><mo>≥</mo><mi>N</mi><mo>−</mo><mn>1</mn></mrow></semantics></math></inline-formula>. The results of the achievability and converse meet when the number of users goes to infinity, yielding the asymptotic capacity. Our work broadens current researches on data privacy in information theory and gives the best achievable asymptotic performance that any epidemiological data collector can obtain.
first_indexed 2024-03-11T05:03:16Z
format Article
id doaj.art-941e7bc0b1f34119859f530a424531e2
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-03-11T05:03:16Z
publishDate 2023-04-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-941e7bc0b1f34119859f530a424531e22023-11-17T19:08:44ZengMDPI AGEntropy1099-43002023-04-0125462510.3390/e25040625On the Asymptotic Capacity of Information-Theoretic Privacy-Preserving Epidemiological Data CollectionJiale Cheng0Nan Liu1Wei Kang2National Mobile Communications Research Laboratory, Southeast University, Nanjing 211189, ChinaNational Mobile Communications Research Laboratory, Southeast University, Nanjing 211189, ChinaSchool of Information Science and Engineering, Southeast University, Nanjing 211189, ChinaThe paradigm-shifting developments of cryptography and information theory have focused on the privacy of data-sharing systems, such as epidemiological studies, where agencies are collecting far more personal data than they need, causing intrusions on patients’ privacy. To study the capability of the data collection while protecting privacy from an information theory perspective, we formulate a new distributed multiparty computation problem called privacy-preserving epidemiological data collection. In our setting, a data collector requires a linear combination of <i>K</i> users’ data through a storage system consisting of <i>N</i> servers. Privacy needs to be protected when the users, servers, and data collector do not trust each other. For the users, any data are required to be protected from up to <i>E</i> colluding servers; for the servers, any more information than the desired linear combination cannot be leaked to the data collector; and for the data collector, any single server can not know anything about the coefficients of the linear combination. Our goal is to find the optimal collection rate, which is defined as the ratio of the size of the user’s message to the total size of downloads from <i>N</i> servers to the data collector. For achievability, we propose an asymptotic capacity-achieving scheme when <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>E</mi><mo><</mo><mi>N</mi><mo>−</mo><mn>1</mn></mrow></semantics></math></inline-formula>, by applying the cross-subspace alignment method to our construction; for the converse, we proved an upper bound of the asymptotic rate for all achievable schemes when <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>E</mi><mo><</mo><mi>N</mi><mo>−</mo><mn>1</mn></mrow></semantics></math></inline-formula>. Additionally, we show that a positive asymptotic capacity is not possible when <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>E</mi><mo>≥</mo><mi>N</mi><mo>−</mo><mn>1</mn></mrow></semantics></math></inline-formula>. The results of the achievability and converse meet when the number of users goes to infinity, yielding the asymptotic capacity. Our work broadens current researches on data privacy in information theory and gives the best achievable asymptotic performance that any epidemiological data collector can obtain.https://www.mdpi.com/1099-4300/25/4/625secure multiparty computationdata privacyepidemiological data collectionasymptotic capacity
spellingShingle Jiale Cheng
Nan Liu
Wei Kang
On the Asymptotic Capacity of Information-Theoretic Privacy-Preserving Epidemiological Data Collection
Entropy
secure multiparty computation
data privacy
epidemiological data collection
asymptotic capacity
title On the Asymptotic Capacity of Information-Theoretic Privacy-Preserving Epidemiological Data Collection
title_full On the Asymptotic Capacity of Information-Theoretic Privacy-Preserving Epidemiological Data Collection
title_fullStr On the Asymptotic Capacity of Information-Theoretic Privacy-Preserving Epidemiological Data Collection
title_full_unstemmed On the Asymptotic Capacity of Information-Theoretic Privacy-Preserving Epidemiological Data Collection
title_short On the Asymptotic Capacity of Information-Theoretic Privacy-Preserving Epidemiological Data Collection
title_sort on the asymptotic capacity of information theoretic privacy preserving epidemiological data collection
topic secure multiparty computation
data privacy
epidemiological data collection
asymptotic capacity
url https://www.mdpi.com/1099-4300/25/4/625
work_keys_str_mv AT jialecheng ontheasymptoticcapacityofinformationtheoreticprivacypreservingepidemiologicaldatacollection
AT nanliu ontheasymptoticcapacityofinformationtheoreticprivacypreservingepidemiologicaldatacollection
AT weikang ontheasymptoticcapacityofinformationtheoreticprivacypreservingepidemiologicaldatacollection