Integrating historical noisy answers for improving data utility under differential privacy

Differential privacy is a robust principle for privacy-preserving data analysis and has been applied successfully in a variety of applications. However, to prevent privacy disclosure, only a limited number of queries can be answered. Once the privacy budget is exhausted, all succeedin...

Bibliographic Details
Main Authors: Bhowmick, Sourav S., Chen, Shixi, Zhou, Shuigeng
Other Authors: School of Computer Engineering
Format: Conference Paper
Language: English
Published: 2013
Online Access: https://hdl.handle.net/10356/84235
http://hdl.handle.net/10220/12280
collection NTU
description Differential privacy is a robust principle for privacy-preserving data analysis and has been applied successfully in a variety of applications. However, to prevent privacy disclosure, only a limited number of queries can be answered: once the privacy budget is exhausted, all subsequent queries must be rejected. Each historical query answer is therefore valuable, and it is important to exploit them together to learn more about the data. We propose to integrate all available linear query answers into a consistent form that embodies the knowledge learned from the noisy answers, yielding more accurate answers to past queries, and even to new queries, and thereby improving data utility. Two distinct approaches are developed for this purpose: one via principal component analysis and another via the maximum entropy method. The second approach also generates a synthetic database, which is useful for differentially private data publishing. An important goal of our work is to ensure that the running time of our approaches does not grow with the cardinality of the universe of a data tuple, so that high-dimensional data with a very large domain can still be handled efficiently.
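The idea of integrating noisy linear query answers into one consistent estimate can be illustrated with a small sketch. This is not the paper's principal component analysis or maximum entropy method; it substitutes ordinary least squares over a tiny histogram, purely to show how several noisy answers can be combined and then reused for a new query. The histogram, the query set, the Laplace scale, and all function names are hypothetical and chosen for illustration only.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def integrate_answers(queries, noisy_answers):
    """Least-squares estimate minimising ||Q x - y||^2 via the normal equations.

    Assumes the queries jointly determine every histogram cell (Q^T Q invertible).
    """
    k = len(queries[0])
    qtq = [[sum(q[i] * q[j] for q in queries) for j in range(k)] for i in range(k)]
    qty = [sum(q[i] * y for q, y in zip(queries, noisy_answers)) for i in range(k)]
    return solve(qtq, qty)

def answer(query, x):
    # A linear query is a weighted sum of histogram cells.
    return sum(w * v for w, v in zip(query, x))

# Hypothetical data: a 3-cell histogram and five overlapping linear queries.
rng = random.Random(0)
histogram = [40, 25, 35]
queries = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
scale = 2.0  # illustrative Laplace scale, not tied to a specific privacy budget

noisy = [answer(q, histogram) + laplace_noise(scale, rng) for q in queries]
x_hat = integrate_answers(queries, noisy)

# A query that was never asked is answered from the integrated estimate alone.
print("new query [0, 1, 1]:", answer([0, 1, 1], x_hat))
```

Because the estimate is computed only from answers that were already released, deriving it is post-processing and consumes no additional privacy budget. Note that this sketch works directly over the histogram cells, so its cost grows with the domain size, which is exactly the dependence the abstract says the authors aim to avoid.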
first_indexed 2024-10-01T07:37:40Z
format Conference Paper
id ntu-10356/84235
institution Nanyang Technological University
language English
last_indexed 2024-10-01T07:37:40Z
publishDate 2013
record_format dspace
spelling ntu-10356/84235
Conference: International Conference on Extending Database Technology (15th : 2012)
Citation: Chen, S., Zhou, S., & Bhowmick, S. S. (2012). Integrating historical noisy answers for improving data utility under differential privacy. Proceedings of the 15th International Conference on Extending Database Technology.
DOI: 10.1145/2247596.2247605
Rights: © 2012 ACM.
Record dates: 2013-07-25T07:41:43Z, 2019-12-06T15:41:05Z, 2020-05-28T07:18:00Z