A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms
Differential privacy has emerged as a popular model to provably limit privacy risks associated with a given data release. However, releasing high dimensional synthetic data under differential privacy remains a challenging problem. In this paper, we study the problem of releasing synthetic data in the...
Main Authors: | Bai Li, Vishesh Karwa, Aleksandra Slavković, Rebecca Carter Steorts |
---|---|
Format: | Article |
Language: | English |
Published: | Labor Dynamics Institute, 2018-12-01 |
Series: | The Journal of Privacy and Confidentiality |
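Subjects: | differential privacy; high dimensional sparse histograms; stability based algorithm; perturbed Gibbs sampler; Stability Based Hashed Gibbs Sampler |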
Online Access: | https://journalprivacyconfidentiality.org/index.php/jpc/article/view/657 |
_version_ | 1818147553002127360 |
---|---|
author | Bai Li; Vishesh Karwa; Aleksandra Slavković; Rebecca Carter Steorts |
author_sort | Bai Li |
collection | DOAJ |
description | Differential privacy has emerged as a popular model to provably limit privacy risks associated with a given data release. However, releasing high dimensional synthetic data under differential privacy remains a challenging problem. In this paper, we study the problem of releasing synthetic data in the form of a high dimensional histogram under the constraint of differential privacy.
We develop an $(\epsilon, \delta)$-differentially private categorical data synthesizer called Stability Based Hashed Gibbs Sampler (SBHG). SBHG works by combining a stability based sparse histogram estimation algorithm with Gibbs sampling and feature selection to approximate the empirical joint distribution of a discrete dataset. SBHG offers a competitive alternative to state-of-the-art synthetic data generators while preserving the sparsity structure of the original dataset, which leads to improved statistical utility as illustrated on simulated data. Finally, to study the utility of the synthetic datasets generated by SBHG, we also perform logistic regression using the synthetic datasets and compare the classification accuracy with that obtained using the original dataset. |
first_indexed | 2024-12-11T12:37:04Z |
format | Article |
id | doaj.art-8b1c31f5e1994d14a0119e8634fd7acb |
institution | Directory Open Access Journal |
issn | 2575-8527 |
language | English |
last_indexed | 2024-12-11T12:37:04Z |
publishDate | 2018-12-01 |
publisher | Labor Dynamics Institute |
record_format | Article |
series | The Journal of Privacy and Confidentiality |
spelling | doaj.art-8b1c31f5e1994d14a0119e8634fd7acb; 2022-12-22T01:07:06Z; eng; Labor Dynamics Institute; The Journal of Privacy and Confidentiality; 2575-8527; 2018-12-01; 8; 1; 10.29012/jpc.657; A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms; Bai Li (Duke University); Vishesh Karwa (Temple University); Aleksandra Slavković (Pennsylvania State University); Rebecca Carter Steorts (Duke University); https://journalprivacyconfidentiality.org/index.php/jpc/article/view/657; differential privacy; high dimensional sparse histograms; stability based algorithm; perturbed Gibbs sampler; Stability Based Hashed Gibbs Sampler |
title | A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms |
topic | differential privacy; high dimensional sparse histograms; stability based algorithm; perturbed Gibbs sampler; Stability Based Hashed Gibbs Sampler |
url | https://journalprivacyconfidentiality.org/index.php/jpc/article/view/657 |
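
The description above names a stability based sparse histogram estimation algorithm as one building block of SBHG. The record gives no further detail, so the following is only a minimal sketch of one common textbook formulation of a stability-based histogram, not the authors' implementation: Laplace noise is added only to cells that actually occur in the data, and a noisy count is released only if it clears a threshold. The function name `stability_histogram`, the noise scale `2/epsilon`, and the threshold `1 + 2*ln(2/delta)/epsilon` are assumptions chosen for illustration; the hashing, Gibbs sampling, and feature selection steps of SBHG are not shown.

```python
import math
import random
from collections import Counter


def stability_histogram(records, epsilon, delta, rng=None):
    """Illustrative (epsilon, delta)-style sparse histogram release.

    Assumed formulation (not taken from the paper): noise scale 2/epsilon
    and release threshold 1 + 2*ln(2/delta)/epsilon.
    """
    rng = rng or random.Random()
    scale = 2.0 / epsilon
    threshold = 1.0 + 2.0 * math.log(2.0 / delta) / epsilon

    # Only cells that occur in the data are touched, so the work is
    # proportional to the number of records, not the size of the domain.
    counts = Counter(records)

    released = {}
    for cell, count in counts.items():
        # The difference of two i.i.d. exponentials with mean `scale`
        # is a Laplace(0, scale) random variable.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy = count + noise
        # Small noisy counts are suppressed, which keeps the output sparse.
        if noisy > threshold:
            released[cell] = noisy
    return released


if __name__ == "__main__":
    # Hypothetical toy data: one heavily populated cell and two rare cells.
    data = [("a", 1)] * 10000 + [("b", 0), ("c", 1)]
    print(stability_histogram(data, epsilon=1.0, delta=1e-6, rng=random.Random(0)))
```

Because empty cells are never considered and rare cells usually fall below the threshold, the released histogram stays sparse, which is the property the description highlights. A production implementation would also need a carefully implemented noise sampler, which this sketch does not attempt.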