Winning the NIST Contest: A scalable and general approach to differentially private synthetic data
We propose a general approach for differentially private synthetic data generation, that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals w...
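The select, measure, generate pipeline summarized above can be illustrated with a toy sketch. This is not the authors' implementation and does not use Private-PGM: the function names `measure_marginal` and `generate_independent`, the fixed choice of one-way marginals, the noise scale `sigma`, and the independent-sampling generator are all assumptions made for illustration. In particular, `sigma` is left as a free parameter rather than calibrated to a privacy budget.

```python
import numpy as np

def measure_marginal(data, cols, domain, sigma, rng):
    """Step 2 (measure): contingency table over `cols` plus Gaussian noise.

    `data` is an integer array of shape (n, d); `domain[c]` is the number
    of distinct values attribute c can take (hypothetical encoding).
    """
    shape = tuple(domain[c] for c in cols)
    # Map each record to a single cell index in the flattened table.
    flat = np.ravel_multi_index(tuple(data[:, c] for c in cols), shape)
    counts = np.bincount(flat, minlength=int(np.prod(shape))).astype(float)
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    return noisy.reshape(shape)

def generate_independent(noisy_one_way, n, rng):
    """Step 3 (generate), simplified stand-in for Private-PGM.

    Samples every attribute independently from its clipped and
    normalized noisy one-way marginal, so correlations are NOT preserved.
    """
    columns = []
    for noisy in noisy_one_way:
        p = np.clip(noisy, 0.0, None)
        total = p.sum()
        # Fall back to uniform if noise wiped out every count.
        p = p / total if total > 0 else np.full(p.shape, 1.0 / p.size)
        columns.append(rng.choice(p.size, size=n, p=p))
    return np.column_stack(columns)

rng = np.random.default_rng(0)
domain = [2, 3]  # two categorical attributes (assumed toy data)
data = np.column_stack([rng.integers(0, 2, 500), rng.integers(0, 3, 500)])

selected = [(0,), (1,)]  # Step 1 (select): fixed here, chosen privately in practice
noisy = [measure_marginal(data, c, domain, sigma=5.0, rng=rng) for c in selected]
synthetic = generate_independent(noisy, n=500, rng=rng)
```

A real mechanism of this kind would also measure two-way marginals and fit a joint model to them, which is the role the abstract assigns to Private-PGM; the independent sampler here only shows where that step plugs in.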
Main Authors: | Ryan McKenna, Gerome Miklau, Daniel Sheldon |
---|---|
Format: | Article |
Language: | English |
Published: | Labor Dynamics Institute, 2021-12-01 |
Series: | The Journal of Privacy and Confidentiality |
Subjects: | differential privacy, synthetic data, graphical models |
Online Access: | http://journalprivacyconfidentiality.org/index.php/jpc/article/view/778 |
_version_ | 1818775396312678400 |
---|---|
author | Ryan McKenna Gerome Miklau Daniel Sheldon |
author_facet | Ryan McKenna Gerome Miklau Daniel Sheldon |
author_sort | Ryan McKenna |
collection | DOAJ |
description | We propose a general approach for differentially private synthetic data generation, that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that is used to estimate a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition, and MST is a new mechanism that can work in more general settings, while still performing comparably to NIST-MST. We believe our general approach should be of broad interest, and can be adopted in future mechanisms for synthetic data generation. |
first_indexed | 2024-12-18T10:56:22Z |
format | Article |
id | doaj.art-dde4f43d674242f98e65be5d0ed38452 |
institution | Directory Open Access Journal |
issn | 2575-8527 |
language | English |
last_indexed | 2024-12-18T10:56:22Z |
publishDate | 2021-12-01 |
publisher | Labor Dynamics Institute |
record_format | Article |
series | The Journal of Privacy and Confidentiality |
spelling | doaj.art-dde4f43d674242f98e65be5d0ed38452 ; 2022-12-21T21:10:20Z ; eng ; Labor Dynamics Institute ; The Journal of Privacy and Confidentiality ; 2575-8527 ; 2021-12-01 ; 11 ; 3 ; 10.29012/jpc.778 ; Winning the NIST Contest: A scalable and general approach to differentially private synthetic data ; Ryan McKenna [0] ; Gerome Miklau [1] ; Daniel Sheldon [2] ; University of Massachusetts, Amherst ; University of Massachusetts, Amherst ; University of Massachusetts, Amherst ; We propose a general approach for differentially private synthetic data generation, that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that is used to estimate a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition, and MST is a new mechanism that can work in more general settings, while still performing comparably to NIST-MST. We believe our general approach should be of broad interest, and can be adopted in future mechanisms for synthetic data generation. ; http://journalprivacyconfidentiality.org/index.php/jpc/article/view/778 ; differential privacy ; synthetic data ; graphical models |
title | Winning the NIST Contest: A scalable and general approach to differentially private synthetic data |
topic | differential privacy synthetic data graphical models |
url | http://journalprivacyconfidentiality.org/index.php/jpc/article/view/778 |