Winning the NIST Contest: A scalable and general approach to differentially private synthetic data

Bibliographic Details
Main Authors: Ryan McKenna, Gerome Miklau, Daniel Sheldon
Format: Article
Language: English
Published: Labor Dynamics Institute 2021-12-01
Series: The Journal of Privacy and Confidentiality
Online Access: http://journalprivacyconfidentiality.org/index.php/jpc/article/view/778
Description: We propose a general approach for differentially private synthetic data generation, that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that is used to estimate a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition, and MST is a new mechanism that can work in more general settings, while still performing comparably to NIST-MST. We believe our general approach should be of broad interest, and can be adopted in future mechanisms for synthetic data generation.
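Illustration (not part of the catalogued record): the select, measure, generate pattern named in the description can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' NIST-MST or MST mechanisms. The selection step is a hypothetical fixed choice of every one-way marginal, the noise is Gaussian with a free scale `sigma` (calibrating it to a privacy budget is outside the sketch), and the generation step samples each attribute independently as a naive stand-in for Private-PGM. All function names are invented for the example.

```python
import random
from collections import Counter


def measure_marginal(rows, attr, domain, sigma, rng):
    """Noisy one-way marginal: true counts plus Gaussian noise of scale sigma."""
    counts = Counter(row[attr] for row in rows)
    return {v: counts.get(v, 0) + rng.gauss(0.0, sigma) for v in domain}


def to_distribution(noisy):
    """Post-process noisy counts: clip negatives to zero, then normalize."""
    clipped = {v: max(c, 0.0) for v, c in noisy.items()}
    total = sum(clipped.values())
    if total == 0:
        # Every noisy count was non-positive: fall back to uniform.
        return {v: 1.0 / len(clipped) for v in clipped}
    return {v: c / total for v, c in clipped.items()}


def synthesize(rows, domains, sigma, n_out, seed=0):
    """Toy select-measure-generate pipeline over one-way marginals only."""
    rng = random.Random(seed)
    # Step 1 (select): assumed fixed choice, one marginal per attribute.
    selected = list(domains)
    # Step 2 (measure): add Gaussian noise to each selected marginal.
    dists = {
        a: to_distribution(measure_marginal(rows, a, domains[a], sigma, rng))
        for a in selected
    }
    # Step 3 (generate): independent sampling per attribute.
    # This is a stand-in for Private-PGM and ignores correlations.
    return [
        {a: rng.choices(list(d), weights=list(d.values()))[0]
         for a, d in dists.items()}
        for _ in range(n_out)
    ]


if __name__ == "__main__":
    data = [
        {"sex": "F", "age": "young"},
        {"sex": "M", "age": "old"},
        {"sex": "F", "age": "old"},
    ]
    doms = {"sex": ["F", "M"], "age": ["young", "old"]}
    for record in synthesize(data, doms, sigma=1.0, n_out=5):
        print(record)
```

Because the sketch only measures one-way marginals, the synthetic rows preserve per-attribute frequencies at best. Capturing relationships between attributes is the role the description assigns to measuring low-dimensional marginals over several attributes and to Private-PGM.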
ISSN: 2575-8527
Source: Directory of Open Access Journals (DOAJ), record doaj.art-dde4f43d674242f98e65be5d0ed38452
Volume: 11, Issue: 3
DOI: 10.29012/jpc.778
Author Affiliations: Ryan McKenna, Gerome Miklau, Daniel Sheldon (all University of Massachusetts, Amherst)
Subjects: differential privacy, synthetic data, graphical models