Calibrating Noise to Sensitivity in Private Data Analysis

We continue a line of research initiated in Dinur and Nissim (2003); Dwork and Nissim (2004); and Blum et al. (2005) on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function $f$ mapping databases to reals, the so...

Full description

Bibliographic Details
Main Authors: Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith
Format: Article
Language:English
Published: Labor Dynamics Institute 2017-05-01
Series:The Journal of Privacy and Confidentiality
Subjects:
Online Access:https://journalprivacyconfidentiality.org/index.php/jpc/article/view/405
_version_ 1818021345967996928
author Cynthia Dwork
Frank McSherry
Kobbi Nissim
Adam Smith
author_facet Cynthia Dwork
Frank McSherry
Kobbi Nissim
Adam Smith
author_sort Cynthia Dwork
collection DOAJ
description We continue a line of research initiated in Dinur and Nissim (2003); Dwork and Nissim (2004); and Blum et al. (2005) on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function $f$ mapping databases to reals, the so-called {\em true answer} is the result of applying $f$ to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which $f = \sum_i g(x_i)$, where $x_i$ denotes the $i$th row of the database and $g$ maps database rows to $[0,1]$. We extend the study to general functions $f$, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the {\em sensitivity} of the function $f$. Roughly speaking, this is the amount that any single argument to $f$ can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean definition of privacy---now known as differential privacy---and measure of its loss. We also provide a set of tools for designing and combining differentially private algorithms, permitting the construction of complex differentially private analytical tools from simple differentially private primitives. Finally, we obtain separation results showing the increased value of interactive statistical release mechanisms over non-interactive ones.
first_indexed 2024-04-14T08:17:35Z
format Article
id doaj.art-b011dabf17ba4cde93da5b4b1f731334
institution Directory Open Access Journal
issn 2575-8527
language English
last_indexed 2024-04-14T08:17:35Z
publishDate 2017-05-01
publisher Labor Dynamics Institute
record_format Article
series The Journal of Privacy and Confidentiality
spelling doaj.art-b011dabf17ba4cde93da5b4b1f7313342022-12-22T02:04:21ZengLabor Dynamics InstituteThe Journal of Privacy and Confidentiality2575-85272017-05-017310.29012/jpc.v7i3.405Calibrating Noise to Sensitivity in Private Data AnalysisCynthia Dwork0Frank McSherry1Kobbi Nissim2Adam Smith3Harvard UniversityMicrosoft Research SVCGeorgetown University; This work was done while the author was at the Department of Computer Science, Ben-Gurion UniversityPenn State UniversityWe continue a line of research initiated in Dinur and Nissim (2003); Dwork and Nissim (2004); and Blum et al. (2005) on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function $f$ mapping databases to reals, the so-called {\em true answer} is the result of applying $f$ to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which $f = \sum_i g(x_i)$, where $x_i$ denotes the $i$th row of the database and $g$ maps database rows to $[0,1]$. We extend the study to general functions $f$, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the {\em sensitivity} of the function $f$. Roughly speaking, this is the amount that any single argument to $f$ can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean definition of privacy---now known as differential privacy---and measure of its loss. We also provide a set of tools for designing and combining differentially private algorithms, permitting the construction of complex differentially private analytical tools from simple differentially private primitives. Finally, we obtain separation results showing the increased value of interactive statistical release mechanisms over non-interactive ones.https://journalprivacyconfidentiality.org/index.php/jpc/article/view/405private data analysisstatistical data privacydifferential privacynoise addition
spellingShingle Cynthia Dwork
Frank McSherry
Kobbi Nissim
Adam Smith
Calibrating Noise to Sensitivity in Private Data Analysis
The Journal of Privacy and Confidentiality
private data analysis
statistical data privacy
differential privacy
noise addition
title Calibrating Noise to Sensitivity in Private Data Analysis
title_full Calibrating Noise to Sensitivity in Private Data Analysis
title_fullStr Calibrating Noise to Sensitivity in Private Data Analysis
title_full_unstemmed Calibrating Noise to Sensitivity in Private Data Analysis
title_short Calibrating Noise to Sensitivity in Private Data Analysis
title_sort calibrating noise to sensitivity in private data analysis
topic private data analysis
statistical data privacy
differential privacy
noise addition
url https://journalprivacyconfidentiality.org/index.php/jpc/article/view/405
work_keys_str_mv AT cynthiadwork calibratingnoisetosensitivityinprivatedataanalysis
AT frankmcsherry calibratingnoisetosensitivityinprivatedataanalysis
AT kobbinissim calibratingnoisetosensitivityinprivatedataanalysis
AT adamsmith calibratingnoisetosensitivityinprivatedataanalysis