A Practical Method to Reduce Privacy Loss When Disclosing Statistics Based on Small Samples

We develop a simple method to reduce privacy loss when disclosing statistics such as OLS regression estimates based on samples with small numbers of observations. We focus on the case where the dataset can be broken into many groups (“cells”) and one is interested in releasing statistics for one or...


Bibliographic Details
Main Authors:	Raj Chetty, John N Friedman
Format:	Article
Language:	English
Published:	Labor Dynamics Institute 2019-10-01
Series:	The Journal of Privacy and Confidentiality
Subjects:	Differential Privacy; Administrative Data
Online Access:	https://journalprivacyconfidentiality.org/index.php/jpc/article/view/716

author	Raj Chetty; John N Friedman
collection	DOAJ
description	We develop a simple method to reduce privacy loss when disclosing statistics such as OLS regression estimates based on samples with small numbers of observations. We focus on the case where the dataset can be broken into many groups (“cells”) and one is interested in releasing statistics for one or more of these cells. Building on ideas from the differential privacy literature, we add noise to the statistic of interest in proportion to the statistic's maximum observed sensitivity, defined as the maximum change in the statistic from adding or removing a single observation across all the cells in the data. Intuitively, our approach permits the release of statistics in arbitrarily small samples by adding sufficient noise to the estimates to protect privacy. Although our method does not offer a formal privacy guarantee, it generally outperforms widely used methods of disclosure limitation such as count-based cell suppression both in terms of privacy loss and statistical bias. We illustrate how the method can be implemented by discussing how it was used to release estimates of social mobility by Census tract in the Opportunity Atlas. We also provide a step-by-step guide and illustrative Stata code to implement our approach.
first_indexed	2024-12-14T01:54:59Z
format	Article
id	doaj.art-a92f89ddfe134d8884bbedae096a8707
institution	Directory Open Access Journal
issn	2575-8527
language	English
last_indexed	2024-12-14T01:54:59Z
publishDate	2019-10-01
publisher	Labor Dynamics Institute
record_format	Article
series	The Journal of Privacy and Confidentiality
doi	10.29012/jpc.716
volume	9
issue	2
author_affiliation	Raj Chetty: Harvard University and NBER; John N Friedman: Brown University and NBER
title	A Practical Method to Reduce Privacy Loss When Disclosing Statistics Based on Small Samples
topic	Differential Privacy; Administrative Data
url	https://journalprivacyconfidentiality.org/index.php/jpc/article/view/716
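
The description in this record says that noise is added to each released statistic in proportion to the statistic's maximum observed sensitivity, that is, the largest change in the statistic from adding or removing a single observation, taken across all cells. The Python sketch below illustrates that idea only. It is not the authors' Stata code. It assumes the statistic is a simple cell mean (the article discusses OLS estimates), that outcomes lie between assumed bounds `lower` and `upper`, that the noise is Laplace, and that a hypothetical parameter `epsilon` controls the noise scale. All function names are illustrative.

```python
import random


def cell_mean(values):
    """Statistic of interest in this sketch: the mean of one cell."""
    return sum(values) / len(values)


def local_sensitivity(values, lower, upper):
    """Largest change in the cell mean from removing one observation
    or adding one observation at either assumed bound."""
    base = cell_mean(values)
    changes = []
    # Effect of removing each observation (needs at least one left over).
    if len(values) > 1:
        total = sum(values)
        for v in values:
            changes.append(abs((total - v) / (len(values) - 1) - base))
    # Effect of adding one observation at the assumed extremes.
    for extreme in (lower, upper):
        changes.append(abs(cell_mean(values + [extreme]) - base))
    return max(changes)


def max_observed_sensitivity(cells, lower, upper):
    """Maximum of the per-cell sensitivities across all cells."""
    return max(local_sensitivity(v, lower, upper) for v in cells.values())


def laplace_noise(scale, rng):
    """Laplace(0, scale) draw: the difference of two unit exponentials
    follows a standard Laplace distribution."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_release(cells, lower, upper, epsilon, seed=None):
    """Return each cell mean plus noise scaled by the maximum observed
    sensitivity divided by epsilon (an assumed privacy parameter)."""
    rng = random.Random(seed)
    scale = max_observed_sensitivity(cells, lower, upper) / epsilon
    return {
        name: cell_mean(values) + laplace_noise(scale, rng)
        for name, values in cells.items()
    }


# Hypothetical example data: two cells with outcomes bounded in [0, 1].
cells = {"cell_a": [0.2, 0.4], "cell_b": [0.5, 0.5, 0.5, 0.5]}
released = noisy_release(cells, lower=0.0, upper=1.0, epsilon=1.0, seed=1)
```

Because the noise scale is computed from the cells actually observed, not from a worst case over all possible datasets, a scheme of this kind does not carry a formal privacy guarantee, which matches the caveat in the description.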


Similar Items