Towards Providing Automated Feedback on the Quality of Inferences from Synthetic Datasets

When releasing individual-level data to the public, statistical agencies typically alter data values to protect the confidentiality of individuals’ identities and sensitive attributes. When data undergo substantial perturbation, secondary data analysts’ inferences can be distorted in ways that they...


Bibliographic Details
Main Authors:	David R. McClure, Jerome P. Reiter
Format:	Article
Language:	English
Published:	Labor Dynamics Institute 2012-07-01
Series:	The Journal of Privacy and Confidentiality
Subjects:	Confidentiality, Disclosure, Multiple imputation, Utility, Verification
Online Access:	https://journalprivacyconfidentiality.org/index.php/jpc/article/view/616

author	David R. McClure Jerome P. Reiter
collection	DOAJ
description	When releasing individual-level data to the public, statistical agencies typically alter data values to protect the confidentiality of individuals’ identities and sensitive attributes. When data undergo substantial perturbation, secondary data analysts’ inferences can be distorted in ways that they typically cannot determine from the released data alone. This is problematic, in that analysts have no idea if they should trust the results based on the altered data. To ameliorate this problem, agencies can establish verification servers, which are remote computers that analysts query for measures of the quality of inferences obtained from disclosure-protected data. The reported quality measures reflect the similarity between the analysis done with the altered data and the analysis done with the confidential data. However, quality measures can leak information about the confidential values, so that they too must be subject to disclosure protections. In this article, we discuss several approaches to releasing quality measures for verification servers when the public use data are generated via multiple imputation, also known as synthetic data. The methods can be modified for other stochastic perturbation methods.
first_indexed	2024-12-14T01:54:21Z
format	Article
id	doaj.art-cc0c22ae342c4364844fe42c367b670e
institution	Directory Open Access Journal
issn	2575-8527
language	English
last_indexed	2024-12-14T01:54:21Z
publishDate	2012-07-01
publisher	Labor Dynamics Institute
record_format	Article
series	The Journal of Privacy and Confidentiality
spelling	doaj.art-cc0c22ae342c4364844fe42c367b670e | 2022-12-21T23:21:16Z | eng | Labor Dynamics Institute | The Journal of Privacy and Confidentiality | 2575-8527 | 2012-07-01 | 4 | 1 | 10.29012/jpc.v4i1.616 | Towards Providing Automated Feedback on the Quality of Inferences from Synthetic Datasets | David R. McClure (0) | Jerome P. Reiter (1) | (0) Department of Statistical Science, Duke University, Durham, NC | (1) Department of Statistical Science, Duke University, Durham, NC | https://journalprivacyconfidentiality.org/index.php/jpc/article/view/616 | Confidentiality | Disclosure | Multiple imputation | Utility | Verification
title	Towards Providing Automated Feedback on the Quality of Inferences from Synthetic Datasets
topic	Confidentiality, Disclosure, Multiple imputation, Utility, Verification
url	https://journalprivacyconfidentiality.org/index.php/jpc/article/view/616
