Releasing survey microdata with exact cluster locations and additional privacy safeguards

Bibliographic Details
Main Authors: Till Koebe, Alejandra Arias-Salazar, Timo Schmid
Format: Article
Language: English
Published: Springer Nature 2023-05-01
Series: Humanities & Social Sciences Communications
Online Access: https://doi.org/10.1057/s41599-023-01694-y
author Till Koebe
Alejandra Arias-Salazar
Timo Schmid
collection DOAJ
description Abstract Household survey programs around the world publish fine-granular georeferenced microdata to support research on the interdependence of human livelihoods and their surrounding environment. To safeguard the respondents’ privacy, micro-level survey data is usually (pseudo)-anonymized through deletion or perturbation procedures such as obfuscating the true location of data collection. This, however, poses a challenge to emerging approaches that augment survey data with auxiliary information on a local level. Here, we propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards through synthetically generated data using generative models. We back our proposal with experiments using data from the 2011 Costa Rican census and satellite-derived auxiliary information. Our strategy reduces the respondents’ re-identification risk for any number of disclosed attributes by 60–80% even under re-identification attempts.
first_indexed 2024-04-09T12:51:49Z
format Article
id doaj.art-922f47af737f475c8f6ffb53d123b0d2
institution Directory Open Access Journal
issn 2662-9992
language English
last_indexed 2024-04-09T12:51:49Z
publishDate 2023-05-01
publisher Springer Nature
record_format Article
series Humanities & Social Sciences Communications
spelling doaj.art-922f47af737f475c8f6ffb53d123b0d2 | 2023-05-14T11:11:24Z | eng | Springer Nature | Humanities & Social Sciences Communications | 2662-9992 | 2023-05-01 | 10 | 1 | 1 | 13 | 10.1057/s41599-023-01694-y | Releasing survey microdata with exact cluster locations and additional privacy safeguards | Till Koebe (0) | Alejandra Arias-Salazar (1) | Timo Schmid (2) | Saarland Informatics Campus, Saarland University | School of Statistics, University of Costa Rica | Institute of Statistics, Otto Friedrich University Bamberg | https://doi.org/10.1057/s41599-023-01694-y
title Releasing survey microdata with exact cluster locations and additional privacy safeguards
url https://doi.org/10.1057/s41599-023-01694-y