Privacy preserving data visualizations

Abstract Data visualizations are a valuable tool used during both statistical analysis and the interpretation of results as they graphically reveal useful information about the structure, properties and relationships between variables, which may otherwise be concealed in tabulated dat...

Bibliographic Details
Main Authors: Avraam, Demetris; Wilson, Rebecca; Butters, Oliver; Burton, Thomas; Nicolaides, Christos; Jones, Elinor; Boyd, Andy; Burton, Paul
Other Authors: Sloan School of Management
Format: Article
Language: English
Published: Springer Berlin Heidelberg 2021
Online Access: https://hdl.handle.net/1721.1/131977
collection MIT
description Abstract: Data visualizations are a valuable tool in both statistical analysis and the interpretation of results, as they graphically reveal useful information about the structure, properties and relationships between variables that may otherwise be concealed in tabulated data. In disciplines like medicine and the social sciences, where collected data include sensitive information about study participants, the sharing and publication of individual-level records are controlled by data protection laws and ethico-legal norms. Because data visualizations, such as graphs and plots, may be linked to other released information and used to identify study participants and their personal attributes, their creation is often prohibited by the terms of data use. These restrictions are enforced to reduce the risk of breaching data subject confidentiality; however, they prevent analysts from displaying useful descriptive plots of their research features and findings. Here we propose the use of anonymization techniques to generate privacy-preserving visualizations that retain the statistical properties of the underlying data while still adhering to strict data disclosure rules. We demonstrate the use of (i) the well-known k-anonymization process, which preserves privacy by reducing the granularity of the data using suppression and generalization, (ii) a novel deterministic approach that replaces each individual-level observation with the centroid of its k nearest neighbours, and (iii) a probabilistic procedure that perturbs individual attributes by adding random noise. We apply the proposed methods to generate privacy-preserving data visualizations for exploratory data analysis and inferential regression plot diagnostics, and we discuss their strengths and limitations.
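The three techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`k_anonymous_bins`, `knn_centroids`, `add_noise`), the use of Euclidean distance on unscaled data, the choice to count each point among its own k nearest neighbours, and the `fraction` parameter that sizes the noise are all assumptions made for illustration.

```python
import numpy as np


def k_anonymous_bins(values, edges, k=3):
    """Generalize values into bins, then suppress bins with fewer than k records."""
    values = np.asarray(values, dtype=float)
    counts, _ = np.histogram(values, bins=edges)
    # Suppression: any bin holding fewer than k records is reported as empty.
    return np.where(counts >= k, counts, 0)


def knn_centroids(points, k=3):
    """Replace each row with the centroid of its k nearest rows.

    Assumption: the row itself counts as one of the k neighbours.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)
    # Indices of the k closest rows for every row (self is at distance 0).
    nearest = np.argsort(dist2, axis=1)[:, :k]
    return points[nearest].mean(axis=1)


def add_noise(values, fraction=0.25, rng=None):
    """Perturb each column with zero-mean Gaussian noise.

    Assumption: the noise standard deviation is `fraction` times the
    standard deviation of the column being perturbed.
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    sd = fraction * values.std(axis=0)
    return values + rng.standard_normal(values.shape) * sd
```

A scatter plot would then be drawn from `knn_centroids(points, k)` or `add_noise(points)` rather than from the raw observations, and a histogram from `k_anonymous_bins(values, edges, k)`. Larger values of `k` or `fraction` trade fidelity of the plot for a lower disclosure risk.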
first_indexed 2024-09-23T13:34:19Z
format Article
id mit-1721.1/131977
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T13:34:19Z
publishDate 2021
publisher Springer Berlin Heidelberg
record_format dspace
spelling mit-1721.1/131977 2023-09-15T18:10:02Z Privacy preserving data visualizations Avraam, Demetris; Wilson, Rebecca; Butters, Oliver; Burton, Thomas; Nicolaides, Christos; Jones, Elinor; Boyd, Andy; Burton, Paul Sloan School of Management 2021-09-20T17:41:13Z 2021-09-20T17:41:13Z 2021-01-07 2021-01-10T04:14:46Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/131977 EPJ Data Science. 2021 Jan 07;10(1):2 PUBLISHER_CC en https://doi.org/10.1140/epjds/s13688-020-00257-4 Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/ The Author(s) application/pdf Springer Berlin Heidelberg
title Privacy preserving data visualizations
url https://hdl.handle.net/1721.1/131977