Neural Data Shaping and Evaluation via Mutual Information Estimation

Machine learning in sensitive domains like healthcare currently faces a major bottleneck due to the scarcity of publicly available data. Privacy protection regulations such as HIPAA and GDPR, together with recent progress in the information estimation literature, motivate us to investigate the issue from an...


Bibliographic Details
Main Author:	Wu, William
Other Authors:	Médard, Muriel
Format:	Thesis
Published:	Massachusetts Institute of Technology 2023
Online Access:	https://hdl.handle.net/1721.1/147511

_version_	1811093846903226368
author	Wu, William
author2	Médard, Muriel
author_facet	Médard, Muriel Wu, William
author_sort	Wu, William
collection	MIT
description	Machine learning in sensitive domains like healthcare currently faces a major bottleneck due to the scarcity of publicly available data. Privacy protection regulations such as HIPAA and GDPR, together with recent progress in the information estimation literature, motivate us to investigate the issue from an information-theoretic perspective. In this thesis, we propose InfoShape, an encoder training scheme that aims to maintain privacy while also preserving utility for downstream prediction tasks. We achieve this by using mutual information neural estimation (MINE) [2] to estimate two quantities: privacy leakage, the mutual information between the original inputs and the encoded representations, and utility score, the mutual information between the encoded representations and the intended labeling information for classification. We train a neural network as our encoder by using our privacy and utility measures in a Lagrangian optimization. We show empirically on Gaussian-generated data that InfoShape is capable of altering encoded sample outputs such that the privacy leakage is reduced and the utility score increases. Moreover, we observe that the classification accuracy of downstream models has a meaningful connection with the utility score, which improves after we train an encoder compared to the untrained encoder. This work has profound implications for privacy-preserving machine learning and could serve as a pivotal tool in the future for revolutionizing AI in areas like healthcare.
first_indexed	2024-09-23T15:51:37Z
format	Thesis
id	mit-1721.1/147511
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T15:51:37Z
publishDate	2023
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/147511 2023-01-20T03:06:49Z Neural Data Shaping and Evaluation via Mutual Information Estimation Wu, William Médard, Muriel Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2023-01-19T19:55:11Z 2023-01-19T19:55:11Z 2022-09 2022-09-16T20:24:31.763Z Thesis https://hdl.handle.net/1721.1/147511 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle	Wu, William Neural Data Shaping and Evaluation via Mutual Information Estimation
title	Neural Data Shaping and Evaluation via Mutual Information Estimation
title_full	Neural Data Shaping and Evaluation via Mutual Information Estimation
title_fullStr	Neural Data Shaping and Evaluation via Mutual Information Estimation
title_full_unstemmed	Neural Data Shaping and Evaluation via Mutual Information Estimation
title_short	Neural Data Shaping and Evaluation via Mutual Information Estimation
title_sort	neural data shaping and evaluation via mutual information estimation
url	https://hdl.handle.net/1721.1/147511
work_keys_str_mv	AT wuwilliam neuraldatashapingandevaluationviamutualinformationestimation
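
The privacy leakage named in the description is a mutual information estimated with MINE, which maximizes the Donsker-Varadhan lower bound E_joint[T] - log E_marginals[exp(T)] over a neural critic T. The sketch below is not the thesis's implementation: it assumes a toy scalar setting (X ~ N(0,1), a hypothetical additive-noise "encoder" Z = X + noise) and plugs in the analytically optimal critic instead of a trained network, so the bound can be checked against the closed-form Gaussian mutual information. All function names are illustrative.

```python
import math
import random


def dv_lower_bound(joint_scores, marginal_scores):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginals[exp(T)]."""
    mean_joint = sum(joint_scores) / len(joint_scores)
    mean_exp = sum(math.exp(t) for t in marginal_scores) / len(marginal_scores)
    return mean_joint - math.log(mean_exp)


def optimal_critic(x, z, noise_var):
    """Log density ratio log p(x, z) / (p(x) p(z)) for the toy channel.

    Assumption: X ~ N(0, 1) and Z = X + N(0, noise_var). MINE would learn
    this function with a neural network; here it is written out exactly.
    """
    log_cond = -0.5 * math.log(2 * math.pi * noise_var) - (z - x) ** 2 / (2 * noise_var)
    z_var = 1.0 + noise_var
    log_marg = -0.5 * math.log(2 * math.pi * z_var) - z ** 2 / (2 * z_var)
    return log_cond - log_marg


def estimate_leakage(noise_var, n=20000, seed=0):
    """Sample-based estimate of I(X; Z) via the Donsker-Varadhan bound."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    zs = [x + rng.gauss(0.0, math.sqrt(noise_var)) for x in xs]
    shuffled = zs[:]
    rng.shuffle(shuffled)  # breaks the pairing: approximates the product of marginals
    joint = [optimal_critic(x, z, noise_var) for x, z in zip(xs, zs)]
    marginal = [optimal_critic(x, z, noise_var) for x, z in zip(xs, shuffled)]
    return dv_lower_bound(joint, marginal)


def closed_form_leakage(noise_var):
    """Exact I(X; Z) in nats for the toy Gaussian channel."""
    return 0.5 * math.log(1.0 + 1.0 / noise_var)


if __name__ == "__main__":
    for noise_var in (1.0, 4.0):
        print(noise_var, estimate_leakage(noise_var), closed_form_leakage(noise_var))
```

In this toy setting, increasing the noise variance lowers the estimated leakage, which is the direction the description's Lagrangian objective pushes the encoder while a second, analogous estimate (the utility score) is kept high.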

