Generating Differentially Private Synthetic Text


Bibliographic Details
Main Author: Park, YeonHwan
Other Authors: Kagal, Lalana
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/144503
_version_ 1826212436204584960
author Park, YeonHwan
author2 Kagal, Lalana
author_facet Kagal, Lalana
Park, YeonHwan
author_sort Park, YeonHwan
collection MIT
description The advent of more powerful cloud compute over the past decade has made it possible to train the deep neural networks used today in applications across almost everything we do. However, the amount of existing data in private datasets, such as hospital records, remains scarce and will probably remain so for the foreseeable future. Without high-quality data, neural networks cannot perform high-quality inference. To aid in training models when existing information is limited, we aim to train existing deep neural network architectures to generate synthetic text that is similar to the text they were trained on, without memorizing one-to-one mappings or leaking any sensitive data. To achieve this goal, we fine-tune our models to adhere to a strong notion of differential privacy, a mathematical model bounding the extent to which an adversary can reconstruct the original dataset. Because we intend to use the differentially private models to generate mixed-type tabular datasets with unstructured text, we also perform a survey to gain a better understanding of how our algorithm might be used to supplement existing neural networks.
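The abstract's approach, fine-tuning a model under differential privacy, is typically realized with a DP-SGD-style update: clip each example's gradient to a fixed L2 norm, average, and add Gaussian noise. The sketch below illustrates that core step only; the function name, parameters, and plain-list gradients are illustrative assumptions, not the thesis's actual implementation.

```python
import math
import random

def clipped_noisy_mean(per_example_grads, clip_norm, noise_multiplier, seed=0):
    """Illustrative DP-SGD core step (not the thesis's code).

    Each example's gradient is rescaled so its L2 norm is at most
    `clip_norm`, the clipped gradients are summed, Gaussian noise with
    standard deviation `noise_multiplier * clip_norm` is added per
    coordinate, and the result is divided by the batch size.
    """
    rng = random.Random(seed)
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * scale
    sigma = noise_multiplier * clip_norm
    return [(summed[i] + rng.gauss(0.0, sigma)) / n for i in range(dim)]
```

Clipping bounds each individual's influence on the update, which is what lets the added Gaussian noise translate into a formal privacy guarantee; the privacy budget itself would be tracked separately by an accountant over all training steps.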
first_indexed 2024-09-23T15:21:18Z
format Thesis
id mit-1721.1/144503
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T15:21:18Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1445032022-08-30T03:10:07Z Generating Differentially Private Synthetic Text Park, YeonHwan Kagal, Lalana Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science The advent of more powerful cloud compute over the past decade has made it possible to train the deep neural networks used today in applications across almost everything we do. However, the amount of existing data in private datasets, such as hospital records, remains scarce and will probably remain so for the foreseeable future. Without high-quality data, neural networks cannot perform high-quality inference. To aid in training models when existing information is limited, we aim to train existing deep neural network architectures to generate synthetic text that is similar to the text they were trained on, without memorizing one-to-one mappings or leaking any sensitive data. To achieve this goal, we fine-tune our models to adhere to a strong notion of differential privacy, a mathematical model bounding the extent to which an adversary can reconstruct the original dataset. Because we intend to use the differentially private models to generate mixed-type tabular datasets with unstructured text, we also perform a survey to gain a better understanding of how our algorithm might be used to supplement existing neural networks. M.Eng. 2022-08-29T15:51:56Z 2022-08-29T15:51:56Z 2022-05 2022-05-27T16:18:21.479Z Thesis https://hdl.handle.net/1721.1/144503 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle Park, YeonHwan
Generating Differentially Private Synthetic Text
title Generating Differentially Private Synthetic Text
title_full Generating Differentially Private Synthetic Text
title_fullStr Generating Differentially Private Synthetic Text
title_full_unstemmed Generating Differentially Private Synthetic Text
title_short Generating Differentially Private Synthetic Text
title_sort generating differentially private synthetic text
url https://hdl.handle.net/1721.1/144503
work_keys_str_mv AT parkyeonhwan generatingdifferentiallyprivatesynthetictext