Generating Differentially Private Synthetic Text
The advent of more powerful cloud compute over the past decade has made it possible to train the deep neural networks used today for applications in almost everything we do. However, the amount of existing data for private datasets, such as hospital records, remains scarce and will probably remain sc...
Main Author: | Park, YeonHwan |
---|---|
Other Authors: | Kagal, Lalana |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online Access: | https://hdl.handle.net/1721.1/144503 |
_version_ | 1826212436204584960 |
---|---|
author | Park, YeonHwan |
author2 | Kagal, Lalana |
author_facet | Kagal, Lalana Park, YeonHwan |
author_sort | Park, YeonHwan |
collection | MIT |
description | The advent of more powerful cloud compute over the past decade has made it possible to train the deep neural networks used today for applications in almost everything we do. However, the amount of existing data for private datasets, such as hospital records, remains scarce and will probably remain so for the foreseeable future. Without high-quality data, neural networks will not be able to perform high-quality inference.
To aid in training models when existing information is limited, we aim to train existing deep neural network architectures to generate synthetic text that is similar to the text they were trained on without memorizing one-to-one mappings or leaking any sensitive data. To achieve this goal, we fine-tune our models to adhere to a strong notion of differential privacy – a mathematical model bounding the extent to which an adversary can reconstruct the original dataset.
Motivated by the desire to use the differentially private models to generate mixed-type tabular datasets with unstructured text, we also perform a survey to gain a better understanding of how our algorithm might be used to supplement existing neural networks. |
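The abstract does not specify how the differential-privacy guarantee is enforced during fine-tuning, but the standard approach for neural networks is DP-SGD-style training: clip each example's gradient to bound its individual influence, then add Gaussian noise calibrated to that clipping norm before averaging. The sketch below is a minimal illustration of that aggregation step only, not the thesis's actual implementation; the function names and parameters here are hypothetical.

```python
import math
import random

def clip(grad, clip_norm):
    """Scale a gradient vector down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [g * scale for g in grad]

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    """One DP-SGD-style aggregation: clip each per-example gradient, sum the
    clipped gradients, add Gaussian noise with std noise_multiplier * clip_norm,
    then average over the batch."""
    rng = random.Random(seed)
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    sigma = noise_multiplier * clip_norm
    return [(t + rng.gauss(0.0, sigma)) / len(per_example_grads) for t in total]
```

Because each example's contribution is bounded by `clip_norm` and the added noise is scaled to that bound, no single training record can dominate an update, which is the mechanism behind the (ε, δ) reconstruction bound the abstract alludes to.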
first_indexed | 2024-09-23T15:21:18Z |
format | Thesis |
id | mit-1721.1/144503 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T15:21:18Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1445032022-08-30T03:10:07Z Generating Differentially Private Synthetic Text Park, YeonHwan Kagal, Lalana Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science The advent of more powerful cloud compute over the past decade has made it possible to train the deep neural networks used today for applications in almost everything we do. However, the amount of existing data for private datasets, such as hospital records, remains scarce and will probably remain so for the foreseeable future. Without high-quality data, neural networks will not be able to perform high-quality inference. To aid in training models when existing information is limited, we aim to train existing deep neural network architectures to generate synthetic text that is similar to the text they were trained on without memorizing one-to-one mappings or leaking any sensitive data. To achieve this goal, we fine-tune our models to adhere to a strong notion of differential privacy – a mathematical model bounding the extent to which an adversary can reconstruct the original dataset. Motivated by the desire to use the differentially private models to generate mixed-type tabular datasets with unstructured text, we also perform a survey to gain a better understanding of how our algorithm might be used to supplement existing neural networks. M.Eng. 2022-08-29T15:51:56Z 2022-08-29T15:51:56Z 2022-05 2022-05-27T16:18:21.479Z Thesis https://hdl.handle.net/1721.1/144503 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Park, YeonHwan Generating Differentially Private Synthetic Text |
title | Generating Differentially Private Synthetic Text |
title_full | Generating Differentially Private Synthetic Text |
title_fullStr | Generating Differentially Private Synthetic Text |
title_full_unstemmed | Generating Differentially Private Synthetic Text |
title_short | Generating Differentially Private Synthetic Text |
title_sort | generating differentially private synthetic text |
url | https://hdl.handle.net/1721.1/144503 |
work_keys_str_mv | AT parkyeonhwan generatingdifferentiallyprivatesynthetictext |