Random Sequential Encoders for Private Data Release in NLP

There are many scenarios that motivate data owners to outsource the training of machine learning models on their data to external model developers. While doing so, it is in data owners’ best interest to keep their data private, meaning that no third party, including the model developer, can learn...


Bibliographic Details
Main Author:	Jaba, Andrea
Other Authors:	Medard, Muriel
Format:	Thesis
Published:	Massachusetts Institute of Technology 2022
Online Access:	https://hdl.handle.net/1721.1/144874

description	There are many scenarios that motivate data owners to outsource the training of machine learning models on their data to external model developers. While doing so, it is in data owners’ best interest to keep their data private, meaning that no third party, including the model developer, can learn anything more about their data than the labels associated with the machine learning task. This is difficult to guarantee while maintaining model utility on that task. In computer vision, lightweight random convolutional networks have shown potential as encoders that balance privacy and utility. This thesis explores random sequential encoders, namely (1) random recurrent neural networks and (2) random long short-term memory networks, as encoding schemes for private data release in natural language processing. Experiments were conducted to evaluate the utility and privacy of these encoders against two baselines that offer less privacy: (1) no encoder and (2) a random linear encoder. For the private release of a spam classification dataset, random long short-term memory networks preserved the most utility among all random encoders while remaining relatively robust to the privacy attacks this thesis considers, which signals a promising direction for future experiments.
spelling	mit-1721.1/144874; 2022-08-30T03:36:11Z; Random Sequential Encoders for Private Data Release in NLP; Jaba, Andrea; Medard, Muriel; Esfahanizadeh, Homa; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; M.Eng.; 2022-08-29T16:17:47Z; 2022-08-29T16:17:47Z; 2022-05; 2022-05-27T16:18:40.970Z; Thesis; https://hdl.handle.net/1721.1/144874; In Copyright - Educational Use Permitted; Copyright MIT; http://rightsstatements.org/page/InC-EDU/1.0/; application/pdf; Massachusetts Institute of Technology
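The abstract describes random recurrent networks used as encoders for private data release. This record does not give the thesis's architecture, dimensions, or weight initialization, so the following is only a generic sketch of the idea under stated assumptions: a simple Elman-style recurrent network whose weights are drawn once at random, never trained, and used to map a token sequence to a fixed-length code. The function names, the Gaussian initialization, the toy input, and the choice of the final hidden state as the released encoding are all hypothetical.

```python
import math
import random


def make_random_rnn(input_dim, hidden_dim, seed):
    """Draw fixed (never trained) weights for a simple Elman-style RNN.

    Assumption: Gaussian weights scaled by 1/sqrt(hidden_dim); the thesis
    may use a different initialization.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(hidden_dim)
    w_in = [[rng.gauss(0.0, scale) for _ in range(input_dim)]
            for _ in range(hidden_dim)]
    w_rec = [[rng.gauss(0.0, scale) for _ in range(hidden_dim)]
             for _ in range(hidden_dim)]
    return w_in, w_rec


def encode(sequence, weights):
    """Return the final hidden state after reading the whole sequence."""
    w_in, w_rec = weights
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        # h_t = tanh(W_in x_t + W_rec h_{t-1}), computed row by row.
        hidden = [
            math.tanh(
                sum(a * b for a, b in zip(w_in[i], x))
                + sum(a * b for a, b in zip(w_rec[i], hidden))
            )
            for i in range(len(w_rec))
        ]
    return hidden


# Hypothetical toy input: three tokens, each a 4-dimensional embedding.
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
weights = make_random_rnn(input_dim=4, hidden_dim=8, seed=0)
code = encode(tokens, weights)
```

In this sketch the same seed always reproduces the same encoder, so one set of random weights can encode an entire dataset consistently before it is released together with its labels. Whether such a code resists the privacy attacks considered in the thesis is not something this sketch demonstrates.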


Similar Items