Generative Modeling with Guarantees
Language models have become ubiquitous in natural language processing, leveraging large amounts of unlabeled data and fine-tuning for downstream tasks. However, concerns have been raised regarding the accuracy and trustworthiness of the text generated by these models. In parallel, differential privacy has emerged as a framework for protecting sensitive information while still allowing machine learning algorithms to learn from it. Nevertheless, the trade-off between statistical guarantees and utility poses challenges for many applications. This thesis therefore aims to develop techniques that balance guarantees and utility, focusing on improving the reliability of generative models while preserving their flexibility.

First, we propose a framework for conditional text generation under hard constraints, allowing users to specify certain elements in advance while leaving others open for the model to predict. By facilitating interactive editing and rewriting, this framework gives users precise control over the generated text.

Next, we introduce conformal prediction methods for generating predictions under soft constraints, ensuring statistical correctness. These methods produce valid confidence sets for text generation while maintaining high empirical precision.

Finally, we explore the balance between privacy and utility in data release by relaxing the guarantees of differential privacy to a definition based on guesswork. We present a learning-based approach to de-identification, addressing the challenges of privacy preservation while still enabling effective use of the data.

The effectiveness of our proposed methods is demonstrated on a range of tasks, including text infilling, radiology report generation, and X-ray classification, showcasing the utility of our techniques in practical scenarios.
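As a rough illustration of the "valid confidence sets" mentioned in the abstract, the sketch below shows the generic split-conformal recipe: calibrate a nonconformity-score threshold on held-out examples, then keep every sampled generation whose score falls under that threshold. This is only a minimal sketch of the general technique, not the method developed in the thesis; the function names, the uniform dummy scores, and the `alpha=0.1` setting are all illustrative assumptions.

```python
# Illustrative split-conformal calibration for accepting generated candidates.
# Hypothetical sketch -- not the thesis's actual algorithm or scoring model.
import numpy as np

def calibrate_threshold(cal_scores, alpha=0.1):
    """Return a cutoff such that, for exchangeable data, an acceptable new
    output's nonconformity score falls below it with probability >= 1 - alpha."""
    n = len(cal_scores)
    # Finite-sample-corrected quantile level used in split conformal prediction.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(np.asarray(cal_scores), q_level, method="higher"))

def conformal_set(candidate_scores, threshold):
    """Keep every sampled candidate whose nonconformity score is within the cutoff."""
    return [i for i, s in enumerate(candidate_scores) if s <= threshold]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical nonconformity scores (e.g., 1 - model confidence) for
    # calibration examples whose reference outputs are known to be acceptable.
    cal_scores = rng.uniform(size=500)
    tau = calibrate_threshold(cal_scores, alpha=0.1)
    # Scores for a pool of sampled generations for one new input.
    candidate_scores = rng.uniform(size=20)
    print("threshold:", tau, "accepted:", conformal_set(candidate_scores, tau))
```

Under the exchangeability assumption, the calibrated threshold retains an acceptable output with probability at least 1 − alpha, which is the kind of statistical correctness the abstract refers to.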
Main author: | Quach, Victor |
---|---|
Other authors: | Barzilay, Regina |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online access: | https://hdl.handle.net/1721.1/151388 |
Department: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Degree: | Ph.D. |
Thesis date: | 2023-06 |
Rights: | In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/ |
File format: | application/pdf |