Towards machine learning models robust to adversarial examples and backdoor attacks

In the past decade, machine learning has succeeded spectacularly on many challenging benchmarks. However, are our machine learning models ready to leave this lab setting and be safely deployed in high-stakes real-world applications? In this thesis, we take steps towards making this vision a reality by developing and applying new frameworks for making modern machine learning systems more robust. In particular, we make progress on two major modes of brittleness of such systems: adversarial examples and backdoor data poisoning attacks. Specifically, in the first part of the thesis, we build a methodology for defending against adversarial examples that is the first to provide non-trivial adversarial robustness against an adaptive adversary. In the second part, we develop a framework for backdoor data poisoning attacks and show how, under natural assumptions, our theoretical results motivate an empirically successful algorithm for flagging and removing potentially poisoned examples. We conclude with a brief exploration of preliminary evidence that this framework can also be applied to other data modalities, such as tabular data, and other machine learning models, such as ensembles of decision trees.

Bibliographic Details
Main Author: Makelov, Aleksandar
Other Authors: Mądry, Aleksander
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023 (thesis dated September 2022)
Rights: In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)
Online Access: https://hdl.handle.net/1721.1/147387
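For readers unfamiliar with the first threat model named in the abstract, here is a minimal, generic illustration of an adversarial example using the standard fast gradient sign method (FGSM) on a toy logistic-regression classifier. This is a sketch of the attack concept only; it is not the defense methodology developed in the thesis, and all numbers and names below are invented for the demo.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """Fast gradient sign method (FGSM) for a logistic-regression
    classifier p(y=1|x) = sigmoid(w.x + b).

    Returns x' = x + eps * sign(d loss / d x), where loss is the
    binary cross-entropy for the true label y in {0, 1}.
    """
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))  # predicted probability
    # For BCE loss, d loss / d x = (p - y) * w by the chain rule.
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Toy demo: a correctly classified point flips class under a small
# perturbation bounded by eps in each coordinate.
w = np.array([1.0, -1.0])
b = 0.0
x = np.array([0.3, -0.3])           # w.x + b = 0.6 > 0 -> class 1
x_adv = fgsm_perturb(x, w, b, y=1, eps=0.5)
print(np.dot(w, x) + b > 0)         # True  (original: class 1)
print(np.dot(w, x_adv) + b > 0)     # False (adversarial: class 0)
```

The perturbation moves each coordinate by at most eps, yet the prediction flips; defenses of the kind the thesis studies must remain robust even when the adversary adapts to the defense.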
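The second threat model, backdoor data poisoning, can likewise be sketched in a few lines: an attacker stamps a fixed trigger pattern onto a small fraction of training points and relabels them to a target class, so that a model trained on the poisoned set learns to associate the trigger with that class. The toy sketch below only constructs such a poisoned training set; it is not the detection framework from the thesis, and the function name and parameters are hypothetical.

```python
import numpy as np

def plant_backdoor(X, y, trigger_idx, trigger_val, target_label,
                   poison_frac, rng):
    """Return a poisoned copy of (X, y): a random poison_frac of the
    rows get feature `trigger_idx` overwritten with `trigger_val`
    (the backdoor trigger) and their labels set to target_label.
    Illustrative toy attack, not the thesis's framework."""
    X_p, y_p = X.copy(), y.copy()
    n_poison = int(poison_frac * len(y))
    poisoned = rng.choice(len(y), size=n_poison, replace=False)
    X_p[poisoned, trigger_idx] = trigger_val
    y_p[poisoned] = target_label
    return X_p, y_p, poisoned

# Toy demo: 100 clean points of class 0; poison 10% towards class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.zeros(100, dtype=int)
X_p, y_p, poisoned = plant_backdoor(X, y, trigger_idx=0,
                                    trigger_val=9.0, target_label=1,
                                    poison_frac=0.1, rng=rng)
print(len(poisoned))                # 10 poisoned rows
print(np.unique(y_p[poisoned]))     # [1]: all relabeled to the target
```

A defense of the kind the abstract describes would aim to flag and remove the `poisoned` rows before (or after) training, using only the poisoned dataset itself.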