Fundamental Limits of Learning for Generalizability, Data Resilience, and Resource Efficiency

With the advancement of machine learning models and the rapid increase in their range of applications, learning algorithms should not only have the capacity to learn complex tasks, but also be resilient to imperfect data, all while being resource-efficient. This thesis explores trade-offs between these three core challenges in statistical learning theory. We aim to understand the limits of learning algorithms across a wide range of machine learning and optimization settings, with the goal of providing adaptable, robust, and efficient learning algorithms for decision-making.

In Part I of this thesis, we study the limits of learning with respect to generalizability and data assumptions, following the universal learning framework. In universal learning, we seek general algorithms that have convergence guarantees for any objective task, without structural restrictions. While this cannot be achieved without conditions on the training data, we show that it can in general be performed beyond standard statistical assumptions. More generally, we aim to characterize provably minimal assumptions under which universal learning can be performed, and to provide algorithms that learn under these minimal assumptions. After giving a detailed overview of the framework and a summary of our results in Chapter 2, we investigate universal learnability across a wide range of machine learning settings: full-feedback settings, namely realizable online learning (Chapter 3) and supervised learning with arbitrary or adversarial noise (Chapter 4), and partial-feedback settings, namely standard contextual bandits (Chapter 5) and, as a first step towards more complex reinforcement learning settings, contextual bandits with non-stationary or adversarial rewards (Chapter 6).
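As a concrete illustration of the realizable, full-feedback online protocol, the following minimal Python sketch (not an algorithm from the thesis) implements a memorization-style learner that predicts the label of the nearest previously seen instance and then stores the revealed label. The instance space, target function, and nearest-neighbor rule are hypothetical choices made for the example.

    import numpy as np

    class NearestNeighborOnlineLearner:
        """Toy online learner: predict the label of the closest previously
        seen instance, then store the revealed label (illustration only)."""

        def __init__(self):
            self.xs, self.ys = [], []

        def predict(self, x):
            if not self.xs:
                return 0  # arbitrary default prediction before any data arrives
            dists = [np.linalg.norm(x - xp) for xp in self.xs]
            return self.ys[int(np.argmin(dists))]

        def update(self, x, y):
            self.xs.append(x)
            self.ys.append(y)

    def f_star(x):
        """Hypothetical target function; nothing structural about it is used."""
        return int(x.sum() > 0.0)

    # Realizable, full-feedback protocol: predict, then observe the true label f*(x_t).
    rng = np.random.default_rng(0)
    learner = NearestNeighborOnlineLearner()
    mistakes, T = 0, 1000
    for t in range(T):
        x_t = rng.normal(size=2)
        y_hat = learner.predict(x_t)
        y_t = f_star(x_t)
        mistakes += int(y_hat != y_t)
        learner.update(x_t, y_t)
    print(f"mistake rate over {T} rounds: {mistakes / T:.3f}")

On well-behaved data streams, for instance the i.i.d. instances of this noiseless simulation, the mistake rate of memorization-based rules of this kind goes to zero for any fixed measurable target; Part I characterizes how far the conditions on the data stream can be relaxed while guarantees of this kind survive.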

In Part II we investigate the impact of resource constraints, specifically memory constraints in convex optimization. The efficiency of optimization algorithms is typically measured by the number of calls to a first-order oracle, which provides value and gradient information on the function, aptly referred to as oracle-complexity. However, this may not be the only bottleneck; understanding the trade-offs with the usage of resources such as memory could pave the way for more practical optimization algorithms. Following this reasoning, we make advances in characterizing the achievable regions for optimization algorithms in the oracle-complexity/memory landscape. In Chapter 7 we show that full memory is necessary to achieve the optimal oracle-complexity for deterministic algorithms; hence, classical cutting-plane methods are Pareto-optimal in the oracle-complexity/memory trade-off. On the positive side, in Chapter 8 we provide memory-efficient algorithms for high-accuracy regimes (sub-polynomial in the dimension). In exponential-accuracy regimes, these algorithms strictly improve the oracle-complexity of gradient descent while preserving its optimal memory usage. These algorithms can in fact be used for the more general feasibility problem, for which we give improved lower-bound trade-offs in Chapter 9. These results imply that, in standard accuracy regimes (polynomial in the dimension), gradient descent is also Pareto-optimal, and they reveal a phase transition for the oracle-complexity of memory-constrained algorithms.
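The oracle model and the memory contrast discussed above can be illustrated with standard textbook methods (again, not the algorithms of the thesis): gradient descent carries only a d-dimensional iterate between oracle calls, i.e. O(d) memory, while a classical cutting-plane scheme such as the central-cut ellipsoid method maintains a d x d matrix, i.e. O(d^2) memory, in exchange for an oracle-complexity that depends only logarithmically on the target accuracy. The quadratic objective, step size, and iteration counts below are illustrative assumptions.

    import numpy as np

    d = 5
    A = np.diag(np.linspace(1.0, 10.0, d))   # illustrative convex quadratic f(x) = 0.5 x^T A x

    def oracle(x):
        """First-order oracle: returns the function value and a (sub)gradient at x."""
        return 0.5 * x @ A @ x, A @ x

    # Gradient descent: the only state carried between oracle calls is the
    # d-dimensional iterate, i.e. O(d) memory.
    x = np.ones(d)
    for _ in range(200):
        _, g = oracle(x)
        x -= 0.1 * g
    print("gradient descent value:", oracle(x)[0])

    # Central-cut ellipsoid method, a classical cutting-plane scheme: the state
    # includes a d x d matrix P, i.e. O(d^2) memory, but the number of oracle
    # calls scales only logarithmically with the target accuracy.
    x = np.ones(d)
    P = 9.0 * np.eye(d)                      # initial ellipsoid (radius-3 ball) containing the minimizer
    best = np.inf
    for _ in range(200):
        val, g = oracle(x)
        best = min(best, val)
        gt = g / np.sqrt(g @ P @ g)          # normalized cut direction
        Pg = P @ gt
        x = x - Pg / (d + 1)
        P = d**2 / (d**2 - 1.0) * (P - 2.0 / (d + 1) * np.outer(Pg, Pg))
    print("ellipsoid method best value:", best)

Part II studies which intermediate points of this oracle-complexity/memory landscape are achievable, and shows that in the regimes described above these two endpoints are in fact Pareto-optimal.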

Bibliographic Details
Main Author: Blanchard, Moïse
Other Authors: Jaillet, Patrick
Format: Thesis (Ph.D.)
Department: Massachusetts Institute of Technology, Operations Research Center; Sloan School of Management
Published: Massachusetts Institute of Technology, 2024
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), copyright retained by author(s); https://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access: https://hdl.handle.net/1721.1/155479