Fundamental Limits of Learning for Generalizability, Data Resilience, and Resource Efficiency

With the advancement of machine learning models and the rapid increase in their range of applications, learning algorithms should not only have the capacity to learn complex tasks, but also be resilient to imperfect data, all while being resource-efficient. This thesis explores trade-offs between these three core challenges in statistical learning theory. We aim to understand the limits of learning algorithms across a wide range of machine learning and optimization settings, with the goal of providing adaptable, robust, and efficient learning algorithms for decision-making.

In Part I of this thesis, we study the limits of learning with respect to generalizability and data assumptions, following the universal learning framework. In universal learning, we seek general algorithms that have convergence guarantees for any objective task, without structural restrictions. While this cannot be achieved without conditions on the training data, we show that it can in general be performed beyond standard statistical assumptions. More generally, we aim to characterize provably minimal assumptions under which universal learning can be performed, and to provide algorithms that learn under these minimal assumptions. After giving a detailed overview of the framework and a summary of our results in Chapter 2, we investigate universal learnability across a wide range of machine learning settings: full-feedback settings, namely realizable online learning (Chapter 3) and supervised learning with arbitrary or adversarial noise (Chapter 4), and partial-feedback settings, namely standard contextual bandits (Chapter 5) and, as a first step towards more complex reinforcement learning settings, contextual bandits with non-stationary or adversarial rewards (Chapter 6).
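As a concrete illustration of the realizable, full-feedback online protocol, the following minimal Python sketch (not an algorithm from the thesis) implements a memorization-style learner that predicts the label of the nearest previously seen instance and then stores the revealed label. The instance space, target function, and nearest-neighbor rule are hypothetical choices made for the example.

    import numpy as np

    class NearestNeighborOnlineLearner:
        """Toy online learner: predict the label of the closest previously
        seen instance, then store the revealed label (illustration only)."""

        def __init__(self):
            self.xs, self.ys = [], []

        def predict(self, x):
            if not self.xs:
                return 0  # arbitrary default prediction before any data arrives
            dists = [np.linalg.norm(x - xp) for xp in self.xs]
            return self.ys[int(np.argmin(dists))]

        def update(self, x, y):
            self.xs.append(x)
            self.ys.append(y)

    def f_star(x):
        """Hypothetical target function; nothing structural about it is used."""
        return int(x.sum() > 0.0)

    # Realizable, full-feedback protocol: predict, then observe the true label f*(x_t).
    rng = np.random.default_rng(0)
    learner = NearestNeighborOnlineLearner()
    mistakes, T = 0, 1000
    for t in range(T):
        x_t = rng.normal(size=2)
        y_hat = learner.predict(x_t)
        y_t = f_star(x_t)
        mistakes += int(y_hat != y_t)
        learner.update(x_t, y_t)
    print(f"mistake rate over {T} rounds: {mistakes / T:.3f}")

On well-behaved data streams, for instance the i.i.d. instances of this noiseless simulation, the mistake rate of memorization-based rules of this kind goes to zero for any fixed measurable target; Part I characterizes how far the conditions on the data stream can be relaxed while guarantees of this kind survive.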

In Part II we investigate the impact of resource constraints, specifically memory constraints in convex optimization. The efficiency of optimization algorithms is typically measured by the number of calls to a first-order oracle, which provides value and gradient information on the function, aptly referred to as oracle-complexity. However, this may not be the only bottleneck; understanding the trade-offs with the usage of resources such as memory could pave the way for more practical optimization algorithms. Following this reasoning, we make advances in characterizing the achievable regions for optimization algorithms in the oracle-complexity/memory landscape. In Chapter 7 we show that full memory is necessary to achieve the optimal oracle-complexity for deterministic algorithms; hence, classical cutting-plane methods are Pareto-optimal in the oracle-complexity/memory trade-off. On the positive side, in Chapter 8 we provide memory-efficient algorithms for high-accuracy regimes (sub-polynomial in the dimension). In exponential-accuracy regimes, these algorithms strictly improve the oracle-complexity of gradient descent while preserving its optimal memory usage. These algorithms can in fact be used for the more general feasibility problem, for which we give improved lower-bound trade-offs in Chapter 9. These results imply that, in standard accuracy regimes (polynomial in the dimension), gradient descent is also Pareto-optimal, and they reveal a phase transition for the oracle-complexity of memory-constrained algorithms.
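The oracle model and the memory contrast discussed above can be illustrated with standard textbook methods (again, not the algorithms of the thesis): gradient descent carries only a d-dimensional iterate between oracle calls, i.e. O(d) memory, while a classical cutting-plane scheme such as the central-cut ellipsoid method maintains a d x d matrix, i.e. O(d^2) memory, in exchange for an oracle-complexity that depends only logarithmically on the target accuracy. The quadratic objective, step size, and iteration counts below are illustrative assumptions.

    import numpy as np

    d = 5
    A = np.diag(np.linspace(1.0, 10.0, d))   # illustrative convex quadratic f(x) = 0.5 x^T A x

    def oracle(x):
        """First-order oracle: returns the function value and a (sub)gradient at x."""
        return 0.5 * x @ A @ x, A @ x

    # Gradient descent: the only state carried between oracle calls is the
    # d-dimensional iterate, i.e. O(d) memory.
    x = np.ones(d)
    for _ in range(200):
        _, g = oracle(x)
        x -= 0.1 * g
    print("gradient descent value:", oracle(x)[0])

    # Central-cut ellipsoid method, a classical cutting-plane scheme: the state
    # includes a d x d matrix P, i.e. O(d^2) memory, but the number of oracle
    # calls scales only logarithmically with the target accuracy.
    x = np.ones(d)
    P = 9.0 * np.eye(d)                      # initial ellipsoid (radius-3 ball) containing the minimizer
    best = np.inf
    for _ in range(200):
        val, g = oracle(x)
        best = min(best, val)
        gt = g / np.sqrt(g @ P @ g)          # normalized cut direction
        Pg = P @ gt
        x = x - Pg / (d + 1)
        P = d**2 / (d**2 - 1.0) * (P - 2.0 / (d + 1) * np.outer(Pg, Pg))
    print("ellipsoid method best value:", best)

Part II studies which intermediate points of this oracle-complexity/memory landscape are achievable, and shows that in the regimes described above these two endpoints are in fact Pareto-optimal.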

Bibliographic Details
Main Author: Blanchard, Moïse
Other Authors: Jaillet, Patrick
Format: Thesis (Ph.D.)
Department: Massachusetts Institute of Technology, Operations Research Center; Sloan School of Management
Published: Massachusetts Institute of Technology, 2024
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), copyright retained by author(s); https://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access: https://hdl.handle.net/1721.1/155479