Large-Scale Algorithms for Machine Learning: Efficiency, Estimation Errors, and Beyond

Optimization algorithms are a cornerstone of machine learning and statistical inference. The advent of large-scale datasets introduces computational challenges and necessitates more efficient algorithms. Modern optimization techniques are often tailored to particular machine learning problems: they exploit the structural characteristics of each problem and thereby achieve better efficiency than generic methods applied to the same problems. Another key aspect of studying learning algorithms is understanding the estimation accuracy of the resulting estimator. In some scenarios, exact optimization over the training set may be impractical, yet simple and effective heuristics can achieve good estimation accuracy under an appropriate statistical model.

In this thesis, we examine several large-scale algorithms from both optimization and statistical perspectives. In Chapters 2 and 3, we study two continuous optimization algorithms tailored to structural constraints. Chapter 2 focuses on a generalized Frank-Wolfe method for unbounded, cylinder-like constraint sets. Chapter 3 focuses on a CD-like (coordinate-descent-like) method for polyhedral constraints with a small number of extreme points. Both methods achieve state-of-the-art performance by exploiting problem structure. In Chapter 4, we study a variant of linear regression with possible mismatches between covariate-response pairs. We study a simple and efficient heuristic method and give a rigorous analysis of its estimation error in a statistical setting. In Chapters 5 and 6, we examine two algorithms for decision trees. Chapter 5 studies the computation of optimal decision trees and introduces a new branch-and-bound method for optimal decision trees with general continuous features. Chapter 6 turns to the analysis of the CART algorithm under a sufficient impurity decrease condition; we prove tight error bounds for signal functions satisfying this condition and discuss several function classes that satisfy it. Chapter 7 studies a density estimation problem with shape restrictions. We propose a cubic-Newton method framework for the computation and study the approximation properties of finite mixtures.
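For background on the Frank-Wolfe family referenced for Chapter 2, the sketch below shows only the classical method on a bounded set (the probability simplex), not the thesis's generalized variant for unbounded, cylinder-like sets. The function name `frank_wolfe_simplex` and the least-squares objective are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def frank_wolfe_simplex(A, b, n_iters=200):
    """Classical Frank-Wolfe for min ||Ax - b||^2 over the probability simplex.

    The linear minimization oracle over the simplex simply picks the vertex
    (coordinate) with the smallest gradient entry.
    """
    n = A.shape[1]
    x = np.ones(n) / n                      # start at the simplex barycenter
    for t in range(n_iters):
        grad = 2.0 * A.T @ (A @ x - b)      # gradient of the least-squares loss
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0            # LMO: best vertex of the simplex
        gamma = 2.0 / (t + 2.0)             # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s   # convex combination stays feasible
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 10))
    b = rng.standard_normal(50)
    x = frank_wolfe_simplex(A, b)
    print("objective:", np.sum((A @ x - b) ** 2), "sum(x):", x.sum())
```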
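For the "sufficient impurity decrease" condition referenced for Chapter 6, the sketch below illustrates the standard CART split search for regression trees and the impurity-decrease quantity it greedily maximizes. This is textbook CART for a single feature, not the thesis's analysis; `best_split` is a hypothetical helper name.

```python
import numpy as np

def best_split(x, y):
    """CART-style search for the single split maximizing impurity decrease.

    Impurity is the within-node variance (regression trees); the returned
    threshold t splits the data into {x <= t} and {x > t}.
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y_sorted)
    parent_impurity = np.var(y_sorted)
    best_t, best_decrease = None, 0.0
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue                      # no valid threshold between equal values
        left, right = y_sorted[:i], y_sorted[i:]
        weighted = (i * np.var(left) + (n - i) * np.var(right)) / n
        decrease = parent_impurity - weighted
        if decrease > best_decrease:
            best_decrease = decrease
            best_t = 0.5 * (x_sorted[i - 1] + x_sorted[i])
    return best_t, best_decrease

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(size=100)
    y = (x > 0.5).astype(float) + 0.1 * rng.standard_normal(100)
    print(best_split(x, y))               # threshold near 0.5, large decrease
```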

Bibliographic Details
Main Author: Wang, Haoyue
Other Authors: Mazumder, Rahul
Department: Massachusetts Institute of Technology. Operations Research Center; Sloan School of Management
Degree: Ph.D.
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); copyright retained by author(s); https://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access: https://hdl.handle.net/1721.1/155490