Nonparametric High-dimensional Models: Sparsity, Efficiency, Interpretability

This thesis explores ensemble methods in machine learning, a family of techniques that build a predictive model by jointly training simpler base models. It examines three types of ensembles: additive models, tree ensembles, and mixtures of experts. Each is characterized by a specific structure: additive models can involve base learners over single covariates or pairwise interactions, tree ensembles use decision trees as base learners, and mixtures of experts typically employ neural networks. The thesis imposes various sparsity and structural constraints within these methods and develops optimization-based approaches to improve training efficiency, inference, and/or interpretability.

In the first part, we consider additive models with interactions under component-selection constraints and additional structural constraints such as hierarchical interactions. We study different optimization-based formulations, propose efficient algorithms for learning a good subset of components, and develop two toolkits that scale to large numbers of samples and large sets of pairwise interactions.

In the second part, we consider tree ensemble learning. We propose a flexible and efficient formulation of differentiable tree ensemble learning that accommodates flexible loss functions, multitask learning, and related settings. We also consider end-to-end feature selection in tree ensembles, i.e., selecting features while the ensemble is being trained. This contrasts with popular tree ensemble toolkits, which perform post-training feature selection based on feature importances. Our toolkit provides substantial improvements in predictive performance for a desired feature budget.

In the third part, we consider sparse gating in mixtures of experts. Sparse Mixture of Experts is a paradigm in which only a subset of experts (typically neural networks) is activated for each input sample; it is used to scale both training and inference of large-scale vision and language models. We propose multiple approaches to improve sparse gating in mixture-of-experts models, and our new approaches show improvements in large-scale experiments on machine translation as well as on distillation of pre-trained models for natural language processing tasks.
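As a rough illustration of the component-selection setting in the first part above, the NumPy sketch below fits a toy additive model whose components are main effects and pairwise interaction products, choosing a budgeted subset by greedy forward selection. This is only a sketch under simplifying assumptions (linear components, greedy selection, no hierarchy constraint) and is not the thesis's algorithm; the function name greedy_component_selection and its arguments are made up for this example.

import numpy as np
from itertools import combinations

def greedy_component_selection(X, y, budget):
    """Toy forward selection over main-effect and pairwise-interaction
    components of a linear additive model (illustrative only)."""
    n, p = X.shape
    # Candidate components: each main effect x_j and each product x_j * x_k.
    components = [("main", j) for j in range(p)]
    components += [("inter", (j, k)) for j, k in combinations(range(p), 2)]

    def column(comp):
        kind, idx = comp
        return X[:, idx] if kind == "main" else X[:, idx[0]] * X[:, idx[1]]

    selected, residual = [], y - y.mean()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for comp in components:
            if comp in selected:
                continue
            z = column(comp)
            z = z - z.mean()
            denom = z @ z
            if denom < 1e-12:
                continue
            # RSS drop from regressing the current residual on this component alone.
            gain = (z @ residual) ** 2 / denom
            if gain > best_gain:
                best, best_gain = comp, gain
        if best is None:
            break
        selected.append(best)
        # Refit least squares on the selected components (plus intercept)
        # and update the residual.
        Z = np.column_stack([np.ones(n)] + [column(c) for c in selected])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        residual = y - Z @ coef
    return selected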

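For the differentiable tree ensembles of the second part, the following minimal PyTorch sketch shows one way a soft (sigmoid-routed) tree ensemble can be trained end to end together with a per-feature gate, so that feature selection happens during training rather than after it. The SoftTreeEnsemble class, the sigmoid feature gate, and the sparsity penalty are assumptions made for illustration, not the formulation or toolkit developed in the thesis.

import torch
import torch.nn as nn

class SoftTreeEnsemble(nn.Module):
    """Minimal sketch: differentiable tree ensemble with a trainable feature gate."""
    def __init__(self, in_dim, n_trees=10, depth=3):
        super().__init__()
        self.n_trees, self.depth = n_trees, depth
        n_internal, n_leaves = 2 ** depth - 1, 2 ** depth
        # Hyperplane split at every internal node of every tree.
        self.split_w = nn.Parameter(0.1 * torch.randn(n_trees, n_internal, in_dim))
        self.split_b = nn.Parameter(torch.zeros(n_trees, n_internal))
        self.leaf_value = nn.Parameter(torch.zeros(n_trees, n_leaves))
        # One gate per input feature; sigmoid(gate) acts as a soft keep-probability.
        self.feature_gate = nn.Parameter(torch.zeros(in_dim))

    def forward(self, x):                                   # x: (batch, in_dim)
        x = x * torch.sigmoid(self.feature_gate)            # soft feature selection
        logits = torch.einsum("bd,tnd->btn", x, self.split_w) + self.split_b
        right = torch.sigmoid(logits)                        # prob. of routing right
        batch = x.shape[0]
        leaf_prob = torch.ones(batch, self.n_trees, 1, device=x.device)
        for level in range(self.depth):
            start = 2 ** level - 1                           # first node on this level
            p_right = right[:, :, start:start + 2 ** level]
            # Each current path splits into its left and right children.
            leaf_prob = torch.stack(
                [leaf_prob * (1 - p_right), leaf_prob * p_right], dim=-1
            ).reshape(batch, self.n_trees, -1)
        # Ensemble prediction: average over trees of expected leaf values.
        return (leaf_prob * self.leaf_value).sum(-1).mean(-1)

    def sparsity_penalty(self):
        return torch.sigmoid(self.feature_gate).sum()        # encourages few open gates

In this sketch, one would add model.sparsity_penalty(), scaled by a regularization weight, to the training loss and then threshold the gates to obtain a hard feature subset for a desired budget.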
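For the sparse gating of the third part, the sketch below implements generic top-k gating over a small set of expert networks, so that only k experts are activated per input sample. It is meant only to make the "subset of experts per sample" idea concrete; the gating improvements proposed in the thesis are not reproduced here, and the TopKMoE class and its hyperparameters are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sketch of sparse (top-k) gating over a set of expert MLPs."""
    def __init__(self, dim, n_experts=8, k=2, hidden=256):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                            # x: (batch, dim)
        scores = self.gate(x)                        # (batch, n_experts)
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_val, dim=-1)        # renormalize over the k chosen experts
        out = torch.zeros_like(x)
        # Dense loop for clarity; each sample only contributes to its k selected experts.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

Real sparse mixture-of-experts systems typically add an auxiliary load-balancing loss and dispatch tokens to experts in parallel rather than looping as above.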

Bibliographic Details
Main Author: Ibrahim, Shibal
Other Authors: Mazumder, Rahul
Format: Thesis
Degree: Ph.D.
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156296
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/