Nonparametric High-dimensional Models: Sparsity, Efficiency, Interpretability

This thesis explores ensemble methods in machine learning, a family of techniques that build a predictive model by jointly training simpler base models. It examines three types of ensembles: additive models, tree ensembles, and mixtures of experts. Each is characterized by a specific structure: additive models can involve base learners over single covariates or pairwise interactions, tree ensembles use decision trees as base learners, and mixtures of experts typically employ neural networks. The thesis imposes various sparsity and structural constraints within these methods and develops optimization-based approaches to improve training efficiency, inference, and/or interpretability.

In the first part, we consider additive models with interactions under component-selection constraints and additional structural constraints such as hierarchical interactions. We study different optimization-based formulations, propose efficient algorithms for learning a good subset of components, and develop two toolkits that scale to large numbers of samples and large sets of pairwise interactions.

In the second part, we consider tree ensemble learning. We propose a flexible and efficient formulation of differentiable tree ensemble learning that accommodates flexible loss functions, multitask learning, and related settings. We also consider end-to-end feature selection in tree ensembles, i.e., selecting features while the ensemble is being trained. This contrasts with popular tree ensemble toolkits, which perform post-training feature selection based on feature importances. Our toolkit provides substantial improvements in predictive performance for a desired feature budget.

In the third part, we consider sparse gating in mixtures of experts. Sparse Mixture of Experts is a paradigm in which only a subset of experts (typically neural networks) is activated for each input sample; it is used to scale both training and inference of large-scale vision and language models. We propose multiple approaches to improve sparse gating in mixture-of-experts models, and our new approaches show improvements in large-scale experiments on machine translation as well as on distillation of pre-trained models for natural language processing tasks.
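As a rough illustration of the component-selection setting in the first part above, the NumPy sketch below fits a toy additive model whose components are main effects and pairwise interaction products, choosing a budgeted subset by greedy forward selection. This is only a sketch under simplifying assumptions (linear components, greedy selection, no hierarchy constraint) and is not the thesis's algorithm; the function name greedy_component_selection and its arguments are made up for this example.

import numpy as np
from itertools import combinations

def greedy_component_selection(X, y, budget):
    """Toy forward selection over main-effect and pairwise-interaction
    components of a linear additive model (illustrative only)."""
    n, p = X.shape
    # Candidate components: each main effect x_j and each product x_j * x_k.
    components = [("main", j) for j in range(p)]
    components += [("inter", (j, k)) for j, k in combinations(range(p), 2)]

    def column(comp):
        kind, idx = comp
        return X[:, idx] if kind == "main" else X[:, idx[0]] * X[:, idx[1]]

    selected, residual = [], y - y.mean()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for comp in components:
            if comp in selected:
                continue
            z = column(comp)
            z = z - z.mean()
            denom = z @ z
            if denom < 1e-12:
                continue
            # RSS drop from regressing the current residual on this component alone.
            gain = (z @ residual) ** 2 / denom
            if gain > best_gain:
                best, best_gain = comp, gain
        if best is None:
            break
        selected.append(best)
        # Refit least squares on the selected components (plus intercept)
        # and update the residual.
        Z = np.column_stack([np.ones(n)] + [column(c) for c in selected])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        residual = y - Z @ coef
    return selected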

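For the differentiable tree ensembles of the second part, the following minimal PyTorch sketch shows one way a soft (sigmoid-routed) tree ensemble can be trained end to end together with a per-feature gate, so that feature selection happens during training rather than after it. The SoftTreeEnsemble class, the sigmoid feature gate, and the sparsity penalty are assumptions made for illustration, not the formulation or toolkit developed in the thesis.

import torch
import torch.nn as nn

class SoftTreeEnsemble(nn.Module):
    """Minimal sketch: differentiable tree ensemble with a trainable feature gate."""
    def __init__(self, in_dim, n_trees=10, depth=3):
        super().__init__()
        self.n_trees, self.depth = n_trees, depth
        n_internal, n_leaves = 2 ** depth - 1, 2 ** depth
        # Hyperplane split at every internal node of every tree.
        self.split_w = nn.Parameter(0.1 * torch.randn(n_trees, n_internal, in_dim))
        self.split_b = nn.Parameter(torch.zeros(n_trees, n_internal))
        self.leaf_value = nn.Parameter(torch.zeros(n_trees, n_leaves))
        # One gate per input feature; sigmoid(gate) acts as a soft keep-probability.
        self.feature_gate = nn.Parameter(torch.zeros(in_dim))

    def forward(self, x):                                   # x: (batch, in_dim)
        x = x * torch.sigmoid(self.feature_gate)            # soft feature selection
        logits = torch.einsum("bd,tnd->btn", x, self.split_w) + self.split_b
        right = torch.sigmoid(logits)                        # prob. of routing right
        batch = x.shape[0]
        leaf_prob = torch.ones(batch, self.n_trees, 1, device=x.device)
        for level in range(self.depth):
            start = 2 ** level - 1                           # first node on this level
            p_right = right[:, :, start:start + 2 ** level]
            # Each current path splits into its left and right children.
            leaf_prob = torch.stack(
                [leaf_prob * (1 - p_right), leaf_prob * p_right], dim=-1
            ).reshape(batch, self.n_trees, -1)
        # Ensemble prediction: average over trees of expected leaf values.
        return (leaf_prob * self.leaf_value).sum(-1).mean(-1)

    def sparsity_penalty(self):
        return torch.sigmoid(self.feature_gate).sum()        # encourages few open gates

In this sketch, one would add model.sparsity_penalty(), scaled by a regularization weight, to the training loss and then threshold the gates to obtain a hard feature subset for a desired budget.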
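For the sparse gating of the third part, the sketch below implements generic top-k gating over a small set of expert networks, so that only k experts are activated per input sample. It is meant only to make the "subset of experts per sample" idea concrete; the gating improvements proposed in the thesis are not reproduced here, and the TopKMoE class and its hyperparameters are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sketch of sparse (top-k) gating over a set of expert MLPs."""
    def __init__(self, dim, n_experts=8, k=2, hidden=256):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                            # x: (batch, dim)
        scores = self.gate(x)                        # (batch, n_experts)
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_val, dim=-1)        # renormalize over the k chosen experts
        out = torch.zeros_like(x)
        # Dense loop for clarity; each sample only contributes to its k selected experts.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

Real sparse mixture-of-experts systems typically add an auxiliary load-balancing loss and dispatch tokens to experts in parallel rather than looping as above.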

Bibliographic Details
Main Author: Ibrahim, Shibal
Other Authors: Mazumder, Rahul
Format: Thesis
Degree: Ph.D.
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/156296
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/