Diversity-inducing probability measures for machine learning

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019

Библиографические подробности
Главный автор:	Li, Chengtao,Ph. D.Massachusetts Institute of Technology.
Другие авторы:	Suvrit Sra and Stefanie Jegelka.
Формат:	Диссертация
Язык:	eng
Опубликовано:	Massachusetts Institute of Technology 2019
Предметы:	Electrical Engineering and Computer Science.
Online-ссылка:	https://hdl.handle.net/1721.1/121724

_version_	1826190629560909824
author	Li, Chengtao,Ph. D.Massachusetts Institute of Technology.
author2	Suvrit Sra and Stefanie Jegelka.
author_facet	Suvrit Sra and Stefanie Jegelka. Li, Chengtao,Ph. D.Massachusetts Institute of Technology.
author_sort	Li, Chengtao,Ph. D.Massachusetts Institute of Technology.
collection	MIT
description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
first_indexed	2024-09-23T08:43:15Z
format	Thesis
id	mit-1721.1/121724
institution	Massachusetts Institute of Technology
language	eng
last_indexed	2024-09-23T08:43:15Z
publishDate	2019
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/1217242019-07-24T03:06:25Z Diversity-inducing probability measures for machine learning Li, Chengtao,Ph. D.Massachusetts Institute of Technology. Suvrit Sra and Stefanie Jegelka. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019 Cataloged from PDF version of thesis. Includes bibliographical references (pages 163-176). Subset selection problems arise in machine learning within kernel approximation, experimental design, and numerous other applications. In such applications, one often seeks to select diverse subsets of items to represent the population. One way to select such diverse subsets is to sample according to Diversity-Inducing Probability Measures (DIPMs) that assign higher probabilities to more diverse subsets. DIPMs underlie several recent breakthroughs in mathematics and theoretical computer science, but their power has not yet been explored for machine learning. In this thesis, we investigate DIPMs, their mathematical properties, sampling algorithms, and applications. Perhaps the best known instance of a DIPM is a Determinantal Point Process (DPP). DPPs originally arose in quantum physics, and are known to have deep relations to linear algebra, combinatorics, and geometry. We explore applications of DPPs to kernel matrix approximation and kernel ridge regression. In these applications, DPPs deliver strong approximation guarantees and obtain superior performance compared to existing methods. We further develop an MCMC sampling algorithm accelerated by Gauss-type quadratures for DPPs. The algorithm runs several orders of magnitude faster than the existing ones. DPPs lie in a larger class of DIPMs called Strongly Rayleigh (SR) Measures. Instances of SR measures display a strong negative dependence property known as negative association, and as such can be used to model subset diversity. We study mathematical properties of SR measures, and construct the first provably fast-mixing Markov chain that samples from general SR measures. As a special case, we consider an SR measure called Dual Volume Sampling (DVS), for which we present the first poly-time sampling algorithm. While all considered distributions over subsets are unconstrained, those of interest in the real world usually come with constraints due to prior knowledge, resource limitations or personal preferences. Hence we investigate sampling from constrained versions of DIPMs. Specifically, we consider DIPMs with cardinality constraints and matroid base constraints and construct poly-time approximate sampling algorithms for them. Such sampling algorithms will enable practical uses of constrained DIPMs in real world. by Chengtao Li. Ph. D. Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science 2019-07-17T20:57:57Z 2019-07-17T20:57:57Z 2019 2019 Thesis https://hdl.handle.net/1721.1/121724 1102048939 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 176 pages application/pdf Massachusetts Institute of Technology
spellingShingle	Electrical Engineering and Computer Science. Li, Chengtao,Ph. D.Massachusetts Institute of Technology. Diversity-inducing probability measures for machine learning
title	Diversity-inducing probability measures for machine learning
title_full	Diversity-inducing probability measures for machine learning
title_fullStr	Diversity-inducing probability measures for machine learning
title_full_unstemmed	Diversity-inducing probability measures for machine learning
title_short	Diversity-inducing probability measures for machine learning
title_sort	diversity inducing probability measures for machine learning
topic	Electrical Engineering and Computer Science.
url	https://hdl.handle.net/1721.1/121724
work_keys_str_mv	AT lichengtaophdmassachusettsinstituteoftechnology diversityinducingprobabilitymeasuresformachinelearning

Diversity-inducing probability measures for machine learning

Схожие документы