Combinatorial Methods in Statistics

Bibliographic Details
Main Author: Turner, Paxton Mark
Other Authors: Philippe Rigollet
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/139383
Description
Summary: This thesis explores combinatorial methods in random vector balancing, nonparametric estimation, and network inference. First, motivated by problems from controlled experiments, we study random vector balancing from the perspective of discrepancy theory, a classical topic in combinatorics, and give sharp statistical results along with improved algorithmic guarantees. Next, we focus on the problem of density estimation and investigate the fundamental statistical limits of coresets, a popular framework for obtaining algorithmic speedups by replacing a large dataset with a representative subset. In the following chapter, motivated by the problem of fast evaluation of kernel density estimators, we demonstrate how a multivariate interpolation scheme from finite-element theory based on the combinatorial-geometric properties of a certain mesh can be used to significantly improve the storage and query time of a nonparametric estimator while also preserving its accuracy. Our final chapter focuses on pedigree reconstruction, a combinatorial inference task of recovering the latent network of familial relationships of a population from its extant genetic data.