Computational and statistical challenges in high dimensional statistical models

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Bibliographic Details
Main Author: Zadik, Ilias.
Other Authors: David Gamarnik.
Format: Thesis
Language:eng
Published: Massachusetts Institute of Technology 2020
Subjects:
Online Access:https://hdl.handle.net/1721.1/123708
_version_ 1811075475414450176
author Zadik, Ilias.
author2 David Gamarnik.
author_facet David Gamarnik.
Zadik, Ilias.
author_sort Zadik, Ilias.
collection MIT
description This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
first_indexed 2024-09-23T10:06:42Z
format Thesis
id mit-1721.1/123708
institution Massachusetts Institute of Technology
language eng
last_indexed 2024-09-23T10:06:42Z
publishDate 2020
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1237082020-02-11T03:26:47Z Computational and statistical challenges in high dimensional statistical models Zadik, Ilias. David Gamarnik. Massachusetts Institute of Technology. Operations Research Center. Massachusetts Institute of Technology. Operations Research Center Operations Research Center. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2019 Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 289-301). This thesis focuses on two long-studied high-dimensional statistical models, namely (1) the high-dimensional linear regression (HDLR) model, where the goal is to recover a hidden vector of coefficients from noisy linear observations, and (2) the planted clique (PC) model, where the goal is to recover a hidden community structure from a much larger observed network. The following results are established. First, under assumptions, we identify the exact statistical limit of the model, that is the minimum signal strength allowing a statistically accurate inference of the hidden vector. We couple this result with an all-or-nothing information theoretic (IT) phase transition. We prove that above the statistical limit, it is IT possible to almost-perfectly recover the hidden vector, while below the statistical limit, it is IT impossible to achieve non-trivial correlation with the hidden vector. Second, we study the computational-statistical gap of the sparse HDLR model; The statistical limit of the model is significantly smaller than its apparent computational limit, which is the minimum signal strength required by known computationally-efficient methods to perform statistical inference. We propose an explanation of the gap by analyzing the Overlap Gap Property (OGP) for HDLR. The OGP is known to be linked with algorithmic hardness in the theory of average-case optimization. We prove that the OGP for HDLR appears, up-to-constants, simultaneously with the computational-statistical gap, suggesting the OGP is a fundamental source of algorithmic hardness for HDLR. Third, we focus on noiseless HDLR. Here we do not assume sparsity, but we make a certain rationality assumption on the coefficients. In this case, we propose a polynomial-time recovery method based on the Lenstra-Lenstra-Lóvasz lattice basis reduction algorithm. We prove that the method obtains notable guarantees, as it recovers the hidden vector with using only one observation. Finally, we study the computational-statistical gap of the PC model. Similar to HDLR, we analyze the presence of OGP for the PC model. We provide strong (first-moment) evidence that again the OGP coincides with the model's computational-statistical gap. For this reason, we conjecture that the OGP provides a fundamental algorithmic barrier for PC as well, and potentially in a generic sense for high-dimensional statistical tasks. by Ilias. Zadik. Ph. D. Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center 2020-02-10T21:37:19Z 2020-02-10T21:37:19Z 2019 2019 Thesis https://hdl.handle.net/1721.1/123708 1138021665 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 301 pages application/pdf Massachusetts Institute of Technology
spellingShingle Operations Research Center.
Zadik, Ilias.
Computational and statistical challenges in high dimensional statistical models
title Computational and statistical challenges in high dimensional statistical models
title_full Computational and statistical challenges in high dimensional statistical models
title_fullStr Computational and statistical challenges in high dimensional statistical models
title_full_unstemmed Computational and statistical challenges in high dimensional statistical models
title_short Computational and statistical challenges in high dimensional statistical models
title_sort computational and statistical challenges in high dimensional statistical models
topic Operations Research Center.
url https://hdl.handle.net/1721.1/123708
work_keys_str_mv AT zadikilias computationalandstatisticalchallengesinhighdimensionalstatisticalmodels