Computational and statistical challenges in high dimensional statistical models
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Language: | eng |
Published: |
Massachusetts Institute of Technology
2020
|
Subjects: | |
Online Access: | https://hdl.handle.net/1721.1/123708 |
_version_ | 1811075475414450176 |
---|---|
author | Zadik, Ilias. |
author2 | David Gamarnik. |
author_facet | David Gamarnik. Zadik, Ilias. |
author_sort | Zadik, Ilias. |
collection | MIT |
description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. |
first_indexed | 2024-09-23T10:06:42Z |
format | Thesis |
id | mit-1721.1/123708 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T10:06:42Z |
publishDate | 2020 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1237082020-02-11T03:26:47Z Computational and statistical challenges in high dimensional statistical models Zadik, Ilias. David Gamarnik. Massachusetts Institute of Technology. Operations Research Center. Massachusetts Institute of Technology. Operations Research Center Operations Research Center. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2019 Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 289-301). This thesis focuses on two long-studied high-dimensional statistical models, namely (1) the high-dimensional linear regression (HDLR) model, where the goal is to recover a hidden vector of coefficients from noisy linear observations, and (2) the planted clique (PC) model, where the goal is to recover a hidden community structure from a much larger observed network. The following results are established. First, under assumptions, we identify the exact statistical limit of the model, that is the minimum signal strength allowing a statistically accurate inference of the hidden vector. We couple this result with an all-or-nothing information theoretic (IT) phase transition. We prove that above the statistical limit, it is IT possible to almost-perfectly recover the hidden vector, while below the statistical limit, it is IT impossible to achieve non-trivial correlation with the hidden vector. Second, we study the computational-statistical gap of the sparse HDLR model; The statistical limit of the model is significantly smaller than its apparent computational limit, which is the minimum signal strength required by known computationally-efficient methods to perform statistical inference. We propose an explanation of the gap by analyzing the Overlap Gap Property (OGP) for HDLR. The OGP is known to be linked with algorithmic hardness in the theory of average-case optimization. We prove that the OGP for HDLR appears, up-to-constants, simultaneously with the computational-statistical gap, suggesting the OGP is a fundamental source of algorithmic hardness for HDLR. Third, we focus on noiseless HDLR. Here we do not assume sparsity, but we make a certain rationality assumption on the coefficients. In this case, we propose a polynomial-time recovery method based on the Lenstra-Lenstra-Lóvasz lattice basis reduction algorithm. We prove that the method obtains notable guarantees, as it recovers the hidden vector with using only one observation. Finally, we study the computational-statistical gap of the PC model. Similar to HDLR, we analyze the presence of OGP for the PC model. We provide strong (first-moment) evidence that again the OGP coincides with the model's computational-statistical gap. For this reason, we conjecture that the OGP provides a fundamental algorithmic barrier for PC as well, and potentially in a generic sense for high-dimensional statistical tasks. by Ilias. Zadik. Ph. D. Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center 2020-02-10T21:37:19Z 2020-02-10T21:37:19Z 2019 2019 Thesis https://hdl.handle.net/1721.1/123708 1138021665 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 301 pages application/pdf Massachusetts Institute of Technology |
spellingShingle | Operations Research Center. Zadik, Ilias. Computational and statistical challenges in high dimensional statistical models |
title | Computational and statistical challenges in high dimensional statistical models |
title_full | Computational and statistical challenges in high dimensional statistical models |
title_fullStr | Computational and statistical challenges in high dimensional statistical models |
title_full_unstemmed | Computational and statistical challenges in high dimensional statistical models |
title_short | Computational and statistical challenges in high dimensional statistical models |
title_sort | computational and statistical challenges in high dimensional statistical models |
topic | Operations Research Center. |
url | https://hdl.handle.net/1721.1/123708 |
work_keys_str_mv | AT zadikilias computationalandstatisticalchallengesinhighdimensionalstatisticalmodels |