Fast global convergence of gradient methods for high-dimensional statistical recovery


Bibliographic Details
Main Authors: Agarwal, Alekh, Negahban, Sahand N., Wainwright, Martin J.
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: English (en_US)
Published: Institute of Mathematical Statistics 2013
Online Access: http://hdl.handle.net/1721.1/78602
Description
Summary: Many statistical M-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite gradient methods for solving such problems, working within a high-dimensional framework that allows the ambient dimension d to grow with (and possibly exceed) the sample size n. Our theory identifies conditions under which projected gradient descent enjoys globally linear convergence up to the statistical precision of the model, meaning the typical distance between the true unknown parameter θ* and an optimal solution θ̂. By establishing these conditions with high probability for numerous statistical models, our analysis applies to a wide range of M-estimators, including sparse linear regression using the Lasso; group Lasso for block sparsity; log-linear models with regularization; low-rank matrix recovery using nuclear norm regularization; and matrix decomposition using a combination of the nuclear and ℓ1 norms. Overall, our analysis reveals interesting connections between statistical and computational efficiency in high-dimensional estimation.
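To make the kind of composite gradient update analyzed in the paper concrete, the sketch below applies a gradient step on the least-squares loss followed by a soft-thresholding (proximal) step for the ℓ1 regularizer, i.e. an ISTA-style iteration for the Lasso. The function names, constant step-size choice, and iteration count are illustrative assumptions for this sketch, not the authors' exact scheme or step-size rules.

```python
import numpy as np

def soft_threshold(z, tau):
    """Soft-thresholding operator: the proximal map of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def composite_gradient_lasso(X, y, lam, step=None, n_iters=200):
    """Composite (proximal) gradient descent for the Lasso objective
        min_theta  (1/(2n)) ||y - X theta||_2^2 + lam * ||theta||_1.
    Each iteration takes a gradient step on the smooth quadratic loss,
    then applies the soft-thresholding prox of the l1 penalty.
    """
    n, d = X.shape
    if step is None:
        # 1 / L, where L = ||X||_2^2 / n is the Lipschitz constant of the
        # gradient of the quadratic loss (illustrative default choice).
        step = n / np.linalg.norm(X, 2) ** 2
    theta = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

if __name__ == "__main__":
    # Illustrative run on synthetic sparse data (d > n).
    rng = np.random.default_rng(0)
    n, d, s = 200, 500, 10
    X = rng.standard_normal((n, d))
    theta_star = np.zeros(d)
    theta_star[:s] = 1.0
    y = X @ theta_star + 0.1 * rng.standard_normal(n)
    theta_hat = composite_gradient_lasso(X, y, lam=0.1)
    print("estimation error:", np.linalg.norm(theta_hat - theta_star))
```

Under the restricted strong convexity and smoothness conditions established in the paper, iterations of this type contract geometrically toward an optimal solution θ̂, up to the statistical precision of the model.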