Summary: In modern statistical and machine learning models, structural constraints are often imposed to improve interpretability and to reduce model complexity. In this thesis, we present scalable optimization methods for several large-scale machine learning problems under structural constraints, with a focus on shape constraints in nonparametric statistics and sparsity in high-dimensional statistics.
In the first chapter, we consider the subgradient regularized convex regression problem, which fits a convex function of the covariates to the target variable. We propose large-scale algorithms based on proximal gradient descent and active-set methods, and derive novel linear convergence guarantees for them. Empirically, our framework can approximately solve instances with n = 100,000 and d = 10 within minutes.
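The algorithms above build on proximal gradient descent. As a generic illustration of that algorithmic template only (applied here to a simple ℓ₁-regularized least-squares problem, not to the subgradient regularized convex regression objective of the chapter), a minimal sketch:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_lasso(A, b, lam, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient descent:
    a gradient step on the smooth part, then the prox of the nonsmooth part."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

Each iteration alternates a gradient step on the smooth loss with an exact proximal step on the penalty; with step size 1/L the objective decreases monotonically.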
In the second chapter, we develop a new computational framework for computing the log-concave maximum likelihood density estimate, based on smoothing techniques combined with an appropriate integral discretization of increasing accuracy. We establish convergence guarantees for our approach and demonstrate significant runtime improvements over earlier convex-optimization-based approaches.
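To give a flavor of the integral-discretization idea (a generic one-dimensional sketch, not the discretization scheme developed in the chapter): a normalizing integral of the form ∫ exp(h(x)) dx for a concave h can be approximated on an increasingly fine grid, with the error shrinking as the grid is refined.

```python
import numpy as np

def trapezoid_integral(h, a, b, m):
    """Approximate the integral of exp(h(x)) over [a, b] with the composite
    trapezoid rule on m + 1 equispaced grid points."""
    xs = np.linspace(a, b, m + 1)
    ys = np.exp(h(xs))
    w = (b - a) / m
    return w * (ys.sum() - 0.5 * (ys[0] + ys[-1]))

# Illustrative concave log-density surrogate: h(x) = -x**2, so the true
# integral over the real line is sqrt(pi).
h = lambda x: -x ** 2
coarse = trapezoid_integral(h, -5.0, 5.0, 10)
fine = trapezoid_integral(h, -5.0, 5.0, 1000)
```

Refining the grid (here from m = 10 to m = 1000 points) drives the discretization error toward zero, which is the sense in which an "integral discretization of increasing accuracy" can be used.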
In the third chapter, we focus on Gaussian Graphical Models, which aim to estimate a sparse precision matrix from i.i.d. multivariate Gaussian samples. We propose a novel estimator based on an ℓ₀ℓ₂-penalized pseudolikelihood. We then design a specialized nonlinear Branch-and-Bound (BnB) framework that solves a mixed integer programming (MIP) formulation of the proposed estimator. Our estimator scales to p ≈ 10,000, runs faster than competing ℓ₁-based approaches, and achieves superior statistical performance.
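For intuition about the ℓ₀ℓ₂ penalty itself (a generic property of the penalty, not the chapter's BnB machinery): its coordinate-wise proximal operator has a closed form combining ridge-style shrinkage with hard thresholding, which is one reason it admits fast routines inside such frameworks. A minimal sketch:

```python
import numpy as np

def prox_l0l2(z, lam0, lam2):
    """Closed-form prox of lam0 * ||.||_0 + lam2 * ||.||_2^2, coordinate-wise:
    minimize 0.5 * (b - z)^2 + lam0 * 1[b != 0] + lam2 * b^2 over b.
    Keep the shrunken value z / (1 + 2*lam2) iff |z| > sqrt(2*lam0*(1 + 2*lam2))."""
    shrink = z / (1.0 + 2.0 * lam2)
    keep = np.abs(z) > np.sqrt(2.0 * lam0 * (1.0 + 2.0 * lam2))
    return np.where(keep, shrink, 0.0)
```

The nonzero branch solves the ridge subproblem exactly; comparing its objective value against the value at zero yields the hard threshold on |z|.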
In the fourth chapter, we improve the BnB framework for sparse learning problems with the ℓ₀ℓ₂ penalty and general smooth convex losses. We present a novel screening procedure within the BnB framework that provably fixes relaxed binary variables to 0 or 1. Our experiments indicate that this screening procedure can significantly reduce the runtimes of BnB solvers.
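The general idea of fixing binary variables with guarantees can be illustrated generically (a sketch under assumed inputs, not the chapter's actual screening procedure): if a valid lower bound on the objective with a binary variable fixed to one value already exceeds the incumbent feasible objective, that value can be excluded and the variable fixed to the other value.

```python
def screen_binaries(lb_if_zero, lb_if_one, incumbent):
    """Generic bound-based fixing for binary variables in branch-and-bound
    (minimization). lb_if_zero[i] / lb_if_one[i] are valid lower bounds on the
    optimal value when z_i is fixed to 0 / 1; incumbent is the best feasible
    objective found so far. Returns a dict mapping variable index to its
    provably fixed value."""
    fixed = {}
    for i, (lb0, lb1) in enumerate(zip(lb_if_zero, lb_if_one)):
        if lb0 > incumbent and lb1 > incumbent:
            continue  # the whole node can be pruned; no single-variable fixing applies
        elif lb1 > incumbent:
            fixed[i] = 0  # z_i = 1 cannot beat the incumbent
        elif lb0 > incumbent:
            fixed[i] = 1  # z_i = 0 cannot beat the incumbent
    return fixed
```

Any variable fixed this way shrinks the search tree without excluding the optimum, which is why such screening can reduce BnB runtimes.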