On random embeddings and their application to optimisation

Full description

Detailed Bibliography
Author: Shao, Z
Other Authors: Cartis, C
Material Type: Thesis
Language: English
Publication Information: 2021
Subjects:
Other Information
Abstract:

Random embeddings project high-dimensional spaces to low-dimensional ones; they are careful constructions which allow the approximate preservation of key properties, such as the pairwise distances between points. Often in the field of optimisation, one needs to explore high-dimensional spaces representing the problem data or its parameters, and thus the computational cost of solving an optimisation problem is connected to the size of the data/variables. This thesis studies the theoretical properties of norm-preserving random embeddings, and their application to several classes of optimisation problems.

Our investigations into random projections present subspace embedding properties for s-hashing ensembles (sparse random matrices with s non-zero entries per column) that are optimal in the projection dimension m of the sketch, namely, m = O(d), where d is the dimension of the subspace. A diverse set of results is presented that addresses the case when the input matrix has sufficiently low coherence, and how the acceptable coherence changes with the number s of non-zeros per column of the s-hashing matrices, or is reduced through suitable transformations. In particular, we propose a new random embedding, the Hashed Randomised Hadamard Transform, which improves upon the Subsampled Randomised Hadamard Transform by replacing sub-sampling with hashing.

We apply these sketching techniques to linear least squares problems, as part of a Blendenpik-type algorithm that uses a sketch of the data matrix to build a high-quality preconditioner and then solves a preconditioned formulation of the original problem. We also include suitable linear algebra tools for rank-deficient and for sparse problems, which lead to our implementation, Ski-LLS, outperforming not only sketching-based routines on randomly generated input, but also the state-of-the-art direct solver SPQR and the iterative code HSL on certain subsets of the Florida sparse matrix collection; namely, on least squares problems that are significantly over-determined, moderately sparse, or difficult.

Instead of sketching in the data/observational space as in the linear least squares case above, we then consider sketching in the variable/parameter domain for a more generic problem and algorithm. We propose a general random-subspace first-order framework for unconstrained non-convex optimisation that requires a weak probabilistic assumption on the subspace gradient, which we show to be satisfied by various random matrix ensembles, such as Gaussian and hashing sketching. We show that, when safeguarded with trust region or quadratic regularisation techniques, this random subspace approach satisfies, with high probability, a complexity bound of order O(ε^{-2}) to drive the (full) gradient norm below ε, matching in the accuracy order the deterministic counterparts of these methods and securing almost sure convergence. We then particularise this framework to random subspace Gauss-Newton methods for nonlinear least squares problems, which only require the calculation of the Jacobian matrix in a subspace, with similar complexity guarantees.
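The following minimal sketch illustrates the s-hashing ensembles described above: each column of the m x n sketching matrix receives exactly s non-zero entries equal to ±1/√s in uniformly chosen rows. The construction details (NumPy/SciPy, sampling rows without replacement, the norm check at the end) are illustrative assumptions rather than the thesis's exact implementation.

```python
# A minimal, illustrative s-hashing sketch (not the thesis code): each of the
# n columns gets exactly s non-zeros equal to +-1/sqrt(s) in random rows.
import numpy as np
import scipy.sparse as sp

def s_hashing_matrix(m, n, s, rng=None):
    """Return an m x n sparse s-hashing sketching matrix."""
    rng = np.random.default_rng(rng)
    rows = np.concatenate([rng.choice(m, size=s, replace=False) for _ in range(n)])
    cols = np.repeat(np.arange(n), s)
    vals = rng.choice([-1.0, 1.0], size=s * n) / np.sqrt(s)
    return sp.csr_array((vals, (rows, cols)), shape=(m, n))

# Quick check: sketching a tall matrix A should roughly preserve column norms.
rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 20))            # tall matrix, low-dimensional column space
S = s_hashing_matrix(400, 10_000, s=3, rng=rng)
SA = S @ A
print(np.linalg.norm(SA, axis=0) / np.linalg.norm(A, axis=0))  # ratios should be close to 1
```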
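In the same illustrative spirit, a Hashed Randomised Hadamard Transform can be sketched as a random sign flip, an orthonormal Walsh-Hadamard mixing step, and an s-hashing compression in place of the row sub-sampling used by the SRHT. Building the Hadamard matrix explicitly and requiring the row dimension to be a power of two are simplifications made for this example only.

```python
# An illustrative Hashed Randomised Hadamard Transform: sign flip (D),
# Walsh-Hadamard mixing (H / sqrt(n)), then s-hashing compression instead of
# the SRHT's row sub-sampling.  A fast transform would be used in practice.
import numpy as np
import scipy.sparse as sp
from scipy.linalg import hadamard

def hrht_sketch(A, m, s=2, rng=None):
    rng = np.random.default_rng(rng)
    n = A.shape[0]                                      # assumed to be a power of two here
    D = rng.choice([-1.0, 1.0], size=n)                 # random sign flips
    HDA = hadamard(n) @ (D[:, None] * A) / np.sqrt(n)   # orthonormal Hadamard mixing
    # s-hashing compression of the mixed matrix (same construction as in the previous sketch).
    rows = np.concatenate([rng.choice(m, size=s, replace=False) for _ in range(n)])
    cols = np.repeat(np.arange(n), s)
    vals = rng.choice([-1.0, 1.0], size=s * n) / np.sqrt(s)
    S_hash = sp.csr_array((vals, (rows, cols)), shape=(m, n))
    return S_hash @ HDA

A = np.random.default_rng(3).standard_normal((1024, 20))
SA = hrht_sketch(A, m=100, s=2, rng=3)
print(np.linalg.norm(SA, axis=0) / np.linalg.norm(A, axis=0))  # ratios should be close to 1
```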
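The Blendenpik-type strategy for least squares mentioned above can be summarised in a few lines: sketch the tall data matrix, QR-factorise the small sketch, and use the triangular factor as a right preconditioner for LSQR on the original problem. The sketch below uses a Gaussian sketching matrix and SciPy's LSQR purely for illustration; the thesis's Ski-LLS implementation handles rank-deficient and sparse problems with more care.

```python
# An illustrative sketch-and-precondition solver for min ||Ax - b||_2:
# QR-factorise the sketched matrix S @ A and use R as a right preconditioner
# for LSQR on the original (unsketched) problem.
import numpy as np
import scipy.sparse.linalg as spla
from scipy.linalg import solve_triangular

def sketch_precondition_lsq(A, b, S):
    n, d = A.shape
    _, R = np.linalg.qr(S @ A)                   # S A = Q R, with R upper triangular (d x d)
    # Linear operator for A @ inv(R), applied via triangular solves.
    AR = spla.LinearOperator(
        (n, d),
        matvec=lambda y: A @ solve_triangular(R, y, lower=False),
        rmatvec=lambda z: solve_triangular(R.T, A.T @ z, lower=True),
    )
    y = spla.lsqr(AR, b, atol=1e-10, btol=1e-10)[0]
    return solve_triangular(R, y, lower=False)   # undo the preconditioning: x = inv(R) y

# Example with a Gaussian sketch of size m = 6d (an s-hashing sketch could be used instead).
rng = np.random.default_rng(1)
A = rng.standard_normal((10_000, 50))
x_true = rng.standard_normal(50)
b = A @ x_true + 1e-3 * rng.standard_normal(10_000)
S = rng.standard_normal((300, 10_000)) / np.sqrt(300)
x = sketch_precondition_lsq(A, b, S)
print(np.linalg.norm(x - x_true))                # should be small
```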
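A toy version of the random-subspace first-order idea with quadratic regularisation might look as follows: each iteration only forms the sketched gradient S_k ∇f(x_k), minimises a regularised model in the low-dimensional subspace, lifts the step back, and adapts the regularisation weight through a sufficient-decrease test. The Gaussian sketch, the acceptance constants and the stopping rule are assumptions made for this example.

```python
# An illustrative random-subspace first-order loop with quadratic regularisation:
# only the sketched gradient S @ grad f(x) is used at each iteration.
import numpy as np
from scipy.optimize import rosen, rosen_der

def random_subspace_qr_method(f, grad, x0, l=10, sigma0=1.0, max_iter=500, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    x, sigma = np.asarray(x0, dtype=float), sigma0
    n = x.size
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        S = rng.standard_normal((l, n)) / np.sqrt(l)           # scaled Gaussian sketch, l << n
        g_hat = S @ g                                          # subspace gradient, length l
        s_hat = -g_hat / sigma                                 # minimiser of the regularised subspace model
        s = S.T @ s_hat                                        # lift the step back to R^n
        pred = -(g_hat @ s_hat + 0.5 * sigma * s_hat @ s_hat)  # predicted model decrease (> 0)
        if f(x) - f(x + s) >= 0.1 * pred:                      # sufficient decrease: accept, relax sigma
            x, sigma = x + s, max(0.5 * sigma, 1e-8)
        else:                                                  # reject the step and regularise more
            sigma *= 2.0
    return x

# Example: 100-dimensional Rosenbrock function, sketched to 10-dimensional subspaces.
x = random_subspace_qr_method(rosen, rosen_der, np.zeros(100), l=10)
print(rosen(x))
```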
We further investigate second-order methods for non-convex optimisation and propose a Random Subspace Adaptive Regularised Cubic (R-ARC) method, which we analyse under various assumptions on the objective function and the sketching matrices. We show that, when the sketching matrix achieves a subspace embedding of the augmented matrix of the gradient and the Hessian with sufficiently high probability, the R-ARC method satisfies, with high probability, a complexity bound of order O(ε^{-3/2}) to drive the (full) gradient norm below ε, matching in the accuracy order its deterministic counterpart (ARC). The same complexity bound is obtained when the Hessian matrix has sparse rows and appropriate sketching matrices are chosen. We also investigate R-ARC's convergence to second-order critical points. We show that the R-ARC method drives the Hessian in the subspace to be approximately positive semi-definite with high probability, for a variety of sketching matrices; furthermore, if the Hessian matrix has low rank and scaled Gaussian sketching matrices are used, R-ARC drives the (full) Hessian to be approximately positive semi-definite, with high probability, at the rate O(ε^{-3}), again matching in the accuracy order its deterministic counterpart.
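To illustrate the R-ARC construction, the following sketch forms one cubically regularised step in a random subspace: the gradient and Hessian are accessed only through an l x n sketching matrix, and the small cubic model is minimised numerically. Using a Gaussian sketch and a generic optimiser for the l-dimensional subproblem are simplifying assumptions; this is not the subproblem solver analysed in the thesis.

```python
# One illustrative random-subspace cubically regularised (ARC-type) step:
# the gradient and Hessian enter only through the l x n sketch S.
import numpy as np
from scipy.optimize import minimize

def r_arc_step(g, H, sigma, l=10, rng=None):
    """Return a trial step s in R^n from one sketched cubic-regularisation subproblem."""
    rng = np.random.default_rng(rng)
    n = g.size
    S = rng.standard_normal((l, n)) / np.sqrt(l)    # scaled Gaussian sketch
    g_hat = S @ g                                   # sketched gradient, length l
    H_hat = S @ H @ S.T                             # sketched Hessian, l x l
    def model(s_hat):                               # cubic-regularised subspace model
        return (g_hat @ s_hat + 0.5 * s_hat @ H_hat @ s_hat
                + (sigma / 3.0) * np.linalg.norm(s_hat) ** 3)
    s_hat = minimize(model, np.zeros(l), method="BFGS").x   # small l-dimensional subproblem
    return S.T @ s_hat                              # lift the step back to R^n

# Example with a random gradient and an indefinite (non-convex) Hessian.
rng = np.random.default_rng(2)
n = 200
Q = rng.standard_normal((n, n))
H = (Q + Q.T) / 2                                   # symmetric, indefinite
g = rng.standard_normal(n)
s = r_arc_step(g, H, sigma=1.0, l=10, rng=rng)
print(g @ s)                                        # typically negative: a descent direction
```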