Stochastic Bigger Subspace Algorithms for Nonconvex Stochastic Optimization

Bibliographic Details
Main Authors: Gonglin Yuan, Yingjie Zhou, Liping Wang, Qingyuan Yang
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9524602/
Description
Summary: It is well known that the stochastic optimization problem can be regarded as one of the hardest problems since, in most cases, the values of $f$ and its gradient are not easy to compute, the function $F(\cdot, \xi)$ is usually not given explicitly, and (or) the distribution function $P$ is ambiguous. Designing an effective optimization algorithm to solve this problem is therefore an interesting task. This paper designs stochastic bigger subspace algorithms for solving nonconvex stochastic optimization problems. A general framework for such algorithms is presented for convergence analysis, where the so-called sufficient descent property, the trust region feature, and global convergence to stationary points are proved under suitable conditions. We show that the worst-case complexity is competitive under a given accuracy parameter. We prove that the $SFO$-call complexity of the presented algorithm with diminishing steplength is $O\left(\epsilon^{-\frac{1}{1-\beta}}\right)$ and that the $SFO$-call complexity of the algorithm with random constant steplength is $O(\epsilon^{-2})$, where $\beta \in (0.5,1)$ and $\epsilon$ is the accuracy, and the required conditions are weaker than those of quasi-Newton methods and standard conjugate gradient algorithms. A detailed algorithm framework with variance reduction is also proposed for the experiments, and a nonconvex binary classification problem is used to demonstrate the performance of the given algorithm.
ISSN: 2169-3536
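
The sketch below is only an illustration of the steplength regime mentioned in the summary, not the paper's stochastic bigger subspace algorithm: it runs a generic stochastic first-order loop for min_x E_xi[F(x, xi)] with a diminishing steplength alpha_k = c * (k+1)^(-beta), beta in (0.5, 1), counting one SFO call per iteration. The objective, its stochastic gradient, and all parameter values are hypothetical placeholders; the subspace construction and variance-reduction steps of the paper are not reproduced here.

import numpy as np

def stochastic_gradient(x, xi):
    # Hypothetical stochastic gradient of a toy objective
    # F(x, xi) = 0.5 * ||x - xi||^2, so grad_x F(x, xi) = x - xi.
    return x - xi

def sgd_diminishing(x0, num_iters=1000, c=0.5, beta=0.75, seed=0):
    # Generic SFO-based loop with diminishing steplength alpha_k = c * (k+1)^(-beta),
    # beta in (0.5, 1) as in the complexity statement of the summary above.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        xi = rng.normal(loc=1.0, scale=0.1, size=x.shape)  # sample xi ~ P (placeholder distribution)
        g = stochastic_gradient(x, xi)                      # one SFO call
        alpha = c * (k + 1) ** (-beta)                      # diminishing steplength
        x = x - alpha * g
    return x

if __name__ == "__main__":
    # Iterates drift toward the mean of xi (here 1.0 in every coordinate).
    print(sgd_diminishing(np.zeros(3)))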