Generalized stochastic Frank–Wolfe algorithm with stochastic “substitute” gradient for structured convex optimization
Main Authors: Lu, Haihao; Freund, Robert M
Other Authors: Massachusetts Institute of Technology. Department of Mathematics; Sloan School of Management
Format: Article
Language: English
Published: Springer Berlin Heidelberg, 2021
Online Access: https://hdl.handle.net/1721.1/136776
description | Abstract: The stochastic Frank–Wolfe method has recently attracted much general interest in the context of optimization for statistical and machine learning due to its ability to work with a more general feasible region. However, there has been a complexity gap in the dependence on the optimality tolerance $\varepsilon$ in the guaranteed convergence rate for stochastic Frank–Wolfe compared to its deterministic counterpart. In this work, we present a new generalized stochastic Frank–Wolfe method which closes this gap for the class of structured optimization problems encountered in statistical and machine learning, characterized by empirical loss minimization with a certain type of “linear prediction” property (formally defined in the paper), which is typically present in loss minimization problems in practice. Our method also introduces the notion of a “substitute gradient” that is a not-necessarily-unbiased estimate of the gradient. We show that our new method is equivalent to a particular randomized coordinate mirror descent algorithm applied to the dual problem, which in turn provides a new interpretation of randomized dual coordinate descent in the primal space. Also, in the special case of a strongly convex regularizer, our generalized stochastic Frank–Wolfe method (as well as the randomized dual coordinate descent method) exhibits linear convergence. Furthermore, we present computational experiments that indicate that our method outperforms other stochastic Frank–Wolfe methods for a sufficiently small optimality tolerance, consistent with the theory developed herein.
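Note: this record contains only the abstract, not the paper's algorithm. For orientation, here is a minimal, hypothetical Python sketch of a generic stochastic Frank–Wolfe iteration, included only to illustrate the two ingredients the abstract names: a linear-minimization oracle over the feasible region, and a running, not-necessarily-unbiased "substitute" gradient estimate. The function names, the classic 2/(t+2) step size, and the averaging rule for the gradient estimate are illustrative assumptions, not the authors' method as defined in the paper.

```python
# Generic stochastic Frank-Wolfe sketch (illustrative only; NOT the paper's
# generalized method). All names here are hypothetical.
import numpy as np

def stochastic_frank_wolfe(lmo, stoch_grad, x0, n_iters=1000):
    """lmo(g): returns argmin_{s in S} <g, s> over the feasible region S.
    stoch_grad(x): returns a stochastic estimate of the gradient at x."""
    x = x0.copy()
    d = stoch_grad(x)                      # running "substitute" gradient
    for t in range(n_iters):
        rho = 2.0 / (t + 2.0)              # classic Frank-Wolfe step size
        # Averaging old and new samples gives a cheap gradient estimate
        # that need not be unbiased at any single iterate.
        d = (1.0 - rho) * d + rho * stoch_grad(x)
        s = lmo(d)                         # linear-minimization oracle step
        x = (1.0 - rho) * x + rho * s      # convex combination stays feasible
    return x

# Usage sketch: least squares over the unit l1 ball.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

def stoch_grad(x):
    i = rng.integers(50)                   # sample one row of the data
    return 50.0 * A[i] * (A[i] @ x - b[i])

def lmo(g):
    s = np.zeros(10)
    j = int(np.argmax(np.abs(g)))
    s[j] = -np.sign(g[j])                  # l1-ball vertex minimizing <g, s>
    return s

x_hat = stochastic_frank_wolfe(lmo, stoch_grad, np.zeros(10))
```

Because every iterate is a convex combination of feasible points, the method never needs a projection step, which is the usual appeal of Frank–Wolfe over projected stochastic gradient methods on structured feasible regions.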
id | mit-1721.1/136776
institution | Massachusetts Institute of Technology
type | Journal Article (http://purl.org/eprint/type/JournalArticle)
date issued | 2020-03-04
date available | 2021-11-01
doi | https://doi.org/10.1007/s10107-020-01480-7
rights | Creative Commons Attribution-NonCommercial-ShareAlike (http://creativecommons.org/licenses/by-nc-sa/4.0/)
rights holder | Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society
file format | application/pdf