The statistical complexity of early-stopped mirror descent

Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the...

Full description

Bibliographic Details
Main Authors: Vaškevičius, T, Kanade, V, Rebeschini, P
Format: Journal article
Language:English
Published: Neural Information Processing Systems Foundation, Inc. 2020
_version_ 1826272666154172416
author Vaškevičius, T
Kanade, V
Rebeschini, P
author_facet Vaškevičius, T
Kanade, V
Rebeschini, P
author_sort Vaškevičius, T
collection OXFORD
description Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk with the squared loss for linear models and kernel methods. By completing an inequality that characterizes convexity for the squared loss, we identify an intrinsic link between offset Rademacher complexities and potential-based convergence analysis of mirror descent methods. Our observation immediately yields excess risk guarantees for the path traced by the iterates of mirror descent in terms of offset complexities of certain function classes depending only on the choice of the mirror map, initialization point, step-size, and the number of iterations. We apply our theory to recover, in a rather clean and elegant manner via rather short proofs, some of the recent results in the implicit regularization literature, while also showing how to improve upon them in some settings.
first_indexed 2024-03-06T22:16:11Z
format Journal article
id oxford-uuid:5375520d-881a-4e5f-bd18-016874a6c86e
institution University of Oxford
language English
last_indexed 2024-03-06T22:16:11Z
publishDate 2020
publisher Neural Information Processing Systems Foundation, Inc.
record_format dspace
spelling oxford-uuid:5375520d-881a-4e5f-bd18-016874a6c86e2022-03-26T16:31:49ZThe statistical complexity of early-stopped mirror descentJournal articlehttp://purl.org/coar/resource_type/c_dcae04bcuuid:5375520d-881a-4e5f-bd18-016874a6c86eEnglishSymplectic ElementsNeural Information Processing Systems Foundation, Inc.2020Vaškevičius, TKanade, VRebeschini, PRecently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk with the squared loss for linear models and kernel methods. By completing an inequality that characterizes convexity for the squared loss, we identify an intrinsic link between offset Rademacher complexities and potential-based convergence analysis of mirror descent methods. Our observation immediately yields excess risk guarantees for the path traced by the iterates of mirror descent in terms of offset complexities of certain function classes depending only on the choice of the mirror map, initialization point, step-size, and the number of iterations. We apply our theory to recover, in a rather clean and elegant manner via rather short proofs, some of the recent results in the implicit regularization literature, while also showing how to improve upon them in some settings.
spellingShingle Vaškevičius, T
Kanade, V
Rebeschini, P
The statistical complexity of early-stopped mirror descent
title The statistical complexity of early-stopped mirror descent
title_full The statistical complexity of early-stopped mirror descent
title_fullStr The statistical complexity of early-stopped mirror descent
title_full_unstemmed The statistical complexity of early-stopped mirror descent
title_short The statistical complexity of early-stopped mirror descent
title_sort statistical complexity of early stopped mirror descent
work_keys_str_mv AT vaskeviciust thestatisticalcomplexityofearlystoppedmirrordescent
AT kanadev thestatisticalcomplexityofearlystoppedmirrordescent
AT rebeschinip thestatisticalcomplexityofearlystoppedmirrordescent
AT vaskeviciust statisticalcomplexityofearlystoppedmirrordescent
AT kanadev statisticalcomplexityofearlystoppedmirrordescent
AT rebeschinip statisticalcomplexityofearlystoppedmirrordescent