The statistical complexity of early-stopped mirror descent
Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the...
Main Authors: | , , |
---|---|
Format: | Journal article |
Language: | English |
Published: |
Neural Information Processing Systems Foundation, Inc.
2020
|
_version_ | 1826272666154172416 |
---|---|
author | Vaškevičius, T Kanade, V Rebeschini, P |
author_facet | Vaškevičius, T Kanade, V Rebeschini, P |
author_sort | Vaškevičius, T |
collection | OXFORD |
description | Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk with the squared loss for linear models and kernel methods. By completing an inequality that characterizes convexity for the squared loss, we identify an intrinsic link between offset Rademacher complexities and potential-based convergence analysis of mirror descent methods. Our observation immediately yields excess risk guarantees for the path traced by the iterates of mirror descent in terms of offset complexities of certain function classes depending only on the choice of the mirror map, initialization point, step-size, and the number of iterations. We apply our theory to recover, in a rather clean and elegant manner via rather short proofs, some of the recent results in the implicit regularization literature, while also showing how to improve upon them in some settings. |
first_indexed | 2024-03-06T22:16:11Z |
format | Journal article |
id | oxford-uuid:5375520d-881a-4e5f-bd18-016874a6c86e |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-06T22:16:11Z |
publishDate | 2020 |
publisher | Neural Information Processing Systems Foundation, Inc. |
record_format | dspace |
spelling | oxford-uuid:5375520d-881a-4e5f-bd18-016874a6c86e2022-03-26T16:31:49ZThe statistical complexity of early-stopped mirror descentJournal articlehttp://purl.org/coar/resource_type/c_dcae04bcuuid:5375520d-881a-4e5f-bd18-016874a6c86eEnglishSymplectic ElementsNeural Information Processing Systems Foundation, Inc.2020Vaškevičius, TKanade, VRebeschini, PRecently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk with the squared loss for linear models and kernel methods. By completing an inequality that characterizes convexity for the squared loss, we identify an intrinsic link between offset Rademacher complexities and potential-based convergence analysis of mirror descent methods. Our observation immediately yields excess risk guarantees for the path traced by the iterates of mirror descent in terms of offset complexities of certain function classes depending only on the choice of the mirror map, initialization point, step-size, and the number of iterations. We apply our theory to recover, in a rather clean and elegant manner via rather short proofs, some of the recent results in the implicit regularization literature, while also showing how to improve upon them in some settings. |
spellingShingle | Vaškevičius, T Kanade, V Rebeschini, P The statistical complexity of early-stopped mirror descent |
title | The statistical complexity of early-stopped mirror descent |
title_full | The statistical complexity of early-stopped mirror descent |
title_fullStr | The statistical complexity of early-stopped mirror descent |
title_full_unstemmed | The statistical complexity of early-stopped mirror descent |
title_short | The statistical complexity of early-stopped mirror descent |
title_sort | statistical complexity of early stopped mirror descent |
work_keys_str_mv | AT vaskeviciust thestatisticalcomplexityofearlystoppedmirrordescent AT kanadev thestatisticalcomplexityofearlystoppedmirrordescent AT rebeschinip thestatisticalcomplexityofearlystoppedmirrordescent AT vaskeviciust statisticalcomplexityofearlystoppedmirrordescent AT kanadev statisticalcomplexityofearlystoppedmirrordescent AT rebeschinip statisticalcomplexityofearlystoppedmirrordescent |