SGD Noise and Implicit Low-Rank Bias in Deep Neural Networks
We analyze deep ReLU neural networks trained with mini-batch stochastic gradient descent (SGD) and weight decay. We prove that the source of the SGD noise is an implicit low-rank constraint across all of the weight matrices within the network. Furthermore, we show, both theoretically and empirically, that...
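The abstract's claim can be probed with a small experiment. Below is a minimal sketch (not the authors' code; it assumes PyTorch, and the toy data, architecture, and hyperparameters are illustrative assumptions) that trains a small deep ReLU network with small-batch SGD and weight decay, then reports the effective rank of each weight matrix.

```python
# Minimal sketch illustrating the abstract's claim: small-batch SGD with
# weight decay tends to drive the weight matrices of a deep ReLU network
# toward low effective rank. All settings below are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data (assumed, for illustration only).
X = torch.randn(1024, 32)
y = torch.randn(1024, 1)

# A small deep ReLU network.
model = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

# Mini-batch SGD with weight decay and a small batch size, as in the abstract.
opt = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=5e-3)
loss_fn = nn.MSELoss()
batch_size = 8

for step in range(5000):
    idx = torch.randint(0, X.shape[0], (batch_size,))
    opt.zero_grad()
    loss = loss_fn(model(X[idx]), y[idx])
    loss.backward()
    opt.step()

def effective_rank(W, rel_tol=1e-2):
    # Count singular values above a small fraction of the largest one.
    s = torch.linalg.svdvals(W)
    return int((s > rel_tol * s[0]).sum())

for name, p in model.named_parameters():
    if p.ndim == 2:  # weight matrices only
        print(name, "shape", tuple(p.shape),
              "effective rank", effective_rank(p.detach()))
```

With a small batch size and nonzero weight decay, the hidden-layer weight matrices typically end up with effective rank well below their full dimension; raising the batch size or removing weight decay weakens this effect.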
| Main Authors: | Galanti, Tomer; Poggio, Tomaso |
| --- | --- |
| Format: | Article |
| Published: | Center for Brains, Minds and Machines (CBMM), 2022 |
| Online Access: | https://hdl.handle.net/1721.1/141380 |
Similar Items
- SGD and Weight Decay Provably Induce a Low-Rank Bias in Deep Neural Networks
  by: Galanti, Tomer, et al.
  Published: (2023)
- The Janus effects of SGD vs GD: high noise and low rank
  by: Xu, Mengjia, et al.
  Published: (2023)
- Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds
  by: Xu, Mengjia, et al.
  Published: (2023-01-01)
- Musings on Deep Learning: Properties of SGD
  by: Zhang, Chiyuan, et al.
  Published: (2017)
- Theory of Deep Learning IIb: Optimization Properties of SGD
  by: Zhang, Chiyuan, et al.
  Published: (2018)