SGD Noise and Implicit Low-Rank Bias in Deep Neural Networks
We analyze deep ReLU neural networks trained with mini-batch stochastic gradient descent and weight decay. We prove that the source of the SGD noise is an implicit low-rank constraint across all of the weight matrices within the network. Furthermore, we show, both theoretically and empirically, that...
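One elementary fact that makes a low-rank bias plausible is that a mini-batch gradient with respect to a layer's weight matrix is a sum of one outer product per example. For a batch of size B, this caps its rank at B, regardless of the layer's width. The NumPy sketch below shows this for a single linear layer under a mean-squared-error loss. The function name `linear_layer_batch_grad` and the squared-error setup are illustrative assumptions, not taken from the paper. In a deep ReLU network the gradient for layer l has the same form, a sum over the batch of (backpropagated error) times (previous-layer activation) transposed. As intuition only, and not as the paper's proof: if a weight-decayed step W ← (1 − ηλ)W − ηG left W unchanged, then W = −G/λ, so W would inherit the same rank bound.

```python
import numpy as np

def linear_layer_batch_grad(W, X, Y):
    """Gradient of 0.5/B * ||X W^T - Y||_F^2 with respect to W (hypothetical helper).

    W: (out, in) weights, X: (B, in) inputs, Y: (B, out) targets.
    """
    B = X.shape[0]
    residual = X @ W.T - Y        # (B, out): per-example error signal delta_i
    # residual.T @ X equals sum_i delta_i x_i^T, a sum of B rank-one outer products
    return residual.T @ X / B     # (out, in)

rng = np.random.default_rng(0)
out_dim, in_dim = 64, 128
W = rng.standard_normal((out_dim, in_dim))

for B in (1, 4, 16):
    X = rng.standard_normal((B, in_dim))
    Y = rng.standard_normal((B, out_dim))
    G = linear_layer_batch_grad(W, X, Y)
    # The gradient is 64 x 128, but its rank never exceeds the batch size B
    print(B, np.linalg.matrix_rank(G))
```

Here the bound is dimension-free. A 64 × 128 gradient computed from 4 examples has rank at most 4, which is why small batches are the regime where this constraint bites hardest.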
Main Authors: | , |
---|---|
Format: | Article |
Published: | Center for Brains, Minds and Machines (CBMM), 2022 |
Online Access: | https://hdl.handle.net/1721.1/141380 |