SGD Noise and Implicit Low-Rank Bias in Deep Neural Networks

We analyze deep ReLU neural networks trained with mini-batch stochastic gradient descent and weight decay. We prove that the source of the SGD noise is an implicit low-rank constraint across all of the weight matrices within the network. Furthermore, we show, both theoretically and empirically, that...

Bibliographic Details
Main Authors:	Galanti, Tomer; Poggio, Tomaso
Format:	Article
Published:	Center for Brains, Minds and Machines (CBMM) 2022
Online Access:	https://hdl.handle.net/1721.1/141380

Similar Items

SGD and Weight Decay Provably Induce a Low-Rank Bias in Deep Neural Networks
by: Galanti, Tomer, et al.
Published: (2023)

The Janus effects of SGD vs GD: high noise and low rank
by: Xu, Mengjia, et al.
Published: (2023)

Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds
by: Mengjia Xu, et al.
Published: (2023-01-01)

Musings on Deep Learning: Properties of SGD
by: Zhang, Chiyuan, et al.
Published: (2017)

Theory of Deep Learning IIb: Optimization Properties of SGD
by: Zhang, Chiyuan, et al.
Published: (2018)

Loss landscape: SGD can have a better view than GD
by: Poggio, Tomaso, et al.
Published: (2020)

Implicit dynamic regularization in deep networks
by: Poggio, Tomaso, et al.
Published: (2020)

On Generalization Bounds for Neural Networks with Low Rank Layers
by: Pinto, Andrea, et al.
Published: (2024)

Norm-Based Generalization Bounds for Compositionally Sparse Neural Network
by: Galanti, Tomer, et al.
Published: (2023)

Stochastic resetting mitigates latent gradient bias of SGD from label noise
by: Youngkyoung Bae, et al.
Published: (2025-01-01)

Formation of Representations in Neural Networks
by: Ziyin, Liu, et al.
Published: (2024)

Feature learning in deep classifiers through Intermediate Neural Collapse
by: Rangamani, Akshay, et al.
Published: (2023)

The Low-rank Simplicity Bias in Deep Networks
by: Huh, Minyoung
Published: (2022)

Do deep neural networks suffer from crowding?
by: Poggio, Tomaso, et al.
Published: (2021)

FOSTERING DEEP LEARNING APPROACH WITH SMALL GROUP DISCUSSION (SGD)
by: Qazi Masroor Ali, et al.
Published: (2018-08-01)

Do deep neural networks suffer from crowding?
by: Volokitin, Anna, et al.
Published: (2022)

Do Deep Neural Networks Suffer from Crowding?
by: Volokitin, Anna, et al.
Published: (2017)

A Novel Method for Medical Image Segmentation based on Convolutional Neural Networks with SGD Optimization
by: M. Taheri, et al.
Published: (2021-01-01)

Privacy-Preserving SGD on Shuffle Model
by: Lingjie Zhang, et al.
Published: (2023-01-01)

Distributed SGD With Flexible Gradient Compression
by: Tran Thi Phuong, et al.
Published: (2020-01-01)

From Associative Memories to Deep Networks
by: Poggio, Tomaso
Published: (2021)

Distributed SignSGD With Improved Accuracy and Network-Fault Tolerance
by: Trieu Le Phong, et al.
Published: (2020-01-01)

Communication Scheduling for Gossip SGD in a Wide Area Network
by: Hideaki Oguni, et al.
Published: (2021-01-01)

On the Power of Decision Trees in Auto-Regressive Language Modeling
by: Gan, Yulu, et al.
Published: (2024)

Is SGD a Bayesian sampler? Well, almost
by: Mingard, C, et al.
Published: (2021)

Automatic billing counterfeit detection for SGD money
by: Arun Ramchandani.
Published: (2010)

Accelerating Distributed SGD With Group Hybrid Parallelism
by: Kyung-No Joo, et al.
Published: (2021-01-01)

An Overview of Some Issues in the Theory of Deep Networks
by: Poggio, Tomaso, et al.
Published: (2022)

An Overview of Some Issues in the Theory of Deep Networks
by: Poggio, Tomaso, et al.
Published: (2021)

Learning Curve Analysis on Adam, Sgd, and Adagrad Optimizers on a Convolutional Neural Network Model for Cancer Cells Recognition
by: Jose David Zambrano Jara, et al.
Published: (2023-01-01)

Random shuffling beats SGD after finite epochs
by: HaoChen, Jeff, et al.
Published: (2021)

Communication-Efficient Distributed SGD with Error-Feedback, Revisited
by: Tran Thi Phuong, et al.
Published: (2021-04-01)

An Optimization Strategy Based on Hybrid Algorithm of Adam and SGD
by: Wang Yijun, et al.
Published: (2018-01-01)

An analysis of training and generalization errors in shallow and deep networks
by: Mhaskar, Hrushikesh, et al.
Published: (2018)

Deep vs. shallow networks : An approximation theory perspective
by: Mhaskar, Hrushikesh, et al.
Published: (2016)

SGD-TripleQNet: An Integrated Deep Reinforcement Learning Model for Vehicle Lane-Change Decision
by: Yang Liu, et al.
Published: (2025-01-01)

Theoretical Issues in Deep Networks
by: Poggio, Tomaso, et al.
Published: (2019)

Function approximation by deep networks
by: Mhaskar, H. N., et al.
Published: (2021)

Theoretical issues in deep networks
by: Poggio, Tomaso, et al.
Published: (2021)

Hybrid Deep Learning Framework for Reduction of Mixed Noise via Low Rank Noise Estimation
by: Dai-Gyoung Kim, et al.
Published: (2022-01-01)