A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics

This paper is a step towards a geometric understanding of stochastic gradient descent (SGD), a popular algorithm for training deep neural networks. We build upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivat...

Bibliographic Details
Main Authors: Rita Fioresi, Pratik Chaudhari, Stefano Soatto
Format: Article
Language: English
Published: MDPI AG 2020-01-01
Series: Entropy
Online Access: https://www.mdpi.com/1099-4300/22/1/101