A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics

This paper is a step towards a geometric understanding of stochastic gradient descent (SGD), a popular algorithm for training deep neural networks. We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivat...

Bibliographic Details
Main Authors: Rita Fioresi, Pratik Chaudhari, Stefano Soatto
Format: Article
Language: English
Published: MDPI AG 2020-01-01
Series: Entropy
Subjects: stochastic gradient descent, deep learning, general relativity
Online Access: https://www.mdpi.com/1099-4300/22/1/101
author Rita Fioresi
Pratik Chaudhari
Stefano Soatto
collection DOAJ
description This paper is a step towards a geometric understanding of stochastic gradient descent (SGD), a popular algorithm for training deep neural networks. We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That observation motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix, namely the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.
first_indexed 2024-04-11T13:03:30Z
format Article
id doaj.art-21d83ed6dd34421083c9cbde1cc87a9e
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-04-11T13:03:30Z
publishDate 2020-01-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-21d83ed6dd34421083c9cbde1cc87a9e; 2022-12-22T04:22:51Z; eng; MDPI AG; Entropy; 1099-4300; 2020-01-01; 22; 1; 101; 10.3390/e22010101; e22010101
A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics
Rita Fioresi (0), Pratik Chaudhari (1), Stefano Soatto (2)
(0) Dipartimento di Matematica, piazza Porta San Donato 5, University of Bologna, 40126 Bologna, Italy
(1) Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA
(2) Computer Science Department, University of California, Los Angeles, CA 90095, USA
https://www.mdpi.com/1099-4300/22/1/101
stochastic gradient descent; deep learning; general relativity
title A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics
topic stochastic gradient descent
deep learning
general relativity
url https://www.mdpi.com/1099-4300/22/1/101