A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics
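This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks named stochastic gradient descent (SGD). We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix; namely, the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.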
Main Authors: | Rita Fioresi, Pratik Chaudhari, Stefano Soatto |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-01-01 |
Series: | Entropy |
Subjects: | stochastic gradient descent; deep learning; general relativity |
Online Access: | https://www.mdpi.com/1099-4300/22/1/101 |
_version_ | 1811183847453229056 |
---|---|
author | Rita Fioresi; Pratik Chaudhari; Stefano Soatto |
author_facet | Rita Fioresi; Pratik Chaudhari; Stefano Soatto |
author_sort | Rita Fioresi |
collection | DOAJ |
description | This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks named stochastic gradient descent (SGD). We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix; namely, the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former. |
first_indexed | 2024-04-11T13:03:30Z |
format | Article |
id | doaj.art-21d83ed6dd34421083c9cbde1cc87a9e |
institution | Directory Open Access Journal |
issn | 1099-4300 |
language | English |
last_indexed | 2024-04-11T13:03:30Z |
publishDate | 2020-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Entropy |
spelling | doaj.art-21d83ed6dd34421083c9cbde1cc87a9e; 2022-12-22T04:22:51Z; eng; MDPI AG; Entropy; 1099-4300; 2020-01-01; 22; 1; 101; 10.3390/e22010101; e22010101; A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics; Rita Fioresi (0); Pratik Chaudhari (1); Stefano Soatto (2); (0) Dipartimento di Matematica, piazza Porta San Donato 5, University of Bologna, 40126 Bologna, Italy; (1) Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA; (2) Computer Science Department, University of California, Los Angeles, CA 90095, USA; This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks named stochastic gradient descent (SGD). We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix; namely, the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.; https://www.mdpi.com/1099-4300/22/1/101; stochastic gradient descent; deep learning; general relativity |
title | A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics |
title_sort | geometric interpretation of stochastic gradient descent using diffusion metrics |
topic | stochastic gradient descent; deep learning; general relativity |
url | https://www.mdpi.com/1099-4300/22/1/101 |