Musings on Deep Learning: Properties of SGD
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we characterize with a mix of theory and experiments the generalization properties of Stochastic Gradient Descent in overparametrized deep convolutional networks. We show that Stochastic Gradient Descent (SGD) selects with high probability solutions that 1) have zero (or small) empirical error, 2) are degenerate as shown in Theory II and 3) have maximum generalization.
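The abstract's central object, minibatch SGD, can be sketched minimally as follows. This is an illustrative toy (a scalar least-squares fit), not the memo's experimental setup; all function and variable names here are hypothetical.

```python
import random

def sgd(grad, w, data, lr=0.05, epochs=200, batch=2, seed=0):
    """Plain minibatch SGD: repeatedly step against the gradient
    estimated on a small random sample of the data."""
    rng = random.Random(seed)
    for _ in range(epochs):
        sample = rng.sample(data, batch)  # random minibatch
        w = w - lr * grad(w, sample)      # gradient step
    return w

# Toy problem: data generated by y = 3x; fit scalar w so the
# empirical (training) error is driven to zero, as in point 1)
# of the abstract.
points = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

def grad_mse(w, sample):
    # d/dw of the mean squared error (w*x - y)^2 over the minibatch
    return sum(2 * (w * x - y) * x for x, y in sample) / len(sample)

w_hat = sgd(grad_mse, w=0.0, data=points)
```

On this convex toy, `w_hat` converges to the zero-error solution w = 3; the memo's claims concern the far harder overparametrized, non-convex setting, where many zero-error solutions exist and SGD's sampling noise biases the selection among them.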
Main Authors: | Zhang, Chiyuan; Liao, Qianli; Rakhlin, Alexander; Sridharan, Karthik; Miranda, Brando; Golowich, Noah; Poggio, Tomaso |
---|---|
Format: | Technical Report |
Language: | en_US |
Published: | Center for Brains, Minds and Machines (CBMM), 2017 |
Online Access: | http://hdl.handle.net/1721.1/107841 |
---|---|
author | Zhang, Chiyuan; Liao, Qianli; Rakhlin, Alexander; Sridharan, Karthik; Miranda, Brando; Golowich, Noah; Poggio, Tomaso |
collection | MIT |
description | [previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we characterize with a mix of theory and experiments the generalization properties of Stochastic Gradient Descent in overparametrized deep convolutional networks. We show that Stochastic Gradient Descent (SGD) selects with high probability solutions that 1) have zero (or small) empirical error, 2) are degenerate as shown in Theory II and 3) have maximum generalization. |
first_indexed | 2024-09-23T10:52:40Z |
format | Technical Report |
id | mit-1721.1/107841 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T10:52:40Z |
publishDate | 2017 |
publisher | Center for Brains, Minds and Machines (CBMM) |
record_format | dspace |
spelling | mit-1721.1/107841 2019-04-11T09:51:03Z. Musings on Deep Learning: Properties of SGD. Zhang, Chiyuan; Liao, Qianli; Rakhlin, Alexander; Sridharan, Karthik; Miranda, Brando; Golowich, Noah; Poggio, Tomaso. [previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we characterize with a mix of theory and experiments the generalization properties of Stochastic Gradient Descent in overparametrized deep convolutional networks. We show that Stochastic Gradient Descent (SGD) selects with high probability solutions that 1) have zero (or small) empirical error, 2) are degenerate as shown in Theory II and 3) have maximum generalization. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. H.M. is supported in part by ARO Grant W911NF-15-1-0385. 2017-04-04T21:32:29Z 2017-04-04T21:32:29Z 2017-04-04. Technical Report; Working Paper; Other. http://hdl.handle.net/1721.1/107841 en_US. CBMM Memo Series;067. Attribution-NonCommercial-ShareAlike 3.0 United States http://creativecommons.org/licenses/by-nc-sa/3.0/us/ application/pdf. Center for Brains, Minds and Machines (CBMM) |
title | Musings on Deep Learning: Properties of SGD |
url | http://hdl.handle.net/1721.1/107841 |