Musings on Deep Learning: Properties of SGD

[Previously titled "Theory of Deep Learning III: Generalization Properties of SGD".] In Theory III we characterize, with a mix of theory and experiments, the generalization properties of Stochastic Gradient Descent (SGD) in overparametrized deep convolutional networks. We show that, with high probability, SGD selects solutions that 1) have zero (or small) empirical error, 2) are degenerate, as shown in Theory II, and 3) have maximum generalization.
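
Claim 1) above, that SGD reaches zero (or small) empirical error in the overparametrized regime, can be seen in miniature with the plain SGD update w <- w - lr * grad(loss_i(w)). The sketch below is a hypothetical illustration, assuming NumPy: a toy overparametrized least-squares model rather than the paper's deep convolutional networks, with all variable names invented here.

    # Toy illustration only: SGD on an overparametrized linear model
    # (more parameters than samples). Assumes NumPy; not the paper's setup.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 100                      # n samples, d parameters, d > n
    X = rng.standard_normal((n, d))     # random inputs
    y = rng.standard_normal(n)          # random targets

    w = np.zeros(d)                     # model parameters
    lr = 0.01                           # learning rate (step size)

    for step in range(5000):
        i = rng.integers(n)                 # pick one training example
        grad = (X[i] @ w - y[i]) * X[i]     # gradient of 0.5 * (x_i . w - y_i)^2
        w -= lr * grad                      # SGD update: w <- w - lr * grad

    # With d > n the model can interpolate: training error approaches zero.
    print("empirical error:", 0.5 * np.mean((X @ w - y) ** 2))

In this toy setting zero training error is unsurprising, since d > n guarantees that interpolating solutions exist; the paper's contribution concerns which of those solutions SGD selects and why they generalize.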

Bibliographic Details
Main Authors: Zhang, Chiyuan; Liao, Qianli; Rakhlin, Alexander; Sridharan, Karthik; Miranda, Brando; Golowich, Noah; Poggio, Tomaso
Format: Technical Report
Language: en_US
Published: Center for Brains, Minds and Machines (CBMM), 2017
Series: CBMM Memo Series; 067
License: Attribution-NonCommercial-ShareAlike 3.0 United States (http://creativecommons.org/licenses/by-nc-sa/3.0/us/)
Funding: This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. H.M. is supported in part by ARO Grant W911NF-15-1-0385.
Online Access: http://hdl.handle.net/1721.1/107841