Theory of Deep Learning IIb: Optimization Properties of SGD

In Theory IIb we characterize, with a mix of theory and experiments, the optimization of deep convolutional networks by Stochastic Gradient Descent (SGD). The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability - like...
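As a reminder of the procedure the report analyzes, here is a minimal sketch of stochastic gradient descent on a toy one-dimensional objective. The objective, noise model, and hyperparameters are illustrative only and are not taken from the paper:

```python
import random

def sgd(grad_fn, w0, lr=0.1, steps=500, noise=0.1, seed=0):
    """Plain SGD: repeatedly step against a noisy gradient estimate.
    All hyperparameters here are illustrative, not from the report."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        g = grad_fn(w) + rng.gauss(0.0, noise)  # stochastic gradient
        w -= lr * g
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2(w - 3);
# the minimizer is w* = 3, so SGD should end up near 3.
w_final = sgd(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Despite the per-step gradient noise, the iterates settle into a small neighborhood of the minimizer, which is the kind of concentration behavior the report's conjecture addresses in the deep-network setting.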

Bibliographic Details
Main Authors: Zhang, Chiyuan, Liao, Qianli, Rakhlin, Alexander, Miranda, Brando, Golowich, Noah, Poggio, Tomaso
Format: Technical Report
Language: English (en_US)
Published: Center for Brains, Minds and Machines (CBMM) 2018
Online Access: http://hdl.handle.net/1721.1/115407