Loss landscape: SGD can have a better view than GD

Consider a loss function $L = \sum_{i=1}^{n} l_i^2$ with $l_i = f(x_i) - y_i$, where $f(x)$ is a deep feedforward network with $R$ layers, no bias terms, and scalar output. Assume the network is overparametrized, that is, $d \gg n$, where $d$ is the number of parameters and $n$ is the number of data points. The networks are...
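To make the setup concrete, here is a minimal sketch of the square loss $L = \sum_{i=1}^{n} (f(x_i) - y_i)^2$ for a bias-free feedforward network with scalar output, together with a parameter count showing $d \gg n$. It rests on several assumptions that are not in the abstract: ReLU activations on the hidden layers, a linear output layer, $R$ read as the number of weight matrices, and the hypothetical layer widths and helper names (`init_network`, `forward`, `square_loss`, `num_params`). It is an illustration, not the authors' code.

```python
import random

def relu(v):
    # Elementwise ReLU (assumed activation; the abstract does not specify one).
    return [max(0.0, a) for a in v]

def matvec(W, x):
    # Plain matrix-vector product; W is a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def init_network(layer_sizes, seed=0):
    """Hypothetical init: one weight matrix per layer, no bias vectors."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0 / fan_in ** 0.5) for _ in range(fan_in)]
             for _ in range(fan_out)]
            for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(weights, x):
    # Hidden layers: linear map followed by ReLU, with no bias terms.
    h = x
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    # Last layer is linear and has a single row, so f(x) is a scalar.
    return matvec(weights[-1], h)[0]

def square_loss(weights, xs, ys):
    # L = sum_i l_i^2 with l_i = f(x_i) - y_i.
    return sum((forward(weights, x) - y) ** 2 for x, y in zip(xs, ys))

def num_params(weights):
    # d = total number of weights (there are no biases to count).
    return sum(len(W) * len(W[0]) for W in weights)

# Hypothetical example: R = 3 weight layers, n = 5 data points.
weights = init_network([3, 20, 20, 1])
rng = random.Random(1)
xs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
ys = [rng.gauss(0.0, 1.0) for _ in range(5)]
print(num_params(weights), len(xs))  # 480 5, so d >> n
print(square_loss(weights, xs, ys))
```

With widths 3, 20, 20, 1 the network has $d = 3 \cdot 20 + 20 \cdot 20 + 20 \cdot 1 = 480$ parameters against $n = 5$ data points, which is the overparametrized regime the abstract assumes.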


Bibliographic Details
Main Authors:	Poggio, Tomaso; Cooper, Yaim
Format:	Technical Report
Published:	Center for Brains, Minds and Machines (CBMM) 2020
Online Access:	https://hdl.handle.net/1721.1/126041

Similar Items