Theory IIIb: Generalization in Deep Networks
The general features of the optimization problem for the case of overparametrized nonlinear networks have been clear for a while: SGD selects, with high probability, global minima over local minima. In the overparametrized case, the key question is not optimization of the empirical risk but optimization...
Main Authors: | , , , , |
---|---|
Format: | Technical Report |
Language: | en_US |
Published: | Center for Brains, Minds and Machines (CBMM), arXiv.org, 2018 |
Online Access: | http://hdl.handle.net/1721.1/116692 |