Learning rates as a function of batch size: a random matrix theory approach to neural network training
We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory. We demonstrate that the magnitudes of the extremal values of the batch Hessian are larger than those of the empirical Hessian. We also derive similar results for the...
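A minimal sketch (not from the paper) illustrating the stated claim for the simplest case of a least-squares loss, whose Hessian is `X.T @ X / n`: because the full-data Hessian is the average of the batch Hessians over a disjoint, equal-size partition, and the largest eigenvalue is a convex function of a symmetric matrix, the extremal batch eigenvalues bracket the full-data ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, b = 1000, 20, 50  # samples, parameters, batch size (b divides n)
X = rng.standard_normal((n, d))

# Hessian of the least-squares loss (1/2n)||Xw - y||^2 is X^T X / n,
# independent of w and y.
H_full = X.T @ X / n
lam_full = np.linalg.eigvalsh(H_full)  # sorted ascending

# Per-batch Hessians over a disjoint partition of the data.
batch_max, batch_min = [], []
for i in range(0, n, b):
    Xb = X[i:i + b]
    lam = np.linalg.eigvalsh(Xb.T @ Xb / b)
    batch_max.append(lam[-1])
    batch_min.append(lam[0])

# The full Hessian is the mean of the batch Hessians, so by convexity of
# the largest eigenvalue the batch extremes dominate the full-data ones.
print(max(batch_max) >= lam_full[-1])  # True
print(min(batch_min) <= lam_full[0])   # True
```

This toy quadratic case omits the spiked, field-dependent structure the paper actually analyzes; it only shows the direction of the extremal-eigenvalue inflation under mini-batching.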
| Main Authors: | |
| ---: | --- |
| Format: | Journal article |
| Language: | English |
| Published: | Journal of Machine Learning Research, 2022 |