Learning rates as a function of batch size: a random matrix theory approach to neural network training

We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory. We demonstrate that the magnitudes of the extremal values of the batch Hessian are larger than those of the empirical Hessian. We also derive similar results for the...
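The abstract's central claim, that mini-batch Hessians have larger extremal eigenvalues than the full empirical Hessian, can be illustrated with a toy simulation. The construction below is a sketch and not the paper's model: per-sample Hessians are taken to be a fixed matrix plus Wigner-type symmetric noise, so averaging over a small batch retains more fluctuation than averaging over the whole dataset, which broadens the spectrum at both edges.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's construction):
# per-sample Hessians H_i = H_true + symmetric noise. The empirical Hessian
# averages over all n samples; the batch Hessian averages over b << n samples,
# so its noise component is larger and its spectrum edges are pushed outward.

rng = np.random.default_rng(0)
d, n, b = 50, 2000, 32          # dimension, dataset size, batch size

# Fixed "true" Hessian with eigenvalues spread over [-1, 1].
H_true = np.diag(np.linspace(-1.0, 1.0, d))

# Symmetrized Gaussian noise per sample, scaled so each noise matrix has
# spectrum edges near +/- 2 (Wigner semicircle scaling).
noise = rng.standard_normal((n, d, d))
noise = (noise + noise.transpose(0, 2, 1)) / np.sqrt(2 * d)
H_samples = H_true[None, :, :] + noise

H_emp = H_samples.mean(axis=0)                                # full empirical Hessian
H_batch = H_samples[rng.choice(n, b, replace=False)].mean(axis=0)  # one mini-batch

lam_emp = np.linalg.eigvalsh(H_emp)
lam_batch = np.linalg.eigvalsh(H_batch)
print(f"empirical spectrum edges: [{lam_emp[0]:.3f}, {lam_emp[-1]:.3f}]")
print(f"batch     spectrum edges: [{lam_batch[0]:.3f}, {lam_batch[-1]:.3f}]")
```

In this model the batch average over b samples has noise of order 1/sqrt(b) versus 1/sqrt(n) for the empirical Hessian, so its smallest and largest eigenvalues typically overshoot the empirical ones, which is the qualitative behavior the abstract describes.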

Bibliographic Details
Main Authors: Granziol, D, Zohren, S, Roberts, S
Format: Journal article
Language: English
Published: Journal of Machine Learning Research, 2022