Learning rates as a function of batch size: a random matrix theory approach to neural network training

We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory. We demonstrate that the extremal values of the batch Hessian are larger in magnitude than those of the empirical Hessian. We also derive similar results for the...

Bibliographic details
Main authors: Granziol, D, Zohren, S, Roberts, S
Format: Journal article
Language: English
Published: Journal of Machine Learning Research 2022