aSGD: Stochastic Gradient Descent with Adaptive Batch Size for Every Parameter

In recent years, deep neural networks (DNNs) have been widely used in many fields. Training them demands considerable effort because of the large number of parameters in a deep network. Complex optimizers with many hyperparameters have been employed to accelerate network training and improve …
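For context, the sketch below shows plain mini-batch SGD with a single fixed batch size, the baseline that the title's aSGD method modifies. This is not the paper's algorithm; aSGD's per-parameter adaptive batch-size rule is given in the full text linked under Online Access. The function names and parameters here are illustrative assumptions.

```python
import numpy as np

def sgd_step(params, grad_fn, data, batch_size=32, lr=0.01, rng=None):
    """One plain mini-batch SGD update (baseline, not aSGD).

    params     -- 1-D array of model parameters
    grad_fn    -- callable(params, batch) -> gradient, same shape as params
    data       -- array of training examples; first axis indexes examples
    batch_size -- one fixed batch size shared by all parameters; aSGD,
                  per its title, instead adapts this per parameter
    """
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(data), size=batch_size, replace=False)
    grad = grad_fn(params, data[idx])   # gradient averaged over the sampled batch
    return params - lr * grad           # uniform learning-rate step
```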


Bibliographic Details
Main Authors: Haoze Shi, Naisen Yang, Hong Tang, Xin Yang
Format: Article
Language: English
Published: MDPI AG, 2022-03-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/10/6/863