Damped Newton Stochastic Gradient Descent Method for Neural Networks Training
First-order methods such as stochastic gradient descent (SGD) have recently become popular for training deep neural networks (DNNs) with good generalization, but they require long training times. Second-order methods, which could shorten training, are rarely used because of the high computational cost of obtaining second-order information. Many works therefore approximate the Hessian matrix to reduce this cost, but the approximate Hessian can deviate substantially from the true one. In this paper, we exploit the convexity of the loss with respect to part of the parameters, reflected in the corresponding block of the Hessian matrix, and propose the damped Newton stochastic gradient descent (DN-SGD) and stochastic gradient descent damped Newton (SGD-DN) methods to train DNNs for regression problems with mean square error (MSE) and classification problems with cross-entropy loss (CEL). In contrast to other second-order methods, which estimate the Hessian matrix of all parameters, our methods compute it exactly only for a small part of the parameters, which greatly reduces the computational cost and makes the learning process converge faster and more accurately than SGD and Adagrad. Several numerical experiments on real datasets verify the effectiveness of our methods for regression and classification problems.
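The abstract describes mixing a damped Newton update for a small subset of the parameters with ordinary SGD for the rest. The following is a minimal, hypothetical sketch of that idea for a one-hidden-layer regression network: the output-layer weights, in which the half-MSE loss is convex, receive a damped Newton step, while the hidden layer is updated by plain SGD. The toy data, layer sizes, damping factor, learning rate, and the choice of the output layer as the "small part of the parameters" are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a damped-Newton-plus-SGD update, assuming (hypothetically) that the
# exact Newton step is applied only to the output layer of a small regression MLP.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative, not from the paper).
X = rng.normal(size=(256, 10))
y = np.sin(X.sum(axis=1, keepdims=True))

# Parameters: hidden layer (updated by SGD), output layer (updated by damped Newton).
W1 = rng.normal(scale=0.3, size=(10, 32))
b1 = np.zeros(32)
w2 = rng.normal(scale=0.3, size=(33, 1))   # last row acts as the output bias

lr, damping, batch = 0.05, 1e-2, 32        # hyperparameters chosen arbitrarily

for epoch in range(20):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch):
        xb, yb = X[idx[start:start + batch]], y[idx[start:start + batch]]

        # Forward pass; loss is half-MSE: 0.5 * mean((A @ w2 - y)**2).
        z = xb @ W1 + b1
        h = np.maximum(z, 0.0)                      # ReLU
        A = np.hstack([h, np.ones((len(xb), 1))])   # design matrix for the output layer
        err = A @ w2 - yb

        # Damped Newton step on the output layer: the half-MSE loss is convex in w2,
        # with gradient A^T err / n and positive-semidefinite Hessian A^T A / n.
        g2 = A.T @ err / len(xb)
        H2 = A.T @ A / len(xb)
        w2 -= np.linalg.solve(H2 + damping * np.eye(H2.shape[0]), g2)

        # Plain SGD step on the hidden layer (standard backprop gradient of half-MSE).
        dh = (err @ w2[:-1].T) * (z > 0)
        W1 -= lr * xb.T @ dh / len(xb)
        b1 -= lr * dh.mean(axis=0)

    pred = np.hstack([np.maximum(X @ W1 + b1, 0.0), np.ones((len(X), 1))]) @ w2
    print(f"epoch {epoch:2d}  half-MSE {0.5 * np.mean((pred - y) ** 2):.4f}")
```

In this sketch the damped Newton step is taken before the SGD step; judging only by the names, the DN-SGD and SGD-DN variants in the paper presumably differ in that ordering.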
Main Authors: | Jingcheng Zhou, Wei Wei, Ruizhi Zhang, Zhiming Zheng |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-06-01 |
Series: | Mathematics |
Subjects: | stochastic gradient descent; damped Newton; convexity |
Online Access: | https://www.mdpi.com/2227-7390/9/13/1533 |
_version_ | 1827688454431965184 |
---|---|
author | Jingcheng Zhou; Wei Wei; Ruizhi Zhang; Zhiming Zheng
author_facet | Jingcheng Zhou; Wei Wei; Ruizhi Zhang; Zhiming Zheng
author_sort | Jingcheng Zhou |
collection | DOAJ |
description | First-order methods such as stochastic gradient descent (SGD) have recently become popular for training deep neural networks (DNNs) with good generalization, but they require long training times. Second-order methods, which could shorten training, are rarely used because of the high computational cost of obtaining second-order information. Many works therefore approximate the Hessian matrix to reduce this cost, but the approximate Hessian can deviate substantially from the true one. In this paper, we exploit the convexity of the loss with respect to part of the parameters, reflected in the corresponding block of the Hessian matrix, and propose the damped Newton stochastic gradient descent (DN-SGD) and stochastic gradient descent damped Newton (SGD-DN) methods to train DNNs for regression problems with mean square error (MSE) and classification problems with cross-entropy loss (CEL). In contrast to other second-order methods, which estimate the Hessian matrix of all parameters, our methods compute it exactly only for a small part of the parameters, which greatly reduces the computational cost and makes the learning process converge faster and more accurately than SGD and Adagrad. Several numerical experiments on real datasets verify the effectiveness of our methods for regression and classification problems.
first_indexed | 2024-03-10T09:57:00Z |
format | Article |
id | doaj.art-ac4c45e65d8e4b51a639c8534e563120 |
institution | Directory Open Access Journal |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-10T09:57:00Z |
publishDate | 2021-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-ac4c45e65d8e4b51a639c8534e563120; 2023-11-22T02:18:02Z; eng; MDPI AG; Mathematics, 2227-7390; 2021-06-01; Vol. 9, Iss. 13, Art. 1533; doi:10.3390/math9131533; Damped Newton Stochastic Gradient Descent Method for Neural Networks Training; Jingcheng Zhou, Wei Wei, Ruizhi Zhang, Zhiming Zheng (School of Mathematical Sciences, Beihang University, Beijing 100191, China); https://www.mdpi.com/2227-7390/9/13/1533; stochastic gradient descent; damped Newton; convexity
spellingShingle | Jingcheng Zhou; Wei Wei; Ruizhi Zhang; Zhiming Zheng; Damped Newton Stochastic Gradient Descent Method for Neural Networks Training; Mathematics; stochastic gradient descent; damped Newton; convexity
title | Damped Newton Stochastic Gradient Descent Method for Neural Networks Training |
title_full | Damped Newton Stochastic Gradient Descent Method for Neural Networks Training |
title_fullStr | Damped Newton Stochastic Gradient Descent Method for Neural Networks Training |
title_full_unstemmed | Damped Newton Stochastic Gradient Descent Method for Neural Networks Training |
title_short | Damped Newton Stochastic Gradient Descent Method for Neural Networks Training |
title_sort | damped newton stochastic gradient descent method for neural networks training |
topic | stochastic gradient descent; damped Newton; convexity
url | https://www.mdpi.com/2227-7390/9/13/1533 |
work_keys_str_mv | AT jingchengzhou dampednewtonstochasticgradientdescentmethodforneuralnetworkstraining AT weiwei dampednewtonstochasticgradientdescentmethodforneuralnetworkstraining AT ruizhizhang dampednewtonstochasticgradientdescentmethodforneuralnetworkstraining AT zhimingzheng dampednewtonstochasticgradientdescentmethodforneuralnetworkstraining |