Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks

This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and the true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to layer-wise setting of the LR, as MI naturally provides a layer-wise performance metric. An LR range test determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as gradient-based adaptive LR algorithms, including Adam, RMSprop, and LARS. Accuracy outcomes competitive with or better than those of the alternatives, obtained in comparable or better time, demonstrate the feasibility of the metric and the approach.
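The abstract describes the idea but gives no implementation detail, so the following NumPy-only sketch is one plausible reading rather than the paper's actual algorithm: it estimates MI between predicted and true class labels with a simple plug-in (histogram) estimator and maps that estimate onto an operating LR range, so the LR decays as the network's outputs carry more information about the labels. The estimator, the linear MI-to-LR mapping, and the names mutual_information, next_learning_rate, lr_min, and lr_max are all assumptions introduced here, not taken from the paper.

    import numpy as np

    def mutual_information(pred, true, n_classes):
        # Plug-in (histogram) MI estimate, in nats, between predicted and true labels.
        joint = np.zeros((n_classes, n_classes))
        np.add.at(joint, (pred, true), 1.0)     # scatter-add counts into the joint table
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)   # marginal over predictions
        py = joint.sum(axis=0, keepdims=True)   # marginal over true labels
        nz = joint > 0                          # avoid log(0) on empty cells
        return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

    def next_learning_rate(mi, mi_max, lr_min, lr_max):
        # Map normalized MI onto the operating LR range found by an LR range test:
        # low MI (early training) -> lr_max; MI near its cap -> lr_min.
        frac = np.clip(mi / mi_max, 0.0, 1.0)
        return lr_max - frac * (lr_max - lr_min)

    # Toy usage: as predictions align with the labels over "epochs",
    # MI rises and the LR decays toward lr_min.
    rng = np.random.default_rng(0)
    n_classes = 10
    true = rng.integers(0, n_classes, size=1000)
    mi_max = np.log(n_classes)  # MI is capped by label entropy (log K for uniform labels)
    lr_min, lr_max = 1e-4, 0.1
    for epoch, noise in enumerate([0.9, 0.5, 0.2, 0.05]):
        flip = rng.random(true.size) < noise
        pred = np.where(flip, rng.integers(0, n_classes, size=true.size), true)
        mi = mutual_information(pred, true, n_classes)
        lr = next_learning_rate(mi, mi_max, lr_min, lr_max)
        print(f"epoch {epoch}: MI = {mi:.3f} nats, next LR = {lr:.5f}")

A layer-wise variant, as the abstract suggests, would compute the same kind of estimate on each layer's (suitably discretized) activations and scale that layer's LR independently; how the paper discretizes continuous activations is not specified in this record.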


Bibliographic Details
Main Author: Shrihari Vasudevan (IBM Research, Bangalore 560045, India)
Format: Article
Language: English
Published: MDPI AG, 2020-05-01
Series: Entropy, Vol. 22, Iss. 5, Art. No. 560
ISSN: 1099-4300
DOI: 10.3390/e22050560
Subjects: deep neural networks; stochastic gradient descent; mutual information; adaptive learning rate
Online Access: https://www.mdpi.com/1099-4300/22/5/560