Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks

This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and the true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to layer-wise setting of the LR, as MI naturally provides a layer-wise performance metric. An LR range test determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as gradient-based adaptive LR algorithms, including Adam, RMSprop, and LARS. Accuracy outcomes competitive with or better than those of the alternatives, obtained in comparable or better time, demonstrate the feasibility of the metric and the approach.
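The abstract describes the idea but gives no implementation detail, so the following NumPy-only sketch is one plausible reading rather than the paper's actual algorithm: it estimates MI between predicted and true class labels with a simple plug-in (histogram) estimator and maps that estimate onto an operating LR range, so the LR decays as the network's outputs carry more information about the labels. The estimator, the linear MI-to-LR mapping, and the names mutual_information, next_learning_rate, lr_min, and lr_max are all assumptions introduced here, not taken from the paper.

    import numpy as np

    def mutual_information(pred, true, n_classes):
        # Plug-in (histogram) MI estimate, in nats, between predicted and true labels.
        joint = np.zeros((n_classes, n_classes))
        np.add.at(joint, (pred, true), 1.0)     # scatter-add counts into the joint table
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)   # marginal over predictions
        py = joint.sum(axis=0, keepdims=True)   # marginal over true labels
        nz = joint > 0                          # avoid log(0) on empty cells
        return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

    def next_learning_rate(mi, mi_max, lr_min, lr_max):
        # Map normalized MI onto the operating LR range found by an LR range test:
        # low MI (early training) -> lr_max; MI near its cap -> lr_min.
        frac = np.clip(mi / mi_max, 0.0, 1.0)
        return lr_max - frac * (lr_max - lr_min)

    # Toy usage: as predictions align with the labels over "epochs",
    # MI rises and the LR decays toward lr_min.
    rng = np.random.default_rng(0)
    n_classes = 10
    true = rng.integers(0, n_classes, size=1000)
    mi_max = np.log(n_classes)  # MI is capped by label entropy (log K for uniform labels)
    lr_min, lr_max = 1e-4, 0.1
    for epoch, noise in enumerate([0.9, 0.5, 0.2, 0.05]):
        flip = rng.random(true.size) < noise
        pred = np.where(flip, rng.integers(0, n_classes, size=true.size), true)
        mi = mutual_information(pred, true, n_classes)
        lr = next_learning_rate(mi, mi_max, lr_min, lr_max)
        print(f"epoch {epoch}: MI = {mi:.3f} nats, next LR = {lr:.5f}")

A layer-wise variant, as the abstract suggests, would compute the same kind of estimate on each layer's (suitably discretized) activations and scale that layer's LR independently; how the paper discretizes continuous activations is not specified in this record.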


Bibliographic Details
Main Author: Shrihari Vasudevan (IBM Research, Bangalore 560045, India)
Format: Article
Language: English
Published: MDPI AG, 2020-05-01
Series: Entropy, Vol. 22, Iss. 5, Art. No. 560
ISSN: 1099-4300
DOI: 10.3390/e22050560
Subjects: deep neural networks; stochastic gradient descent; mutual information; adaptive learning rate
Online Access: https://www.mdpi.com/1099-4300/22/5/560