Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network,...
Main Author: | Shrihari Vasudevan
Format: | Article
Language: | English
Published: | MDPI AG, 2020-05-01
Series: | Entropy
Subjects: | deep neural networks; stochastic gradient descent; mutual information; adaptive learning rate
Online Access: | https://www.mdpi.com/1099-4300/22/5/560
_version_ | 1827716739974037504 |
author | Shrihari Vasudevan |
author_facet | Shrihari Vasudevan |
author_sort | Shrihari Vasudevan |
collection | DOAJ |
description | This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and the true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to layer-wise setting of the LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as gradient-based adaptive LR algorithms, including Adam, RMSprop, and LARS. Accuracy outcomes that are competitive with or better than those of the alternatives, obtained in comparable or less training time, demonstrate the feasibility of the metric and the approach.
first_indexed | 2024-03-10T19:47:12Z |
format | Article |
id | doaj.art-eadcb722b44c4c488434afcd47c5e260 |
institution | Directory Open Access Journal |
issn | 1099-4300 |
language | English |
last_indexed | 2024-03-10T19:47:12Z |
publishDate | 2020-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Entropy |
spelling | doaj.art-eadcb722b44c4c488434afcd47c5e260 | 2023-11-20T00:44:42Z | eng | MDPI AG | Entropy | 1099-4300 | 2020-05-01 | vol. 22, no. 5, art. 560 | 10.3390/e22050560 | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks | Shrihari Vasudevan, IBM Research, Bangalore 560045, India | This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and the true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to layer-wise setting of the LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as gradient-based adaptive LR algorithms, including Adam, RMSprop, and LARS. Accuracy outcomes that are competitive with or better than those of the alternatives, obtained in comparable or less training time, demonstrate the feasibility of the metric and the approach. | https://www.mdpi.com/1099-4300/22/5/560 | deep neural networks; stochastic gradient descent; mutual information; adaptive learning rate
spellingShingle | Shrihari Vasudevan Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks Entropy deep neural networks stochastic gradient descent mutual information adaptive learning rate |
title | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title_full | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title_fullStr | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title_full_unstemmed | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title_short | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title_sort | mutual information based learning rate decay for stochastic gradient descent training of deep neural networks |
topic | deep neural networks stochastic gradient descent mutual information adaptive learning rate |
url | https://www.mdpi.com/1099-4300/22/5/560 |
work_keys_str_mv | AT shriharivasudevan mutualinformationbasedlearningratedecayforstochasticgradientdescenttrainingofdeepneuralnetworks |
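
The abstract above describes an epoch-wise loop in which the MI between network predictions and true outcomes drives the LR. Below is a minimal sketch of that idea for a classification setting. Everything here is an assumption built only from the abstract: the MI estimator, the geometric MI-to-LR mapping, and all names (`mi_score`, `MIDecayScheduler`, `lr_min`, `lr_max`) are illustrative, not the paper's actual implementation.

```python
# Hypothetical sketch of MI-driven LR decay, based only on the abstract.
# The MI estimator, the MI-to-LR mapping, and every name here are
# assumptions, not the paper's actual implementation.
import numpy as np

def mi_score(y_true, y_pred, n_classes):
    """Mutual information (in nats) between true and predicted labels,
    computed from their empirical joint distribution."""
    joint = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        joint[t, p] += 1.0
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of true labels
    py = joint.sum(axis=0, keepdims=True)  # marginal of predictions
    nz = joint > 0                         # avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

class MIDecayScheduler:
    """Map the current epoch's MI to an LR within [lr_min, lr_max]:
    low MI (early training) -> large LR; MI near its upper bound
    log(n_classes) -> small LR. Geometric interpolation is one
    plausible decay shape, not necessarily the paper's."""
    def __init__(self, lr_min, lr_max, n_classes):
        self.lr_min, self.lr_max = lr_min, lr_max
        self.mi_max = np.log(n_classes)  # MI upper bound for class labels

    def lr(self, mi):
        frac = float(np.clip(mi / self.mi_max, 0.0, 1.0))
        return self.lr_max * (self.lr_min / self.lr_max) ** frac

# Usage: after each epoch, score held-out predictions and reset the
# optimizer's LR before the next epoch. Random labels here stand in
# for real model outputs.
rng = np.random.default_rng(0)
sched = MIDecayScheduler(lr_min=1e-4, lr_max=1e-1, n_classes=10)
y_true = rng.integers(0, 10, size=1000)
y_pred = rng.integers(0, 10, size=1000)
print(sched.lr(mi_score(y_true, y_pred, n_classes=10)))
```

For the layer-wise variant the abstract mentions, the same mapping would presumably be applied per layer using a per-layer MI estimate, and the LR range test would supply the `lr_min` and `lr_max` bounds; see the full article at the Online Access URL for the authoritative method.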