Target-Network Update Linked with Learning Rate Decay Based on Mutual Information and Reward in Deep Reinforcement Learning
In this study, a target-network update scheme for deep reinforcement learning (DRL) based on mutual information (MI) and rewards is proposed. In DRL, updating the target network from the Q-network reduces variance in training and contributes to the stability of learning. When the target network is not updated properly, a common remedy is to lower the overall update rate; however, simply slowing the update is not recommended, because it also slows the decay of the learning rate. Some studies have addressed these issues with the t-soft update, based on the Student's t-distribution, or with methods that avoid the target network altogether. However, there are situations in which the Student's t-distribution may fail or require additional hyperparameters. A few studies have used MI in deep neural networks to improve the decaying learning rate and to update the target network directly from experience replay. Therefore, in this study, the MI and the rewards provided in the experience replay of DRL are combined to improve both the decaying learning rate and the target-network update. Utilizing rewards is appropriate in environments with intrinsic symmetry. Experiments in various OpenAI Gym environments confirm that stable learning is possible while maintaining an improved decaying learning rate.
Main Author: | Chayoung Kim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-09-01 |
Series: | Symmetry |
Subjects: | deep reinforcement learning; target-network; deep neural networks; mutual information; learning rate |
Online Access: | https://www.mdpi.com/2073-8994/15/10/1840 |
author | Chayoung Kim |
collection | DOAJ |
description | In this study, a target-network update scheme for deep reinforcement learning (DRL) based on mutual information (MI) and rewards is proposed. In DRL, updating the target network from the Q-network reduces variance in training and contributes to the stability of learning. When the target network is not updated properly, a common remedy is to lower the overall update rate; however, simply slowing the update is not recommended, because it also slows the decay of the learning rate. Some studies have addressed these issues with the t-soft update, based on the Student's t-distribution, or with methods that avoid the target network altogether. However, there are situations in which the Student's t-distribution may fail or require additional hyperparameters. A few studies have used MI in deep neural networks to improve the decaying learning rate and to update the target network directly from experience replay. Therefore, in this study, the MI and the rewards provided in the experience replay of DRL are combined to improve both the decaying learning rate and the target-network update. Utilizing rewards is appropriate in environments with intrinsic symmetry. Experiments in various OpenAI Gym environments confirm that stable learning is possible while maintaining an improved decaying learning rate. |
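The abstract builds on two standard mechanisms: the soft (Polyak) target-network update and a decaying learning rate. The sketch below illustrates only that baseline machinery, not the MI- and reward-based coupling the article proposes; the `tau` and `decay` values are illustrative assumptions, not figures from the paper.

```python
def soft_update(target_params, online_params, tau):
    """Polyak soft update: theta_target <- tau * theta_online + (1 - tau) * theta_target.

    A small tau (e.g. 0.005-0.05) slowly tracks the online Q-network,
    which stabilizes the bootstrapped targets.
    """
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]


def decayed_lr(lr0, step, decay=0.999):
    """Exponential learning-rate decay: lr(step) = lr0 * decay**step.

    The abstract's point is that slowing the target-network update also
    slows how quickly this schedule is traversed, so the two rates are coupled.
    """
    return lr0 * (decay ** step)


# Toy usage with scalar "weights" standing in for network parameters.
online = [1.0, 2.0]
target = [0.0, 0.0]
for step in range(3):
    lr = decayed_lr(0.1, step)          # shrinks each step
    target = soft_update(target, online, tau=0.05)  # drifts toward online
```

In real DRL code the same blend would be applied per tensor of the target network's parameters; the paper's contribution is to modulate these rates using MI and replayed rewards rather than fixed constants.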
format | Article |
id | doaj.art-1df942bae2f64bf1b28af990a60254c4 |
institution | Directory Open Access Journal |
issn | 2073-8994 |
language | English |
publishDate | 2023-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Symmetry |
doi | 10.3390/sym15101840 |
author_affiliation | College of Liberal Arts and Interdisciplinary Studies, Kyonggi University, 154-42 Gwanggyosan-ro, Yeongtong-gu, Suwon-si 16227, Gyeonggi-do, Republic of Korea |
title | Target-Network Update Linked with Learning Rate Decay Based on Mutual Information and Reward in Deep Reinforcement Learning |
topic | deep reinforcement learning target-network deep neural networks mutual information learning rate |
url | https://www.mdpi.com/2073-8994/15/10/1840 |