Target-Network Update Linked with Learning Rate Decay Based on Mutual Information and Reward in Deep Reinforcement Learning


Bibliographic Details
Main Author: Chayoung Kim
Format: Article
Language: English
Published: MDPI AG 2023-09-01
Series: Symmetry
Subjects: deep reinforcement learning; target-network; deep neural networks; mutual information; learning rate
Online Access: https://www.mdpi.com/2073-8994/15/10/1840
author Chayoung Kim
collection DOAJ
description In this study, a target-network update for deep reinforcement learning (DRL) based on mutual information (MI) and rewards is proposed. In DRL, updating the target network from the Q network reduces training variability and contributes to the stability of learning. When the target network is not updated properly, the overall update rate is typically reduced to mitigate the problem; however, simply slowing the update is not recommended, because it also slows the decay of the learning rate. Some studies have addressed these issues with the t-soft update, which is based on the Student's t-distribution, or with methods that do not use a target network at all. However, in certain situations the Student's t-distribution can fail or require additional hyperparameters. A few studies have used MI in deep neural networks to improve the decaying learning rate and to update the target network directly from replayed experiences. Therefore, in this study, the MI and the rewards provided by the experience replay of DRL are combined to improve both the decay of the learning rate and the updating of the target network. Utilizing rewards is well suited to environments with intrinsic symmetry. Experiments on various OpenAI Gym environments confirm that stable learning is possible while the improved learning-rate decay is maintained.
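The abstract describes linking the target-network update rate and the learning-rate decay to a signal derived from experience replay. The record does not give the paper's exact update rule, so the sketch below is only an illustrative assumption: a standard Polyak (soft) target update whose factor `tau`, along with the learning rate, is scaled by a sigmoid of recent rewards. The names `soft_update`, `modulation`, `base_tau`, and `base_lr` are hypothetical and not taken from the paper.

```python
import numpy as np

# Illustrative sketch only: the paper's exact MI/reward-based rule is not
# given in this record; here a reward-driven factor in [0, 1] modulates both
# the Polyak coefficient and the learning rate (an assumption).

def soft_update(q_params, target_params, tau):
    """Polyak-average Q-network weights into the target network."""
    return {k: tau * q_params[k] + (1.0 - tau) * target_params[k]
            for k in q_params}

def modulation(recent_rewards):
    """Map the mean recent reward to a factor in (0, 1) via a sigmoid (assumed)."""
    return 1.0 / (1.0 + np.exp(-np.mean(recent_rewards)))

# Toy usage with a single weight matrix.
rng = np.random.default_rng(0)
q = {"w": rng.standard_normal((2, 2))}      # Q-network weights
tgt = {"w": np.zeros((2, 2))}               # target-network weights

base_tau, base_lr = 0.05, 1e-3
m = modulation([0.4, 0.9, 0.7])             # reward-driven signal in (0, 1)
tau = base_tau * m                          # faster target update as rewards rise
lr = base_lr * (0.5 + 0.5 * m)              # keep the learning rate from decaying too fast

tgt = soft_update(q, tgt, tau)
```

In a full agent, the same signal could also incorporate an MI estimate between replayed states and Q-values, which is the combination the abstract proposes; the sigmoid here merely stands in for that unspecified rule.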
first_indexed 2024-03-10T20:51:30Z
format Article
id doaj.art-1df942bae2f64bf1b28af990a60254c4
institution Directory Open Access Journal
issn 2073-8994
language English
last_indexed 2024-03-10T20:51:30Z
publishDate 2023-09-01
publisher MDPI AG
record_format Article
series Symmetry
doi 10.3390/sym15101840
citation Symmetry, vol. 15, no. 10, art. 1840 (2023-09-01)
author_affiliation College of Liberal Arts and Interdisciplinary Studies, Kyonggi University, 154-42 Gwanggyosan-ro, Yeongtong-gu, Suwon-si 16227, Gyeonggi-do, Republic of Korea
title Target-Network Update Linked with Learning Rate Decay Based on Mutual Information and Reward in Deep Reinforcement Learning
topic deep reinforcement learning
target-network
deep neural networks
mutual information
learning rate
url https://www.mdpi.com/2073-8994/15/10/1840