Target-Network Update Linked with Learning Rate Decay Based on Mutual Information and Reward in Deep Reinforcement Learning
In this study, a target-network update scheme for deep reinforcement learning (DRL) based on mutual information (MI) and rewards is proposed. In DRL, updating the target network from the Q-network reduces variance in training and contributes to the stability of learning. When the target network is not updated properly, a common remedy is to lower the overall update rate; however, simply slowing the update is not recommended, because it also slows the decay of the learning rate. Some studies have addressed these issues with the t-soft update, based on the Student's t-distribution, or with methods that avoid the target network altogether. However, there are situations in which the Student's t-distribution may fail or require additional hyperparameters. A few studies have used MI in deep neural networks to improve the decaying learning rate and to update the target network directly from experience replay. Therefore, in this study, the MI and the rewards provided in the experience replay of DRL are combined to improve both the decaying learning rate and the target-network update. Utilizing rewards is appropriate in environments with intrinsic symmetry. Experiments in various OpenAI Gym environments confirm that stable learning is possible while maintaining an improved decaying learning rate.
Main Author: | Chayoung Kim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-09-01 |
Series: | Symmetry |
Subjects: | deep reinforcement learning; target-network; deep neural networks; mutual information; learning rate |
Online Access: | https://www.mdpi.com/2073-8994/15/10/1840 |
author | Chayoung Kim |
collection | DOAJ |
description | In this study, a target-network update scheme for deep reinforcement learning (DRL) based on mutual information (MI) and rewards is proposed. In DRL, updating the target network from the Q-network reduces variance in training and contributes to the stability of learning. When the target network is not updated properly, a common remedy is to lower the overall update rate; however, simply slowing the update is not recommended, because it also slows the decay of the learning rate. Some studies have addressed these issues with the t-soft update, based on the Student's t-distribution, or with methods that avoid the target network altogether. However, there are situations in which the Student's t-distribution may fail or require additional hyperparameters. A few studies have used MI in deep neural networks to improve the decaying learning rate and to update the target network directly from experience replay. Therefore, in this study, the MI and the rewards provided in the experience replay of DRL are combined to improve both the decaying learning rate and the target-network update. Utilizing rewards is appropriate in environments with intrinsic symmetry. Experiments in various OpenAI Gym environments confirm that stable learning is possible while maintaining an improved decaying learning rate. |
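The abstract builds on two standard mechanisms: the soft (Polyak) target-network update and a decaying learning rate. The sketch below illustrates only that baseline machinery, not the MI- and reward-based coupling the article proposes; the `tau` and `decay` values are illustrative assumptions, not figures from the paper.

```python
def soft_update(target_params, online_params, tau):
    """Polyak soft update: theta_target <- tau * theta_online + (1 - tau) * theta_target.

    A small tau (e.g. 0.005-0.05) slowly tracks the online Q-network,
    which stabilizes the bootstrapped targets.
    """
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]


def decayed_lr(lr0, step, decay=0.999):
    """Exponential learning-rate decay: lr(step) = lr0 * decay**step.

    The abstract's point is that slowing the target-network update also
    slows how quickly this schedule is traversed, so the two rates are coupled.
    """
    return lr0 * (decay ** step)


# Toy usage with scalar "weights" standing in for network parameters.
online = [1.0, 2.0]
target = [0.0, 0.0]
for step in range(3):
    lr = decayed_lr(0.1, step)          # shrinks each step
    target = soft_update(target, online, tau=0.05)  # drifts toward online
```

In real DRL code the same blend would be applied per tensor of the target network's parameters; the paper's contribution is to modulate these rates using MI and replayed rewards rather than fixed constants.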
format | Article |
id | doaj.art-1df942bae2f64bf1b28af990a60254c4 |
institution | Directory Open Access Journal |
issn | 2073-8994 |
language | English |
publishDate | 2023-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Symmetry |
doi | 10.3390/sym15101840 |
author_affiliation | College of Liberal Arts and Interdisciplinary Studies, Kyonggi University, 154-42 Gwanggyosan-ro, Yeongtong-gu, Suwon-si 16227, Gyeonggi-do, Republic of Korea |
title | Target-Network Update Linked with Learning Rate Decay Based on Mutual Information and Reward in Deep Reinforcement Learning |
topic | deep reinforcement learning target-network deep neural networks mutual information learning rate |
url | https://www.mdpi.com/2073-8994/15/10/1840 |