Target-Network Update Linked with Learning Rate Decay Based on Mutual Information and Reward in Deep Reinforcement Learning
In this study, a target-network update of deep reinforcement learning (DRL) based on mutual information (MI) and rewards is proposed. In DRL, updating the target network from the Q network was used to reduce training diversity and contribute to the stability of learning. If it is not properly update...
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2023-09-01
|
Series: | Symmetry |
Subjects: | |
Online Access: | https://www.mdpi.com/2073-8994/15/10/1840 |