Reward bonuses with gain scheduling inspired by iterative deepening search

Bibliographic Details
Main Author: Taisuke Kobayashi
Format: Article
Language: English
Published: Elsevier, 2023-09-01
Series: Results in Control and Optimization
Online Access: http://www.sciencedirect.com/science/article/pii/S2666720723000462
Description
Summary: This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to facilitate efficient search in reinforcement learning. While various bonuses have been designed to date, this paper points out that intrinsic bonuses can be analogous to either the depth-first or the breadth-first search algorithm in graph theory. Since each has its own strengths and weaknesses, new bonuses that bring out their respective characteristics are first proposed. The designed bonuses are derived as differences in key indicators that converge in the steady state, based on the concepts of value disagreement and self-imitation. Heuristic gain scheduling is then applied to the designed bonuses, inspired by iterative deepening search, which is known to inherit the advantages of both search algorithms. The proposed method is expected to allow the agent to efficiently reach the best solutions in deeper states by gradually exploring unknown states. In three locomotion tasks with dense rewards and three simple tasks with sparse rewards, the two types of bonuses are shown to improve performance on different tasks in a complementary manner. In addition, by combining them with the proposed gain scheduling, all tasks can be accomplished with high performance.
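The core idea of weighting two intrinsic bonuses by scheduled gains can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear gain schedule, the cycle length `period`, and the names `bonus_depth`/`bonus_breadth` are assumptions made here for clarity, whereas the paper derives its bonuses from value disagreement and self-imitation and applies its own heuristic schedule.

```python
# Minimal sketch of reward shaping with scheduled intrinsic bonuses.
# The schedule and all names below are illustrative assumptions,
# not the paper's actual derivation.

def scheduled_gains(step: int, period: int = 1000, beta_max: float = 1.0):
    """Gain schedule loosely inspired by iterative deepening search:
    within each cycle the breadth-oriented gain decays while the
    depth-oriented gain grows, so exploration starts broad and is
    gradually allowed to reach deeper states before the cycle restarts."""
    phase = (step % period) / period          # position within the current cycle
    beta_depth = beta_max * phase             # grows over each cycle
    beta_breadth = beta_max * (1.0 - phase)   # shrinks over each cycle
    return beta_depth, beta_breadth

def shaped_reward(r_task: float, bonus_depth: float,
                  bonus_breadth: float, step: int) -> float:
    """Task reward plus the two intrinsic bonuses, weighted by the gains."""
    beta_d, beta_b = scheduled_gains(step)
    return r_task + beta_d * bonus_depth + beta_b * bonus_breadth

if __name__ == "__main__":
    # Toy usage with constant bonuses, just to show the gains shifting.
    for step in (0, 250, 500, 750):
        print(step, shaped_reward(r_task=1.0, bonus_depth=0.2,
                                  bonus_breadth=0.5, step=step))
```

The cyclic restart mirrors how iterative deepening repeatedly re-runs a depth-limited search with a larger limit, combining breadth-first completeness with depth-first memory efficiency.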
ISSN:2666-7207