Reward bonuses with gain scheduling inspired by iterative deepening search
This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to facilitate exploration in reinforcement learning efficiently. While various bonuses have been designed to date, this paper points out that intrinsic bonuses can be analogous to either the depth-first or the breadth-first search algorithm in graph theory. Since these two have their own strengths and weaknesses, new bonuses that bring out their respective characteristics are first proposed. The designed bonuses are derived as differences in key indicators that converge in the steady state, based on the concepts of value disagreement and self-imitation. A heuristic gain scheduling, inspired by iterative deepening search, which is known to inherit the advantages of both search algorithms, is then applied to the designed bonuses. The proposed method is expected to allow the agent to efficiently reach the best solution in deeper states by gradually exploring unknown states. In three locomotion tasks with dense rewards and three simple tasks with sparse rewards, the two types of bonus are shown to contribute complementarily to performance improvements on different tasks. In addition, combining them with the proposed gain scheduling allows all tasks to be accomplished with high performance.
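The gain-scheduling idea described in the abstract can be illustrated with a small, hedged sketch. The snippet below is not the paper's implementation: the concrete bonus definitions (ensemble value disagreement as a breadth-first-like bonus, a self-imitation-style return gap as a depth-first-like bonus) and the linear schedule are simplifying assumptions, chosen only to show how scheduled gains might blend two intrinsic bonuses with the task reward.

```python
import numpy as np

# Hypothetical sketch: combine a task reward with two intrinsic bonuses whose
# gains are scheduled over training, loosely echoing iterative deepening
# (breadth-first-like exploration early, depth-first-like exploitation later).
# The bonus formulas and the linear schedule are assumptions, not the paper's.

def value_disagreement_bonus(q_ensemble):
    """Breadth-first-like bonus: disagreement (std) among an ensemble of
    value estimates for the current state-action pair."""
    return float(np.std(q_ensemble))

def self_imitation_bonus(episode_return, best_return_so_far):
    """Depth-first-like bonus: positive gap over the best return seen so far,
    encouraging the agent to push further along promising trajectories."""
    return max(episode_return - best_return_so_far, 0.0)

def scheduled_gains(step, total_steps):
    """Heuristic gain schedule: the breadth-oriented gain decays while the
    depth-oriented gain grows as training progresses."""
    progress = min(step / total_steps, 1.0)
    return 1.0 - progress, progress  # (gain_breadth, gain_depth)

def shaped_reward(task_reward, q_ensemble, episode_return, best_return,
                  step, total_steps):
    """Task reward plus the two scheduled intrinsic bonuses."""
    g_breadth, g_depth = scheduled_gains(step, total_steps)
    bonus_b = value_disagreement_bonus(q_ensemble)
    bonus_d = self_imitation_bonus(episode_return, best_return)
    return task_reward + g_breadth * bonus_b + g_depth * bonus_d

# Example: early in training the disagreement bonus dominates the shaping;
# late in training the self-imitation bonus does.
print(shaped_reward(1.0, [0.8, 1.2, 0.5], 10.0, 8.0, step=1_000, total_steps=100_000))
```

With this kind of crossfade, exploration-driven shaping fades out as exploitation-driven shaping fades in, which is one plausible way to read "gain scheduling inspired by iterative deepening"; the paper's actual schedule and indicators should be taken from the article itself.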
Main Author: | Taisuke Kobayashi |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-09-01 |
Series: | Results in Control and Optimization |
Subjects: | Reinforcement learning; Intrinsic reward; Value disagreement; Self-imitation; Iterative deepening search |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666720723000462 |
_version_ | 1797807215226126336 |
---|---|
author | Taisuke Kobayashi |
author_facet | Taisuke Kobayashi |
author_sort | Taisuke Kobayashi |
collection | DOAJ |
description | This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to facilitate exploration in reinforcement learning efficiently. While various bonuses have been designed to date, this paper points out that intrinsic bonuses can be analogous to either the depth-first or the breadth-first search algorithm in graph theory. Since these two have their own strengths and weaknesses, new bonuses that bring out their respective characteristics are first proposed. The designed bonuses are derived as differences in key indicators that converge in the steady state, based on the concepts of value disagreement and self-imitation. A heuristic gain scheduling, inspired by iterative deepening search, which is known to inherit the advantages of both search algorithms, is then applied to the designed bonuses. The proposed method is expected to allow the agent to efficiently reach the best solution in deeper states by gradually exploring unknown states. In three locomotion tasks with dense rewards and three simple tasks with sparse rewards, the two types of bonus are shown to contribute complementarily to performance improvements on different tasks. In addition, combining them with the proposed gain scheduling allows all tasks to be accomplished with high performance. |
first_indexed | 2024-03-13T06:19:13Z |
format | Article |
id | doaj.art-f859ccf8198d4806a8352cc3ec438a89 |
institution | Directory Open Access Journal |
issn | 2666-7207 |
language | English |
last_indexed | 2024-03-13T06:19:13Z |
publishDate | 2023-09-01 |
publisher | Elsevier |
record_format | Article |
series | Results in Control and Optimization |
spelling | doaj.art-f859ccf8198d4806a8352cc3ec438a89 | 2023-06-10T04:28:41Z | eng | Elsevier | Results in Control and Optimization | 2666-7207 | 2023-09-01 | Vol. 12, article 100244 | Reward bonuses with gain scheduling inspired by iterative deepening search | Taisuke Kobayashi (Principles of Informatics Research Division, National Institute of Informatics, Tokyo, Japan; School of Multidisciplinary Sciences, Department of Informatics, The Graduate University for Advanced Studies (SOKENDAI), Kanagawa, Japan) | This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to facilitate exploration in reinforcement learning efficiently. While various bonuses have been designed to date, this paper points out that intrinsic bonuses can be analogous to either the depth-first or the breadth-first search algorithm in graph theory. Since these two have their own strengths and weaknesses, new bonuses that bring out their respective characteristics are first proposed. The designed bonuses are derived as differences in key indicators that converge in the steady state, based on the concepts of value disagreement and self-imitation. A heuristic gain scheduling, inspired by iterative deepening search, which is known to inherit the advantages of both search algorithms, is then applied to the designed bonuses. The proposed method is expected to allow the agent to efficiently reach the best solution in deeper states by gradually exploring unknown states. In three locomotion tasks with dense rewards and three simple tasks with sparse rewards, the two types of bonus are shown to contribute complementarily to performance improvements on different tasks. In addition, combining them with the proposed gain scheduling allows all tasks to be accomplished with high performance. | http://www.sciencedirect.com/science/article/pii/S2666720723000462 | Reinforcement learning; Intrinsic reward; Value disagreement; Self-imitation; Iterative deepening search |
spellingShingle | Taisuke Kobayashi Reward bonuses with gain scheduling inspired by iterative deepening search Results in Control and Optimization Reinforcement learning Intrinsic reward Value disagreement Self-imitation Iterative deepening search |
title | Reward bonuses with gain scheduling inspired by iterative deepening search |
title_full | Reward bonuses with gain scheduling inspired by iterative deepening search |
title_fullStr | Reward bonuses with gain scheduling inspired by iterative deepening search |
title_full_unstemmed | Reward bonuses with gain scheduling inspired by iterative deepening search |
title_short | Reward bonuses with gain scheduling inspired by iterative deepening search |
title_sort | reward bonuses with gain scheduling inspired by iterative deepening search |
topic | Reinforcement learning; Intrinsic reward; Value disagreement; Self-imitation; Iterative deepening search |
url | http://www.sciencedirect.com/science/article/pii/S2666720723000462 |
work_keys_str_mv | AT taisukekobayashi rewardbonuseswithgainschedulinginspiredbyiterativedeepeningsearch |