Reward bonuses with gain scheduling inspired by iterative deepening search

This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to efficiently facilitate exploration in reinforcement learning. While various bonuses have been designed to date, this paper points out that intrinsic bonuses can be analogous to either the depth-first or the breadth-first search algorithm in graph theory. Since these two have their own strengths and weaknesses, new bonuses that bring out their respective characteristics are first proposed.


Bibliographic Details
Main Author: Taisuke Kobayashi
Format: Article
Language: English
Published: Elsevier 2023-09-01
Series: Results in Control and Optimization
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2666720723000462
_version_ 1797807215226126336
author Taisuke Kobayashi
author_facet Taisuke Kobayashi
author_sort Taisuke Kobayashi
collection DOAJ
description This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to efficiently facilitate exploration in reinforcement learning. While various bonuses have been designed to date, this paper points out that intrinsic bonuses can be analogous to either the depth-first or the breadth-first search algorithm in graph theory. Since these two have their own strengths and weaknesses, new bonuses that bring out their respective characteristics are first proposed. The designed bonuses are derived as differences in key indicators that converge in the steady state, based on the concepts of value disagreement and self-imitation. Then, a heuristic gain scheduling is applied to the designed bonuses, inspired by iterative deepening search, which is known to inherit the advantages of the two search algorithms. The proposed method is expected to allow the agent to reach the best solution in deeper states efficiently by gradually exploring unknown states. In three locomotion tasks with dense rewards and three simple tasks with sparse rewards, it is shown that the two types of bonus contribute complementarily to the performance improvement on different tasks. In addition, by combining them with the proposed gain scheduling, all tasks can be accomplished with high performance.
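For context, a minimal sketch of classical iterative deepening search, the graph-search algorithm the paper's gain scheduling is inspired by. It runs depth-limited DFS under a growing depth cap, combining the low memory footprint of depth-first search with the shallowest-solution-first guarantee of breadth-first search. The toy graph, node names, and depth cap below are illustrative assumptions, not taken from the paper.

```python
def depth_limited_search(graph, node, goal, limit):
    """Depth-first search that gives up below a fixed depth limit."""
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in graph.get(node, []):
        path = depth_limited_search(graph, child, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

def iterative_deepening_search(graph, start, goal, max_depth=10):
    """Repeat depth-limited DFS with a growing limit, so shallow goals
    are found first (BFS-like) while memory use stays DFS-like."""
    for limit in range(max_depth + 1):
        path = depth_limited_search(graph, start, goal, limit)
        if path is not None:
            return path
    return None

# Toy graph: the shallow goal G is found at limit 2, before the
# deeper branch under B is ever fully explored.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["E"]}
print(iterative_deepening_search(graph, "A", "G"))  # ['A', 'C', 'G']
```

The paper's heuristic carries this schedule over to reward shaping: gradually relaxing the "depth" allowed to the exploration bonuses rather than to a search frontier.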
first_indexed 2024-03-13T06:19:13Z
format Article
id doaj.art-f859ccf8198d4806a8352cc3ec438a89
institution Directory Open Access Journal
issn 2666-7207
language English
last_indexed 2024-03-13T06:19:13Z
publishDate 2023-09-01
publisher Elsevier
record_format Article
series Results in Control and Optimization
spelling doaj.art-f859ccf8198d4806a8352cc3ec438a892023-06-10T04:28:41ZengElsevierResults in Control and Optimization2666-72072023-09-0112100244Reward bonuses with gain scheduling inspired by iterative deepening searchTaisuke Kobayashi0Principles of Informatics Research Division, National Institute of Informatics, Tokyo, Japan; School of Multidisciplinary Sciences, Department of Informatics, The Graduate University for Advanced Studies (SOKENDAI), Kanagawa, JapanThis paper introduces a novel method of adding intrinsic bonuses to task-oriented reward function in order to efficiently facilitate reinforcement learning search. While various bonuses have been designed to date, this paper points out that the intrinsic bonuses can be analogous to either of the depth-first or breadth-first search algorithms in graph theory. Since these two have their own strengths and weaknesses, new bonuses that can bring out the respective characteristics are first proposed. The designed bonuses are derived as differences in key indicators that converge in the steady state based on the concepts of value disagreement and self-imitation. Then, a heuristic gain scheduling is applied to the designed bonuses, inspired by the iterative deepening search, which is known to inherit the advantages of the two search algorithms. The proposed method is expected to allow agent to efficiently reach the best solution in deeper states by gradually exploring unknown states. In three locomotion tasks with dense rewards and three simple tasks with sparse rewards, it is shown that the two types of bonus contribute to the performance improvement of the different tasks complementarily. In addition, by combining them with the proposed gain scheduling, all tasks can be accomplished with high performance.http://www.sciencedirect.com/science/article/pii/S2666720723000462Reinforcement learningIntrinsic rewardValue disagreementSelf-imitationIterative deepening search
spellingShingle Taisuke Kobayashi
Reward bonuses with gain scheduling inspired by iterative deepening search
Results in Control and Optimization
Reinforcement learning
Intrinsic reward
Value disagreement
Self-imitation
Iterative deepening search
title Reward bonuses with gain scheduling inspired by iterative deepening search
title_full Reward bonuses with gain scheduling inspired by iterative deepening search
title_fullStr Reward bonuses with gain scheduling inspired by iterative deepening search
title_full_unstemmed Reward bonuses with gain scheduling inspired by iterative deepening search
title_short Reward bonuses with gain scheduling inspired by iterative deepening search
title_sort reward bonuses with gain scheduling inspired by iterative deepening search
topic Reinforcement learning
Intrinsic reward
Value disagreement
Self-imitation
Iterative deepening search
url http://www.sciencedirect.com/science/article/pii/S2666720723000462
work_keys_str_mv AT taisukekobayashi rewardbonuseswithgainschedulinginspiredbyiterativedeepeningsearch