Reward bonuses with gain scheduling inspired by iterative deepening search
This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to facilitate exploration in reinforcement learning efficiently. While various bonuses have been designed to date, this paper points out that intrinsic bonuses can be analogous to either the depth-first or the breadth-first search algorithm in graph theory. Since these two have their own strengths and weaknesses, new bonuses that bring out their respective characteristics are first proposed. The designed bonuses are derived as differences in key indicators that converge in the steady state, based on the concepts of value disagreement and self-imitation. A heuristic gain scheduling, inspired by iterative deepening search, which is known to inherit the advantages of both search algorithms, is then applied to the designed bonuses. The proposed method is expected to allow the agent to efficiently reach the best solution in deeper states by gradually exploring unknown states. In three locomotion tasks with dense rewards and three simple tasks with sparse rewards, the two types of bonus are shown to contribute complementarily to performance improvements on different tasks. In addition, combining them with the proposed gain scheduling allows all tasks to be accomplished with high performance.
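The gain-scheduling idea described in the abstract can be illustrated with a small, hedged sketch. The snippet below is not the paper's implementation: the concrete bonus definitions (ensemble value disagreement as a breadth-first-like bonus, a self-imitation-style return gap as a depth-first-like bonus) and the linear schedule are simplifying assumptions, chosen only to show how scheduled gains might blend two intrinsic bonuses with the task reward.

```python
import numpy as np

# Hypothetical sketch: combine a task reward with two intrinsic bonuses whose
# gains are scheduled over training, loosely echoing iterative deepening
# (breadth-first-like exploration early, depth-first-like exploitation later).
# The bonus formulas and the linear schedule are assumptions, not the paper's.

def value_disagreement_bonus(q_ensemble):
    """Breadth-first-like bonus: disagreement (std) among an ensemble of
    value estimates for the current state-action pair."""
    return float(np.std(q_ensemble))

def self_imitation_bonus(episode_return, best_return_so_far):
    """Depth-first-like bonus: positive gap over the best return seen so far,
    encouraging the agent to push further along promising trajectories."""
    return max(episode_return - best_return_so_far, 0.0)

def scheduled_gains(step, total_steps):
    """Heuristic gain schedule: the breadth-oriented gain decays while the
    depth-oriented gain grows as training progresses."""
    progress = min(step / total_steps, 1.0)
    return 1.0 - progress, progress  # (gain_breadth, gain_depth)

def shaped_reward(task_reward, q_ensemble, episode_return, best_return,
                  step, total_steps):
    """Task reward plus the two scheduled intrinsic bonuses."""
    g_breadth, g_depth = scheduled_gains(step, total_steps)
    bonus_b = value_disagreement_bonus(q_ensemble)
    bonus_d = self_imitation_bonus(episode_return, best_return)
    return task_reward + g_breadth * bonus_b + g_depth * bonus_d

# Example: early in training the disagreement bonus dominates the shaping;
# late in training the self-imitation bonus does.
print(shaped_reward(1.0, [0.8, 1.2, 0.5], 10.0, 8.0, step=1_000, total_steps=100_000))
```

With this kind of crossfade, exploration-driven shaping fades out as exploitation-driven shaping fades in, which is one plausible way to read "gain scheduling inspired by iterative deepening"; the paper's actual schedule and indicators should be taken from the article itself.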
Main Author: | Taisuke Kobayashi |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-09-01 |
Series: | Results in Control and Optimization |
Subjects: | Reinforcement learning; Intrinsic reward; Value disagreement; Self-imitation; Iterative deepening search |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666720723000462 |
_version_ | 1797807215226126336 |
---|---|
author | Taisuke Kobayashi |
author_facet | Taisuke Kobayashi |
author_sort | Taisuke Kobayashi |
collection | DOAJ |
description | This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to facilitate exploration in reinforcement learning efficiently. While various bonuses have been designed to date, this paper points out that intrinsic bonuses can be analogous to either the depth-first or the breadth-first search algorithm in graph theory. Since these two have their own strengths and weaknesses, new bonuses that bring out their respective characteristics are first proposed. The designed bonuses are derived as differences in key indicators that converge in the steady state, based on the concepts of value disagreement and self-imitation. A heuristic gain scheduling, inspired by iterative deepening search, which is known to inherit the advantages of both search algorithms, is then applied to the designed bonuses. The proposed method is expected to allow the agent to efficiently reach the best solution in deeper states by gradually exploring unknown states. In three locomotion tasks with dense rewards and three simple tasks with sparse rewards, the two types of bonus are shown to contribute complementarily to performance improvements on different tasks. In addition, combining them with the proposed gain scheduling allows all tasks to be accomplished with high performance. |
first_indexed | 2024-03-13T06:19:13Z |
format | Article |
id | doaj.art-f859ccf8198d4806a8352cc3ec438a89 |
institution | Directory Open Access Journal |
issn | 2666-7207 |
language | English |
last_indexed | 2024-03-13T06:19:13Z |
publishDate | 2023-09-01 |
publisher | Elsevier |
record_format | Article |
series | Results in Control and Optimization |
spelling | doaj.art-f859ccf8198d4806a8352cc3ec438a89 | 2023-06-10T04:28:41Z | eng | Elsevier | Results in Control and Optimization | 2666-7207 | 2023-09-01 | Vol. 12, article 100244 | Reward bonuses with gain scheduling inspired by iterative deepening search | Taisuke Kobayashi (Principles of Informatics Research Division, National Institute of Informatics, Tokyo, Japan; School of Multidisciplinary Sciences, Department of Informatics, The Graduate University for Advanced Studies (SOKENDAI), Kanagawa, Japan) | This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to facilitate exploration in reinforcement learning efficiently. While various bonuses have been designed to date, this paper points out that intrinsic bonuses can be analogous to either the depth-first or the breadth-first search algorithm in graph theory. Since these two have their own strengths and weaknesses, new bonuses that bring out their respective characteristics are first proposed. The designed bonuses are derived as differences in key indicators that converge in the steady state, based on the concepts of value disagreement and self-imitation. A heuristic gain scheduling, inspired by iterative deepening search, which is known to inherit the advantages of both search algorithms, is then applied to the designed bonuses. The proposed method is expected to allow the agent to efficiently reach the best solution in deeper states by gradually exploring unknown states. In three locomotion tasks with dense rewards and three simple tasks with sparse rewards, the two types of bonus are shown to contribute complementarily to performance improvements on different tasks. In addition, combining them with the proposed gain scheduling allows all tasks to be accomplished with high performance. | http://www.sciencedirect.com/science/article/pii/S2666720723000462 | Reinforcement learning; Intrinsic reward; Value disagreement; Self-imitation; Iterative deepening search |
spellingShingle | Taisuke Kobayashi Reward bonuses with gain scheduling inspired by iterative deepening search Results in Control and Optimization Reinforcement learning Intrinsic reward Value disagreement Self-imitation Iterative deepening search |
title | Reward bonuses with gain scheduling inspired by iterative deepening search |
title_full | Reward bonuses with gain scheduling inspired by iterative deepening search |
title_fullStr | Reward bonuses with gain scheduling inspired by iterative deepening search |
title_full_unstemmed | Reward bonuses with gain scheduling inspired by iterative deepening search |
title_short | Reward bonuses with gain scheduling inspired by iterative deepening search |
title_sort | reward bonuses with gain scheduling inspired by iterative deepening search |
topic | Reinforcement learning; Intrinsic reward; Value disagreement; Self-imitation; Iterative deepening search |
url | http://www.sciencedirect.com/science/article/pii/S2666720723000462 |
work_keys_str_mv | AT taisukekobayashi rewardbonuseswithgainschedulinginspiredbyiterativedeepeningsearch |