Learning Potential in Subgoal-Based Reward Shaping
Human knowledge can reduce the number of iterations required to learn in reinforcement learning. Though the most common approach uses trajectories, they are difficult to acquire in certain domains. Subgoals, which are intermediate states, have been studied instead of trajectories. Subgoal-based reward shaping adds rewards derived from a sequence of subgoals to the environmental rewards (a brief sketch of this shaping scheme follows the record below).
| Main Authors: | Takato Okudo, Seiji Yamada |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2023-01-01 |
| Series: | IEEE Access |
| Subjects: | Reinforcement learning; deep reinforcement learning; subgoals; reward shaping; potential-based reward shaping; subgoal-based reward shaping |
| Online Access: | https://ieeexplore.ieee.org/document/10047888/ |
author | Takato Okudo; Seiji Yamada |
collection | DOAJ |
description | Human knowledge can reduce the number of iterations required to learn in reinforcement learning. Though the most common approach uses trajectories, they are difficult to acquire in certain domains. Subgoals, which are intermediate states, have been studied instead of trajectories. Subgoal-based reward shaping is a method that adds rewards derived from a sequence of subgoals to the environmental rewards. The potential function, a component of subgoal-based reward shaping, is shaped by a hyperparameter that controls its output. However, selecting this hyperparameter is not easy: its appropriate value depends on the reward function of the environment, and the reward function is unknown even though its outputs are observable. We propose a learned potential, which parameterizes the hyperparameter and acquires the potential through learning. A value is the expected accumulated reward when an agent follows its policy from the current state, and it is strongly related to the reward function. For the learned potential, we build an abstract state space, a higher-level representation of the state, from the sequence of subgoals and use the values over the abstract states as the potential to accelerate value learning. An n-step temporal-difference (TD) method learns the values over the abstract states. We conducted experiments to evaluate the effectiveness of the learned potential; the results indicate its effectiveness compared with a baseline reinforcement learning algorithm and several reward-shaping algorithms. The results also indicate that participants’ subgoals are superior to randomly generated subgoals when used with the learned potential. We discuss the appropriate number of subgoals for the learned potential, show that partially ordered subgoals are helpful, note that the learned potential cannot make learning efficient under step-penalized rewards, and show that it is superior to a non-learned potential under mixed positive and negative rewards. |
format | Article |
id | doaj.art-188bcae17839430abf6a2e41c3bf08b6 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | IEEE Access, vol. 11, pp. 17116–17137, published 2023-01-01; DOI: 10.1109/ACCESS.2023.3246267; IEEE Xplore document 10047888. Takato Okudo (ORCID: 0000-0002-7218-7842) and Seiji Yamada (ORCID: 0000-0002-5907-7382), both of the Department of Informatics, The Graduate University for Advanced Studies (SOKENDAI), Tokyo, Japan. Full text: https://ieeexplore.ieee.org/document/10047888/ |
title | Learning Potential in Subgoal-Based Reward Shaping |
topic | Reinforcement learning; deep reinforcement learning; subgoals; reward shaping; potential-based reward shaping; subgoal-based reward shaping |
url | https://ieeexplore.ieee.org/document/10047888/ |
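The description above outlines potential-based reward shaping in which the potential Φ is not fixed by a hyperparameter but learned as a value over abstract states built from a subgoal sequence, using an n-step TD method. Below is a minimal sketch of that idea, not the authors' implementation: the tabular abstract state (the index of the last subgoal reached), the helper names `shaping_reward` and `nstep_td_update`, and all constants are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of subgoal-based reward shaping with a learned potential.
# Assumption (not from the paper): the abstract state z is the index of the
# last subgoal the agent has achieved (0 = none yet), and the potential Phi
# is a small table of values over those abstract states.

GAMMA = 0.99  # discount factor (illustrative)

def shaping_reward(phi, z, z_next, gamma=GAMMA):
    """Potential-based shaping term F = gamma * Phi(z') - Phi(z),
    added to the environmental reward at each step."""
    return gamma * phi[z_next] - phi[z]

def nstep_td_update(phi, zs, rewards, alpha=0.1, gamma=GAMMA):
    """One n-step TD update of the potential over abstract states.

    zs:      abstract states z_t, ..., z_{t+n}  (n + 1 entries)
    rewards: environmental rewards r_{t+1}, ..., r_{t+n}  (n entries)
    """
    n = len(rewards)
    assert len(zs) == n + 1
    g = sum(gamma**k * r for k, r in enumerate(rewards))  # discounted n-step return
    g += gamma**n * phi[zs[-1]]                           # bootstrap from Phi(z_{t+n})
    phi[zs[0]] += alpha * (g - phi[zs[0]])                # move Phi(z_t) toward the return
    return phi

# Toy usage: 3 subgoals -> 4 abstract states.
phi = np.zeros(4)
phi = nstep_td_update(phi, zs=[0, 0, 1, 1, 2], rewards=[0.0, 0.0, 0.0, 1.0])
r_shaped = 1.0 + shaping_reward(phi, z=1, z_next=2)  # environmental reward + F
```

For a fixed potential, the shaping term F = γΦ(z′) − Φ(z) is known to preserve the optimal policy (Ng et al., 1999); since Φ changes during learning here, this sketch should be read as illustrating the mechanism rather than as carrying that exact guarantee.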