Implementing action mask in proximal policy optimization (PPO) algorithm

The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask in the PPO algorithm. The mask indicates whether an action is valid or invalid for each state. Simulation results show that, when compared with the ori...

Bibliographic Details
Main Authors: Cheng-Yen Tang, Chien-Hung Liu, Woei-Kae Chen, Shingchern D. You
Format: Article
Language: English
Published: Elsevier 2020-09-01
Series: ICT Express
Subjects: PPO; Invalid action; Reinforcement learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2405959520300746
collection DOAJ
description The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask in the PPO algorithm. The mask indicates whether an action is valid or invalid for each state. Simulation results show that, when compared with the original version, the proposed algorithm yields much higher return with a moderate number of training steps. Therefore, it is useful and valuable to incorporate such a mask if applicable.
id doaj.art-edcb3ff262c044528417df65f8899e22
institution Directory of Open Access Journals
issn 2405-9595
volume 6
issue 3
pages 200-203
affiliation Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan (all four authors)
corresponding_author Shingchern D. You
topic PPO
Invalid action
Reinforcement learning