Implementing action mask in proximal policy optimization (PPO) algorithm
Main Authors: | Cheng-Yen Tang, Chien-Hung Liu, Woei-Kae Chen, Shingchern D. You |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2020-09-01 |
Series: | ICT Express |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2405959520300746 |
Affiliation: | Cheng-Yen Tang, Chien-Hung Liu, Woei-Kae Chen, and Shingchern D. You (corresponding author), Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan |
---|---|
ISSN: | 2405-9595 |
Volume/Issue: | Volume 6, Issue 3, pp. 200-203 |
Abstract: | The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask in the PPO algorithm. The mask indicates whether an action is valid or invalid for each state. Simulation results show that, when compared with the original version, the proposed algorithm yields much higher return with a moderate number of training steps. Therefore, it is useful and valuable to incorporate such a mask if applicable. |
Keywords: | PPO; Invalid action; Reinforcement learning |
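The abstract says only that a per-state mask marks each action as valid or invalid; it does not say how the mask enters PPO. The sketch below assumes one common scheme, logit masking: the logits of invalid actions are replaced by a large negative number before the softmax, so those actions get zero probability and cannot be sampled. The same masked policy is then used for both the old and new probabilities in PPO's clipped surrogate, min(r·A, clip(r, 1-ε, 1+ε)·A). The names `masked_softmax`, `ppo_clip_term`, and `MASK_VALUE` are hypothetical. They illustrate the idea and are not the authors' implementation.

```python
import math
from typing import List, Sequence

# Hypothetical sketch, not the paper's code.
# Assumption: invalid actions are masked by replacing their logits with a
# large negative value before the softmax ("logit masking").
MASK_VALUE = -1e9


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Return action probabilities; actions whose mask entry is False get 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")
    masked = [x if valid else MASK_VALUE for x, valid in zip(logits, mask)]
    top = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in masked]  # masked entries underflow to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def ppo_clip_term(
    new_logits: Sequence[float],
    old_logits: Sequence[float],
    mask: Sequence[bool],
    action: int,
    advantage: float,
    clip_eps: float = 0.2,
) -> float:
    """PPO clipped surrogate for one sampled action, with both policies masked."""
    if not mask[action]:
        raise ValueError("a sampled action must be valid under the mask")
    new_p = masked_softmax(new_logits, mask)[action]
    old_p = masked_softmax(old_logits, mask)[action]
    ratio = new_p / old_p  # probability ratio r between new and old policy
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Three actions; action 1 is invalid in this (hypothetical) state,
    # so its probability is exactly 0.0.
    print(masked_softmax([1.0, 2.0, 3.0], [True, False, True]))
```

Masking the logits and then applying the softmax is equivalent to renormalizing the policy over the valid actions only. Under this assumption, invalid actions never appear in collected trajectories and never contribute to the ratio.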