Implementing action mask in proximal policy optimization (PPO) algorithm

The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask in the PPO algorithm. The mask indicates whether an action is valid or invalid for each state. Simulation results show that, when compared with the ori...
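The abstract says the mask marks each action as valid or invalid in a given state, but this record does not show how the mask enters the policy. The sketch below illustrates one common approach, which may differ from the authors' method: invalid logits are set to negative infinity before the softmax, so invalid actions receive probability zero. The function name `masked_softmax` and the 1/0 mask encoding are assumptions made for illustration.

```python
import math


def masked_softmax(logits, mask):
    """Return action probabilities with invalid actions forced to zero.

    logits: raw policy scores, one per action.
    mask:   1 for a valid action, 0 for an invalid one (assumed encoding).
    """
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Replace invalid logits with -inf so that exp() yields exactly 0.
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    top = max(masked)  # subtract the largest logit for numerical stability
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Three actions, the second one invalid in the current state.
probs = masked_softmax([1.0, 2.0, 3.0], [1, 0, 1])
print(probs)
```

Sampling from `probs` can never select the masked action, and the remaining probability mass is renormalized over the valid actions only.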

Bibliographic Details
Main Authors:	Cheng-Yen Tang, Chien-Hung Liu, Woei-Kae Chen, Shingchern D. You
Format:	Article
Language:	English
Published:	Elsevier 2020-09-01
Series:	ICT Express
Subjects:	PPO; Invalid action; Reinforcement learning
Online Access:	http://www.sciencedirect.com/science/article/pii/S2405959520300746

Similar Items

Comparison of Empirical and Reinforcement Learning (RL)-Based Control Based on Proximal Policy Optimization (PPO) for Walking Assistance: Does AI Always Win?
by: Nadine Drewing, et al.
Published: (2024-11-01)

Intelligent Smart Marine Autonomous Surface Ship Decision System Based on Improved PPO Algorithm
by: Wei Guan, et al.
Published: (2022-07-01)

Multiple-UAV Reinforcement Learning Algorithm Based on Improved PPO in Ray Framework
by: Guang Zhan, et al.
Published: (2022-07-01)

A Novel Single-Site Mutation in the Catalytic Domain of Protoporphyrinogen Oxidase IX (PPO) Confers Resistance to PPO-Inhibiting Herbicides
by: Gulab Rangani, et al.
Published: (2019-05-01)

Exploring the Use of Invalid Action Masking in Reinforcement Learning: A Comparative Study of On-Policy and Off-Policy Algorithms in Real-Time Strategy Games
by: Yueqi Hou, et al.
Published: (2023-07-01)

Assessment of Efficacy and Mechanism of Resistance to Soil-Applied PPO Inhibitors in Amaranthus palmeri
by: Gulab Rangani, et al.
Published: (2023-02-01)

Multi-Agent Reinforcement Learning With Action Masking for UAV-Enabled Mobile Communications
by: Danish Rizvi, et al.
Published: (2025-01-01)

An Improved Proximal Policy Optimization Method for Low-Level Control of a Quadrotor
by: Wentao Xue, et al.
Published: (2022-04-01)

EMExplorer: an episodic memory enhanced autonomous exploration strategy with Voronoi domain conversion and invalid action masking
by: Bolei Chen, et al.
Published: (2023-06-01)

Using PPO Models to Predict the Value of the BNB Cryptocurrency
by: Dmitrii V. Firsov, et al.
Published: (2023-07-01)

Securing EHRs With a Novel Token-Based and PPoS Blockchain Methodology
by: Rihab Benaich, et al.
Published: (2024-01-01)

Intelligent interference decision algorithm with prior knowledge embedded LSTM-PPO model
by: ZHANG Jingke, et al.
Published: (2024-12-01)

Design of digital low-carbon system for smart buildings based on PPO algorithm
by: Yaohuan Wu, et al.
Published: (2025-02-01)

Autonomous Air Combat Maneuver Decision-Making Based on PPO-BWDA
by: Hongming Wang, et al.
Published: (2024-01-01)

USV Collision Avoidance Decision-Making Based on the Improved PPO Algorithm in Restricted Waters
by: Shuhui Hao, et al.
Published: (2024-08-01)

Legal nature of the annulment sanction indicated in art. 9 item 1 of the Act on shaping the agricultural system in connection with the amendment to this Act
by: Jakub Jan Zięty, et al.
Published: (2024-03-01)

Optimization of Predefined-Time Agent-Scheduling Strategy Based on PPO
by: Dingding Qi, et al.
Published: (2024-07-01)

Partial purification, characterization and investigation of inhibitory effects of organic compounds on Cinnamomum verum polyphenoloxidase (PPO) enzyme
by: Shruti D Laad, et al.
Published: (2020-07-01)

Novel Thiazole Phenoxypyridine Derivatives Protect Maize from Residual Pesticide Injury Caused by PPO-Inhibitor Fomesafen
by: Li-xia Zhao, et al.
Published: (2019-09-01)

Model-Based Predictive Control and Reinforcement Learning for Planning Vehicle-Parking Trajectories for Vertical Parking Spaces
by: Junren Shi, et al.
Published: (2023-08-01)

Flight Plan Optimisation of Unmanned Aerial Vehicles with Minimised Radar Observability Using Action Shaping Proximal Policy Optimisation
by: Ahmed Moazzam Ali, et al.
Published: (2024-10-01)

Genome-wide analysis of the polyphenol oxidase gene family reveals that MaPPO1 and MaPPO6 are the main contributors to fruit browning in Musa acuminate
by: Fei Qin, et al.
Published: (2023-02-01)

Research on Behavioral Decision at an Unsignalized Roundabout for Automatic Driving Based on Proximal Policy Optimization Algorithm
by: Jingpeng Gan, et al.
Published: (2024-03-01)

Improving traffic signal control operations using proximal policy optimization
by: Liben Huang, et al.
Published: (2023-03-01)

Effect of the PPO (Polyphenol Oxidase) Enzyme on the Physical and Chemical Properties of Flour Types Produced from Some Syrian Durum Wheat Varieties
by: ياسر قرحيلي, et al.
Published: (2024-09-01)

Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices
by: Hyun-Kyo Lim, et al.
Published: (2020-03-01)

Genome-wide investigation and expression profiling of polyphenol oxidase (PPO) family genes uncover likely functions in organ development and stress responses in Populus trichocarpa
by: Fang He, et al.
Published: (2021-10-01)

Channel assignment and power allocation for throughput improvement with PPO in B5G heterogeneous edge networks
by: Xiaoming He, et al.
Published: (2024-02-01)

PPO2 Mutations in Amaranthus palmeri: Implications on Cross-Resistance
by: Pâmela Carvalho-Moore, et al.
Published: (2021-08-01)

Experimental and theoretical study of bifunctionalized PEO–PPO–PEO triblock copolymers with applications as dehydrating agents for heavy crude oil
by: César A. Flores-Sandoval, et al.
Published: (2017-03-01)

Object Detection Method Using Image and Number of Objects on Image as Label
by: Keong-Hun Choi, et al.
Published: (2024-01-01)

Health and reason between the movable and the reasonable
by: حيدر عيدان, et al.
Published: (2014-09-01)

A low-carbon optimization scheduling method of CIES based on PPO algorithm
by: CHEN Fan, et al.
Published: (2024-11-01)

Economic Evaluation of Losses From Invalidism of the Population in Russia: Approaches and Methods
by: Olga I. Goleva
Published: (2017-11-01)

Smart City Traffic Flow and Signal Optimization Using STGCN-LSTM and PPO Algorithms
by: Tuxiang Lin, et al.
Published: (2025-01-01)

Void (Invalid) Contract in the Shiite and Sunni Jurisprudence, Iranian and Egyptian Law
by: mohammad hassan ha'eri, et al.
Published: (2011-02-01)

Law consequences of invalidity in a succession law
by: Василь Іванович Крат
Published: (2016-01-01)

Invalidity of contract: legislative regulation and types
by: Василь Іванович Крат
Published: (2017-09-01)

Optimization of Task-Scheduling Strategy in Edge Kubernetes Clusters Based on Deep Reinforcement Learning
by: Xin Wang, et al.
Published: (2023-10-01)

Intelligent Predetermination of Generator Tripping Scheme: Knowledge Fusion-based Deep Reinforcement Learning Framework
by: Lingkang Zeng, et al.
Published: (2024-01-01)