A Functional Clipping Approach for Policy Optimization Algorithms

Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning, with one of its key points being the use of a surrogate objective function to restrict the step size at each policy update. Although such a restriction is helpful, the algorithm still suffers from performance instability and optimization inefficiency caused by the sudden flattening of the objective curve. To address this issue, we present a novel functional clipping policy optimization algorithm, the Proximal Policy Optimization Smoothed Algorithm (PPOS), whose critical improvement is the use of a functional clipping method in place of a flat clipping method. We compare our approach with PPO and PPORB, which adopts a rollback clipping method, and prove that our approach can conduct more accurate updates than other PPO methods. We show that it outperforms the latest PPO variants in both performance and stability on challenging continuous control tasks. Moreover, we provide an instructive guideline for tuning the main hyperparameter in our algorithm.
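
For context, the flat clipping referred to above is the clipped surrogate objective of standard PPO (Schulman et al., 2017): the probability ratio between the new and old policies is truncated to a fixed interval around 1, so the objective, and hence its gradient, goes flat once the ratio leaves that interval.

% Standard PPO clipped surrogate, shown for reference only.
\[
  L^{\mathrm{CLIP}}(\theta) =
    \hat{\mathbb{E}}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\;
    \operatorname{clip}\!\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right],
  \qquad
  r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}.
\]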

Bibliographic Details
Main Authors: Wangshu Zhu (https://orcid.org/0000-0003-1950-202X), Andre Rosendo (https://orcid.org/0000-0003-4062-5390)
Affiliation: Department of Computer Science, School of Information Science and Technology, Living Machines Laboratories, ShanghaiTech University, Shanghai, China
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access, vol. 9, pp. 96056-96063
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3094566
Subjects: Machine learning; robot control; deep reinforcement learning; policy search algorithm
Collection: Directory of Open Access Journals (DOAJ)
Online Access: https://ieeexplore.ieee.org/document/9474478/
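
The record does not reproduce the PPOS objective itself, so the following minimal sketch (Python with NumPy) only illustrates the contrast the abstract draws: standard PPO clips the probability ratio flat outside a trust region, while a functional clipping keeps a smoothly decreasing surrogate there, so updates that overshoot are actively pulled back instead of merely receiving zero gradient. The function names, the alpha parameter, and the tanh-based shape are assumptions for illustration, not the authors' actual formulation.

import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    # Standard PPO surrogate: the ratio is clipped to [1 - eps, 1 + eps],
    # so the objective goes flat (zero gradient) outside that range.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)

def smoothed_clip_objective(ratio, advantage, eps=0.2, alpha=0.3):
    # Hypothetical "functional" clipping: outside the trust region the
    # surrogate keeps bending back (here via tanh) instead of going flat,
    # so overshooting updates are penalised rather than merely ignored.
    # The tanh shape and the alpha parameter are illustrative assumptions,
    # not the exact PPOS formulation from the paper.
    center = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    overshoot = ratio - center                      # zero inside the clip range
    smoothed = center - alpha * np.tanh(overshoot)  # negative slope outside it
    return np.minimum(ratio * advantage, smoothed * advantage)

if __name__ == "__main__":
    ratios = np.linspace(0.5, 1.5, 5)
    advantages = np.ones_like(ratios)
    print("flat clipping:    ", ppo_clip_objective(ratios, advantages))
    print("smoothed clipping:", smoothed_clip_objective(ratios, advantages))

For a large positive advantage, the flat objective is constant once the ratio exceeds 1 + eps, whereas the smoothed variant decreases there, which is the behaviour the abstract credits for more accurate updates.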