A Functional Clipping Approach for Policy Optimization Algorithms
Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning, with one of its key points being the use of a surrogate objective function to restrict the step size at each policy update. Although such restriction is helpful, the algorithm still suffers from performance instability and optimization inefficiency from the sudden flattening of the curve. To address this issue, we present a novel functional clipping policy optimization algorithm, named Proximal Policy Optimization Smoothed Algorithm (PPOS), whose critical improvement is the use of a functional clipping method instead of a flat clipping method. We compare our approach with PPO and PPORB, which adopts a rollback clipping method, and prove that our approach can conduct more accurate updates than other PPO methods. We show that it outperforms the latest PPO variants in both performance and stability on challenging continuous control tasks. Moreover, we provide an instructive guideline for tuning the main hyperparameter in our algorithm.
Main Authors: | Wangshu Zhu, Andre Rosendo |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2021-01-01 |
Series: | IEEE Access |
Subjects: | Machine learning; robot control; deep reinforcement learning; policy search algorithm |
Online Access: | https://ieeexplore.ieee.org/document/9474478/ |
---|---|
author | Wangshu Zhu Andre Rosendo |
author_facet | Wangshu Zhu Andre Rosendo |
author_sort | Wangshu Zhu |
collection | DOAJ |
description | Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning, with one of its key points being the use of a surrogate objective function to restrict the step size at each policy update. Although such restriction is helpful, the algorithm still suffers from performance instability and optimization inefficiency from the sudden flattening of the curve. To address this issue, we present a novel functional clipping policy optimization algorithm, named Proximal Policy Optimization Smoothed Algorithm (PPOS), whose critical improvement is the use of a functional clipping method instead of a flat clipping method. We compare our approach with PPO and PPORB, which adopts a rollback clipping method, and prove that our approach can conduct more accurate updates than other PPO methods. We show that it outperforms the latest PPO variants in both performance and stability on challenging continuous control tasks. Moreover, we provide an instructive guideline for tuning the main hyperparameter in our algorithm. |
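The abstract contrasts PPO's flat clipping, which holds the clipped probability ratio constant outside [1 − ε, 1 + ε] and so zeroes the objective's slope there, with a smooth "functional" clip whose slope decays gradually. A minimal sketch of that idea follows; the function names, the tanh-shaped curve, and the smoothing parameter `alpha` are illustrative assumptions, not the paper's exact formulation:

```python
import math

EPS = 0.2  # clipping range hyperparameter epsilon, as in standard PPO


def flat_clip(ratio: float) -> float:
    """Standard PPO clipping: the probability ratio pi_new/pi_old is
    hard-clipped to [1 - EPS, 1 + EPS]; beyond that range the clipped
    value is constant, so its slope drops abruptly to zero."""
    return min(max(ratio, 1.0 - EPS), 1.0 + EPS)


def smooth_clip(ratio: float, alpha: float = 0.3) -> float:
    """Illustrative 'functional' clipping: outside the trust region the
    clipped value follows a tanh curve, so the slope decays gradually
    (it is 1 at the boundary and tends to 0 far away) instead of
    vanishing at once. A sketch only; PPOS's exact curve may differ."""
    if ratio > 1.0 + EPS:
        return 1.0 + EPS + alpha * math.tanh((ratio - (1.0 + EPS)) / alpha)
    if ratio < 1.0 - EPS:
        return 1.0 - EPS + alpha * math.tanh((ratio - (1.0 - EPS)) / alpha)
    return ratio


def surrogate(ratio: float, advantage: float, clip_fn) -> float:
    """PPO-style pessimistic surrogate for one sample: the minimum of
    the unclipped and clipped objective terms."""
    return min(ratio * advantage, clip_fn(ratio) * advantage)
```

Inside [1 − ε, 1 + ε] both clips are the identity; outside it, `smooth_clip` still responds slightly to the ratio (e.g. `smooth_clip(1.3)` ≈ 1.296 with ε = 0.2, α = 0.3) rather than sitting flat at 1.2, which is the kind of gradual saturation the abstract describes as functional clipping.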
first_indexed | 2024-04-12T23:12:30Z |
format | Article |
id | doaj.art-38f20ef12085440b9d67d25543fc3b14 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-12T23:12:30Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-38f20ef12085440b9d67d25543fc3b14 2022-12-22T03:12:46Z eng IEEE, IEEE Access, ISSN 2169-3536, 2021-01-01, vol. 9, pp. 96056-96063, doi:10.1109/ACCESS.2021.3094566, document 9474478. A Functional Clipping Approach for Policy Optimization Algorithms. Wangshu Zhu (https://orcid.org/0000-0003-1950-202X); Andre Rosendo (https://orcid.org/0000-0003-4062-5390). Department of Computer Science, School of Information Science and Technology, Living Machines Laboratories, ShanghaiTech University, Shanghai, China. https://ieeexplore.ieee.org/document/9474478/ Keywords: Machine learning; robot control; deep reinforcement learning; policy search algorithm |
spellingShingle | Wangshu Zhu Andre Rosendo A Functional Clipping Approach for Policy Optimization Algorithms IEEE Access Machine learning robot control deep reinforcement learning policy search algorithm |
title | A Functional Clipping Approach for Policy Optimization Algorithms |
title_full | A Functional Clipping Approach for Policy Optimization Algorithms |
title_fullStr | A Functional Clipping Approach for Policy Optimization Algorithms |
title_full_unstemmed | A Functional Clipping Approach for Policy Optimization Algorithms |
title_short | A Functional Clipping Approach for Policy Optimization Algorithms |
title_sort | functional clipping approach for policy optimization algorithms |
topic | Machine learning robot control deep reinforcement learning policy search algorithm |
url | https://ieeexplore.ieee.org/document/9474478/ |
work_keys_str_mv | AT wangshuzhu afunctionalclippingapproachforpolicyoptimizationalgorithms AT andrerosendo afunctionalclippingapproachforpolicyoptimizationalgorithms AT wangshuzhu functionalclippingapproachforpolicyoptimizationalgorithms AT andrerosendo functionalclippingapproachforpolicyoptimizationalgorithms |