Bidirectional Model-Based Policy Optimization Based on Adaptive Gaussian Noise and Improved Confidence Weights
Model-Based Reinforcement Learning (MBRL) has been gradually applied in the field of Robot Learning due to its excellent sample efficiency and asymptotic performance. However, for high-dimensional learning tasks in complex scenes, the exploration and stable training capabilities of the robot still need enhancement.
Main Authors: | Wei Liu, Mengyuan Liu, Bao Jin, Yixin Zhu, Qi Gao, Jiayang Sun |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Model-based reinforcement learning; Gaussian noise; confidence weight |
Online Access: | https://ieeexplore.ieee.org/document/10225738/ |
_version_ | 1797733228613730304 |
---|---|
author | Wei Liu; Mengyuan Liu; Bao Jin; Yixin Zhu; Qi Gao; Jiayang Sun |
author_sort | Wei Liu |
collection | DOAJ |
description | Model-Based Reinforcement Learning (MBRL) has been gradually applied in the field of Robot Learning due to its excellent sample efficiency and asymptotic performance. However, for high-dimensional learning tasks in complex scenes, the exploration and stable training capabilities of the robot still need enhancement. In light of policy planning and policy optimization, we propose a bidirectional model-based policy optimization algorithm based on adaptive Gaussian noise and improved confidence weights (BMPO-NW). The algorithm parameterizes the bidirectional policy networks as noisy networks by adding different adaptive Gaussian noise to the connection weights and biases, which increases the randomness of the policy search and induces more efficient exploration by the robot. At the same time, a confidence weight based on an improved activation function is introduced into the Q-function update of Soft Actor-Critic (SAC), which reduces error propagation from the target Q-network and enhances the robot's training stability. Finally, the improved algorithm is implemented on the framework of the bidirectional model-based policy optimization algorithm (BMPO) to preserve asymptotic performance and sample efficiency. Experimental results in MuJoCo benchmark environments demonstrate that the learning speed of BMPO-NW is about 20% higher than that of baseline methods, and that its average reward is about 15% higher than that of other MBRL methods and 50%-70% higher than that of model-free reinforcement learning (MFRL) methods, while the training process is more stable. Ablation studies and experiments with different design variants further verify the feasibility and robustness of the approach. These results provide theoretical support for the conclusions of this paper and have significant practical value for applying MBRL to robots in complex scenarios. |
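The record contains only the abstract, not the authors' formulas or code, so the following is a minimal, hypothetical PyTorch sketch of the two mechanisms it describes. The first block illustrates a linear layer whose connection weights and biases carry learnable (adaptive) Gaussian noise, in the spirit of the noisy bidirectional policy networks mentioned above; the class name `NoisyLinear`, the constant `sigma_init`, and the independent-noise scheme are assumptions, not the paper's specification.

```python
# Hypothetical sketch (not the authors' released code): a linear layer with
# learnable Gaussian noise on its weights and biases. The noise scales
# sigma_w, sigma_b are trained, so the exploration noise adapts over time.
import math
import torch
import torch.nn as nn


class NoisyLinear(nn.Module):
    """y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b)."""

    def __init__(self, in_features: int, out_features: int, sigma_init: float = 0.017):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_init))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Independent Gaussian noise, resampled at every forward pass.
        eps_w = torch.randn_like(self.weight_sigma)
        eps_b = torch.randn_like(self.bias_sigma)
        weight = self.weight_mu + self.weight_sigma * eps_w
        bias = self.bias_mu + self.bias_sigma * eps_b
        return nn.functional.linear(x, weight, bias)
```

The second block sketches one plausible reading of a confidence-weighted SAC target: the bootstrapped term is scaled by a weight produced by an activation function (here a sigmoid of the disagreement between the two target critics), so that less trusted target Q-values contribute less to the update. The function name and the choice of sigmoid are illustrative assumptions rather than the paper's improved activation function.

```python
# Hypothetical illustration of a confidence-weighted SAC target.
import torch


def confidence_weighted_target(reward, done, next_q1, next_q2, next_log_pi,
                               gamma: float = 0.99, alpha: float = 0.2):
    # Confidence is high when the two target critics agree, low otherwise.
    disagreement = (next_q1 - next_q2).abs()
    w = torch.sigmoid(-disagreement)
    # Standard SAC soft value, down-weighted by the confidence term.
    next_v = torch.min(next_q1, next_q2) - alpha * next_log_pi
    return reward + gamma * (1.0 - done) * w * next_v
```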
first_indexed | 2024-03-12T12:25:52Z |
format | Article |
id | doaj.art-cf5a0032126e4de587ec23c2a9175c4f |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-12T12:25:52Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-cf5a0032126e4de587ec23c2a9175c4f (indexed 2023-08-29T23:00:22Z). Wei Liu, Mengyuan Liu, Bao Jin, Yixin Zhu, Qi Gao, and Jiayang Sun, "Bidirectional Model-Based Policy Optimization Based on Adaptive Gaussian Noise and Improved Confidence Weights," IEEE Access, vol. 11, pp. 90254-90268, 2023, doi: 10.1109/ACCESS.2023.3307573, ISSN 2169-3536, IEEE document 10225738, https://ieeexplore.ieee.org/document/10225738/. Author details: Wei Liu (https://orcid.org/0000-0001-5821-9265), Mengyuan Liu (https://orcid.org/0009-0008-7792-9092), Yixin Zhu, Qi Gao (https://orcid.org/0009-0008-2361-5573), and Jiayang Sun (https://orcid.org/0000-0002-5343-4991), College of Science, Liaoning Technical University, Fuxin, China; Bao Jin (https://orcid.org/0009-0002-4120-1485), Institute of Mathematics and Systems Science, Liaoning Technical University, Fuxin, China. Subjects: Model-based reinforcement learning; Gaussian noise; confidence weight. |
title | Bidirectional Model-Based Policy Optimization Based on Adaptive Gaussian Noise and Improved Confidence Weights |
topic | Model-based reinforcement learning; Gaussian noise; confidence weight |
url | https://ieeexplore.ieee.org/document/10225738/ |