M-A3C: A Mean-Asynchronous Advantage Actor-Critic Reinforcement Learning Method for Real-Time Gait Planning of Biped Robot


Bibliographic Details
Main Authors: Jie Leng, Suozhong Fan, Jun Tang, Haiming Mou, Junxiao Xue, Qingdu Li
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9779214/
Description
Summary: Bipedal walking is a challenging task for humanoid robots. In this study, we develop a lightweight reinforcement learning method for real-time gait planning of a biped robot. We regard bipedal walking as a process in which the robot continually interacts with the environment, judges the quality of its control actions from the walking state, and then adjusts its control strategy. A mean-asynchronous advantage actor-critic (M-A3C) reinforcement learning algorithm is proposed to handle continuous state and action spaces and to obtain the robot's final gait directly, without introducing a reference gait. Multiple sub-agents of the M-A3C algorithm train multiple virtual robots independently and simultaneously on a physics simulation platform. The trained model is then transferred to the walking control of the physical robot, which reduces the amount of training needed on the real robot, improves training speed, and ensures that a final gait is obtained. Finally, a biped robot is designed and fabricated to verify the effectiveness of the proposed method. Experiments show that the proposed method achieves continuous and stable gait planning for the biped robot.
ISSN:2169-3536
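The asynchronous, multi-worker actor-critic training scheme described in the summary can be sketched in miniature. The following is an illustrative skeleton only, not the paper's M-A3C implementation: the toy one-dimensional environment, linear policy/value model, worker count, and learning rates are all assumptions chosen to keep the example self-contained. Several worker threads (standing in for the paper's sub-agents) interact with their own environment copies and asynchronously apply advantage-weighted gradient updates to a shared parameter set.

```python
# Illustrative asynchronous advantage actor-critic skeleton (NOT the
# paper's M-A3C; environment, model, and hyperparameters are toy
# assumptions). Workers share parameters and update them concurrently.
import threading
import random
import math

N_WORKERS = 4          # number of parallel sub-agents (assumption)
STEPS_PER_WORKER = 200
ALPHA = 0.05           # learning rate (assumption)
GAMMA = 0.95           # discount factor (assumption)

# Shared parameters: one policy weight and one value weight,
# guarded by a lock so concurrent updates stay consistent.
shared = {"w_pi": 0.0, "w_v": 0.0}
lock = threading.Lock()

def env_step(state, action):
    """Toy environment: reward +1 when the action matches the state's sign."""
    reward = 1.0 if action == (state > 0) else -1.0
    next_state = random.uniform(-1.0, 1.0)
    return next_state, reward

def worker(seed):
    rng = random.Random(seed)
    state = rng.uniform(-1.0, 1.0)
    for _ in range(STEPS_PER_WORKER):
        with lock:
            w_pi, w_v = shared["w_pi"], shared["w_v"]
        # Stochastic policy: P(action=1 | state) via a logistic unit.
        p1 = 1.0 / (1.0 + math.exp(-w_pi * state))
        action = 1 if rng.random() < p1 else 0
        next_state, reward = env_step(state, action)
        # One-step advantage estimate: r + gamma * V(s') - V(s).
        advantage = reward + GAMMA * w_v * next_state - w_v * state
        # Policy gradient (log-likelihood trick) and TD(0) value update.
        grad_pi = (action - p1) * state * advantage
        grad_v = advantage * state
        with lock:
            shared["w_pi"] += ALPHA * grad_pi
            shared["w_v"] += ALPHA * grad_v
        state = next_state

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("learned policy weight:", round(shared["w_pi"], 3))
```

In this sketch the shared parameters play the role of the global network that the sub-agents update; in the paper the trained model is subsequently transferred from simulation to the physical robot, a step the toy example does not cover.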