Compound Heuristic Information Guided Policy Improvement for Robot Motor Skill Acquisition

Bibliographic Details
Main Authors: Jian Fu, Cong Li, Xiang Teng, Fan Luo, Boqun Li
Format: Article
Language: English
Published: MDPI AG 2020-08-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/10/15/5346
Description
Summary: Discovering implicit patterns and using them as heuristic information to guide policy search is a key factor in speeding up robot motor skill acquisition. This paper proposes PI²-CMA-KCCA, a reinforcement learning algorithm for policy improvement guided by compound heuristic information. Its structure and workflow resemble a double closed-loop control system. The outer loop, realized by Kernel Canonical Correlation Analysis (KCCA), infers the implicit nonlinear heuristic information between the joints of the robot. The inner loop, operated by Covariance Matrix Adaptation (CMA), discovers the hidden linear correlations between the basis functions within each joint. These patterns, which benefit learning of the new task, automatically determine the mean and variance of the exploration perturbation for Path Integral Policy Improvement (PI²). Compared with classical PI², PI²-CMA, and PI²-KCCA, PI²-CMA-KCCA not only enables the robot to transfer trajectory planning from the demonstration to the new task, but also completes the transfer more efficiently. Classical via-point experiments on SCARA and Sawyer robots validate that the proposed method converges quickly and finds a solution for the new task.
ISSN: 2076-3417
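
Illustration: The summary states that the inner loop adapts the mean and variance of the exploration perturbation. The sketch below shows one common way such an update can look: a simplified, episodic, cost-weighted averaging step in the spirit of PI² with covariance matrix adaptation. It is not the authors' implementation. It omits the KCCA outer loop and the per-time-step weighting of full PI², and the names `softmax_weights`, `pi2_cma_step`, `via_point_cost`, the temperature `h`, and the toy quadratic cost are all assumptions made for illustration.

```python
import numpy as np


def softmax_weights(costs, h=10.0):
    """Map rollout costs to probabilities; lower cost gets higher weight."""
    costs = np.asarray(costs, dtype=float)
    spread = costs.max() - costs.min()
    if spread < 1e-12:
        # All rollouts cost the same: fall back to uniform weights.
        return np.full(costs.shape, 1.0 / costs.size)
    w = np.exp(-h * (costs - costs.min()) / spread)
    return w / w.sum()


def pi2_cma_step(theta, sigma, cost_fn, n_rollouts=20, h=10.0, rng=None):
    """One simplified update of the sampling mean and covariance."""
    rng = np.random.default_rng() if rng is None else rng
    # Explore: perturb the policy parameters with Gaussian noise.
    samples = rng.multivariate_normal(theta, sigma, size=n_rollouts)
    costs = np.array([cost_fn(s) for s in samples])
    w = softmax_weights(costs, h)
    diffs = samples - theta
    # Mean update: probability-weighted average of the sampled parameters.
    new_theta = w @ samples
    # Covariance update: probability-weighted outer products of the perturbations.
    new_sigma = (w[:, None] * diffs).T @ diffs
    # Small diagonal term (an assumption here) keeps the covariance well conditioned.
    new_sigma += 1e-8 * np.eye(theta.size)
    return new_theta, new_sigma


# Hypothetical stand-in for a via-point task: squared distance to a target vector.
TARGET = np.array([1.0, -2.0, 0.5])


def via_point_cost(theta):
    return float(np.sum((theta - TARGET) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta, sigma = np.zeros(3), np.eye(3)
    for _ in range(30):
        theta, sigma = pi2_cma_step(theta, sigma, via_point_cost, rng=rng)
    print("final cost:", via_point_cost(theta))
```

In this sketch the covariance is re-estimated from the same weighted rollouts that move the mean, so exploration tends to stretch along directions that recently reduced the cost. The method described in the summary additionally uses KCCA to carry heuristic information between joints, which this toy example does not attempt.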