An Experience Aggregative Reinforcement Learning With Multi-Attribute Decision-Making for Obstacle Avoidance of Wheeled Mobile Robot

Bibliographic Details
Main Authors: Chunyang Hu, Bin Ning, Meng Xu, Qiong Gu
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9112198/
Description
Summary: A variety of reinforcement learning (RL) methods have been developed for motion control of robotic systems, which is an active research topic. However, the performance of conventional RL methods often hits a bottleneck, because the exploration-exploitation dilemma makes it difficult for the robot to choose an appropriate action in the control task. To address this problem and improve learning performance, this work introduces an experience aggregative reinforcement learning method with Multi-Attribute Decision-Making (MADM) for real-time obstacle avoidance of a wheeled mobile robot (WMR). The proposed method uses experience aggregation to cluster experiential samples, which makes experience storage more effective. Moreover, to select actions effectively using prior experience, an action selection policy based on MADM is proposed. Inspired by hierarchical decision-making, this work decomposes the original obstacle avoidance task into three sub-tasks using a divide-and-conquer approach. Each sub-task is trained individually by double Q-learning with a simple reward function, and learns an action policy that enables it to select an appropriate action for a single goal. When the sub-tasks are fused, their rewards are standardized to eliminate differences among the sub-tasks' rewards. The proposed method then integrates the prior experience of the three trained sub-tasks via the MADM-based action policy to complete the source task. Simulation results show that the proposed method outperforms competitors.
ISSN: 2169-3536
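
Illustrative sketch (not part of the catalog record): the summary describes standardizing sub-task rewards and then fusing the sub-tasks through an MADM-based action policy, but it does not say which MADM method or which standardization the authors use. The Python below is only a minimal sketch under stated assumptions: z-score standardization, and simple additive weighting (a common MADM technique) swapped in for the unspecified MADM rule. The function names, the per-action score lists, and the weights are all hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of per-action scores (assumed standardization).

    A constant list carries no preference, so it maps to all zeros.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def select_action(subtask_scores, weights):
    """Return (best_action_index, fused_scores).

    subtask_scores: one list of per-action scores per sub-task (hypothetical input).
    weights: one attribute weight per sub-task (hypothetical values).
    Fusion rule: weighted sum of standardized scores (simple additive weighting),
    an assumption standing in for the paper's MADM policy.
    """
    if len(subtask_scores) != len(weights):
        raise ValueError("need exactly one weight per sub-task")
    n_actions = len(subtask_scores[0])
    if any(len(scores) != n_actions for scores in subtask_scores):
        raise ValueError("every sub-task must score the same set of actions")

    standardized = [standardize(scores) for scores in subtask_scores]
    fused = [
        sum(w * scores[a] for w, scores in zip(weights, standardized))
        for a in range(n_actions)
    ]
    best = max(range(n_actions), key=fused.__getitem__)
    return best, fused


# Three hypothetical sub-tasks scoring three candidate actions on very different scales.
scores = [
    [10.0, 20.0, 30.0],     # sub-task A
    [0.9, 0.1, 0.5],        # sub-task B
    [100.0, 300.0, 200.0],  # sub-task C
]
weights = [0.3, 0.5, 0.2]
best_action, fused_scores = select_action(scores, weights)
```

With these made-up numbers, a weighted sum of the raw scores would be dominated by sub-task C simply because its values are numerically largest. Standardizing first puts the three sub-tasks on a comparable footing before the weights are applied, which is the effect the summary attributes to the standardized rewards.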