An Intelligent Algorithm for USVs Collision Avoidance Based on Deep Reinforcement Learning Approach with Navigation Characteristics

Bibliographic Details
Main Authors: Zhe Sun, Yunsheng Fan, Guofeng Wang
Format: Article
Language: English
Published: MDPI AG 2023-04-01
Series: Journal of Marine Science and Engineering
Online Access: https://www.mdpi.com/2077-1312/11/4/812
Description
Summary: Artificial intelligence has driven many advances in unmanned surface vehicles (USVs) by assisting the decisions of the navigator. In particular, autonomous collision avoidance techniques based on deep reinforcement learning have developed rapidly. This article proposes a novel USV collision avoidance algorithm, based on deep reinforcement learning, for real-time maneuvering. The autonomous learning framework is improved with prioritized experience replay, a noisy network, double learning, and a dueling architecture, which together significantly enhance the training effect. In addition, two methods tailored to the characteristics of the USV collision avoidance problem are proposed to improve training efficiency. Taking into account the International Regulations for Preventing Collisions at Sea (COLREGs) and USV maneuverability, a complete and reliable USV collision avoidance training system is established, and it demonstrates an efficient learning process in complex encounter situations. A reward signal system matched to the USV characteristics is designed, and a rich simulation environment for training and testing is built on the Unity maritime virtual simulation platform. Detailed analysis, verification, and comparison show that the improved algorithm outperforms the original algorithm in stability, average reward, rule learning, and collision avoidance performance, reducing accumulated course deviation by 26.60% more and saving 1.13% more time.
ISSN: 2077-1312
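
Illustration: The summary names four deep reinforcement learning components without defining them. The sketch below shows the textbook form of three of them (dueling aggregation, the double-learning target, and prioritized replay sampling probabilities) in plain Python. It is not the authors' implementation: the function names, the discount factor 0.99, and the priority exponent 0.6 are illustrative assumptions, and the record does not describe the article's network, state, or action definitions.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the article
) -> float:
    """Double-learning target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def replay_probabilities(
    td_errors: Sequence[float],
    alpha: float = 0.6,  # assumed priority exponent, not taken from the article
    eps: float = 1e-6,   # keeps zero-error transitions sampleable
) -> List[float]:
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha,
    with p_i = |TD error_i| + eps."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]
```

For example, `double_q_target(1.0, False, [0.1, 0.9], [2.0, 3.0], gamma=0.5)` selects action 1 with the online estimates and evaluates it with the target estimates, giving 1.0 + 0.5 * 3.0 = 2.5. Separating selection from evaluation in this way is what reduces the overestimation bias of a single max over one set of estimates.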