Summary: | Clusters of unmanned aerial vehicles (UAVs) are often used to perform complex tasks. In such clusters, the reliability of the communication network connecting the UAVs is an essential factor in their collective efficiency. Due to the complex wireless environment, however, communication malfunctions within the cluster are likely during the flight of UAVs. In such cases, it is important to control the cluster and rebuild the connected network. The asymmetry of the cluster topology also increases the complexity of the control mechanisms. The traditional control methods based on cluster consistency often rely on the motion information of the neighboring UAVs. The motion information, however, may become unavailable because of the interrupted communications. UAV control algorithms based on deep reinforcement learning have achieved outstanding results in many fields. Here, we propose a cluster control method based on the Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG) to rebuild a communication network for UAV clusters. The DE-MADDPG improves the framework of the traditional multi-agent deep deterministic policy gradient (MADDPG) algorithm by decomposing the reward function. We further introduce the reward reshaping function to facilitate the convergence of the algorithm in sparse reward environments. To address the instability of the state-space in the reinforcement learning framework, we also propose the notion of the virtual leader–follower model. Extensive simulations show that the success rate of the DE-MADDPG is higher than that of the MADDPG algorithm, confirming the effectiveness of the proposed method.
|