Summary: | Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem.
It requires handling a high level of nonlinearity and dynamics. Model-free control effectively handles the
uncertain nature of the problem, and reinforcement learning (RL)-based approaches are a good candidate
for solving this problem. In this article, the Twin Delayed Deep Deterministic Policy Gradient algorithm
(TD3), a recent and composite RL architecture, was explored as a tracking agent for the UAV-based target
tracking problem. Several improvements on the original TD3 were also performed. First, the proportional-differential
controller was used to boost the exploration of the TD3 in training. Second, a novel reward
formulation for the UAV-based target tracking enabled a careful combination of the various dynamic
variables in the reward functions. This was accomplished by incorporating two exponential functions to
limit the effect of velocity and acceleration to prevent the deformation in the policy function approximation.
In addition, multistage training based on the dynamic variables was proposed as an alternative
to one-stage combinatory training. Third, the reward function was enhanced with a
piecewise decomposition to enable more stable learning behaviour of the policy and to move
from a linear reward to an achievement formula. The training was conducted based on fixed target
tracking followed by moving target tracking. The flight testing was conducted based on three types of target
trajectories: fixed, square, and blinking. Multistage training achieved the best performance. For the fixed
and square moving targets, the best agent was the fixed-trained agent with both exponential and achievement
rewarding; for the blinking target, it was the combined agent with both exponential and achievement rewarding
for a fixed-trained agent. Compared with the traditional proportional-differential controller, the maximum
error reduction rate is 86%. The developed achievement rewarding and multistage training open the
door to various applications of RL in target tracking.
|
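
The reward ideas named in the summary can be illustrated with a minimal sketch. The summary gives no formulas, so everything below is an assumption made for illustration: the function names, the decay scales, the distance thresholds, and the way the terms are summed are hypothetical and are not the authors' formulation. The sketch shows only two properties: exponential terms keep the velocity and acceleration contributions bounded, and a piecewise "achievement" term replaces a reward that is linear in the tracking error.

```python
import math


def bounded_term(value, scale=1.0):
    """Map a magnitude to [0, 1] with a decaying exponential.

    Assumed form: exp(-|value| / scale). Large velocities or accelerations
    cannot dominate the reward because the term never exceeds 1.
    """
    return math.exp(-abs(value) / scale)


def achievement_bonus(distance, thresholds=(2.0, 1.0, 0.5), bonuses=(0.25, 0.5, 1.0)):
    """Piecewise (stepwise) reward on the tracking distance.

    Thresholds are listed from loosest to tightest; the tightest band that
    the distance satisfies determines the bonus. All values are hypothetical.
    """
    bonus = 0.0
    for limit, value in zip(thresholds, bonuses):
        if distance <= limit:
            bonus = value
    return bonus


def tracking_reward(distance, speed, accel, v_scale=1.0, a_scale=1.0):
    """Combine the piecewise achievement term with two bounded exponential terms."""
    return (
        achievement_bonus(distance)
        + bounded_term(speed, v_scale)
        + bounded_term(accel, a_scale)
    )
```

Under these assumptions, `tracking_reward(0.4, 0.0, 0.0)` returns the maximum value of 3.0, and the two exponential terms together can never contribute more than 2.0, however large the velocity or acceleration becomes.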