Siamese Transformer Network for Real-Time Aerial Object Tracking

Bibliographic Details
Main Authors: Haijun Wang, Shengyan Zhang
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9908547/
Description
Summary: Recently, deep learning (DL) based trackers have attracted tremendous interest for their high performance. Despite this success, most trackers that use deep convolutional features neglect tracking speed, which is crucial for aerial tracking on mobile devices. In this paper, we propose an efficient and effective transformer-based aerial tracker within the Siamese framework, which inherits the merits of both transformer and Siamese architectures. Specifically, the outputs of multiple convolutional layers are fed into a transformer to construct robust features for the template patch and the search patch, respectively. As a result, the interdependencies between low-level information and semantic information are fused interactively, improving the tracker's ability to encode target appearance. Finally, traditional depth-wise cross-correlation is used to generate a similarity map for object localization and bounding box regression. Extensive experimental results on three popular benchmarks (DTB70, UAV123@10fps, and UAV20L) demonstrate that the proposed tracker outperforms 12 other state-of-the-art trackers and achieves a real-time tracking speed of 71.3 frames per second (FPS) on a GPU, making it applicable to mobile platforms.
ISSN: 2169-3536
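
Illustrative note: the summary mentions depth-wise cross-correlation between template and search features. The sketch below shows that operation in its generic form using NumPy. It is not the authors' implementation; the function name `depthwise_xcorr`, the tensor shapes, and the random stand-in features are assumptions made only for illustration.

```python
import numpy as np


def depthwise_xcorr(search, template):
    """Correlate each channel of `search` with the same channel of `template`.

    search:   array of shape (C, Hs, Ws), features of the search patch
    template: array of shape (C, Ht, Wt), features of the template patch
    returns:  array of shape (C, Hs - Ht + 1, Ws - Wt + 1), one similarity
              map per channel (no padding, stride 1)
    """
    c, hs, ws = search.shape
    ct, ht, wt = template.shape
    if c != ct:
        raise ValueError("search and template must have the same channel count")
    if ht > hs or wt > ws:
        raise ValueError("template must not be larger than the search region")

    # All (Ht, Wt) windows of the search features: shape (C, Ho, Wo, Ht, Wt).
    windows = np.lib.stride_tricks.sliding_window_view(
        search, (ht, wt), axis=(1, 2)
    )
    # Per-channel dot product between every window and the template kernel.
    return np.einsum("cijkl,ckl->cij", windows, template)


# Hypothetical usage with random stand-in features (4 channels).
rng = np.random.default_rng(0)
template_feat = rng.random((4, 3, 3))
search_feat = np.zeros((4, 8, 8))
search_feat[:, 2:5, 3:6] = template_feat  # place the "target" at row 2, col 3

response = depthwise_xcorr(search_feat, template_feat)
peak = np.unravel_index(response.sum(axis=0).argmax(), response.shape[1:])
print(response.shape, peak)
```

Here the per-channel maps are simply summed to find the peak. In a Siamese tracker the response maps would typically be passed to classification and regression heads instead, which the summary does not describe in detail.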