Self-Prompting Tracking: A Fast and Efficient Tracking Pipeline for UAV Videos

Visual tracking in remote sensing videos captured by Unmanned Aerial Vehicles (UAVs) has advanced significantly and found wide application. However, conventional Transformer-based trackers still struggle to balance tracking accuracy and inference speed. This problem i...

Bibliographic Details
Main Authors: Zhixing Wang, Gaofan Zhou, Jinzhen Yao, Jianlin Zhang, Qiliang Bao, Qintao Hu
Format: Article
Language: English
Published: MDPI AG 2024-02-01
Series: Remote Sensing
Subjects: transformer, UAV tracking, prompting
Online Access: https://www.mdpi.com/2072-4292/16/5/748
_version_ 1797263951621259264
author Zhixing Wang
Gaofan Zhou
Jinzhen Yao
Jianlin Zhang
Qiliang Bao
Qintao Hu
author_sort Zhixing Wang
collection DOAJ
description Visual tracking in remote sensing videos captured by Unmanned Aerial Vehicles (UAVs) has advanced significantly and found wide application. However, conventional Transformer-based trackers still struggle to balance tracking accuracy and inference speed. This problem is further exacerbated when Transformers are deployed at larger model scales. To address this challenge, we present a fast and efficient UAV tracking framework, denoted SiamPT, which aims to reduce the number of Transformer layers without losing the discriminative ability of the model. To achieve this, we transfer conventional prompting theories from multi-modal tracking to UAV tracking and propose a novel self-prompting method that uses the target’s inherent characteristics in the search branch to discriminate targets from the background. Specifically, a self-distribution strategy is introduced to capture feature-level relationships, which segments tokens into distinct smaller patches. Subsequently, salient tokens within the full attention map are identified as foreground targets, enabling the fusion of local region information. These fused tokens serve as prompters that enhance the identification of distractors, thereby avoiding the need for model expansion. On the UAV123 benchmark, SiamPT achieves success and precision rates of 0.694 and 0.890, respectively, while maintaining an inference speed of 91.0 FPS.
first_indexed 2024-04-25T00:21:10Z
format Article
id doaj.art-724a6e51150343a486392aee761ca1ec
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-04-25T00:21:10Z
publishDate 2024-02-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-724a6e51150343a486392aee761ca1ec
2024-03-12T16:53:53Z
eng
MDPI AG
Remote Sensing, ISSN 2072-4292
2024-02-01, vol. 16, no. 5, art. 748
doi:10.3390/rs16050748
Self-Prompting Tracking: A Fast and Efficient Tracking Pipeline for UAV Videos
Zhixing Wang, Gaofan Zhou, Jinzhen Yao, Jianlin Zhang, Qiliang Bao, Qintao Hu (all: Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China)
https://www.mdpi.com/2072-4292/16/5/748
transformer, UAV tracking, prompting
title Self-Prompting Tracking: A Fast and Efficient Tracking Pipeline for UAV Videos
topic transformer
UAV tracking
prompting
url https://www.mdpi.com/2072-4292/16/5/748