Self-Prompting Tracking: A Fast and Efficient Tracking Pipeline for UAV Videos
Main Authors: | Zhixing Wang, Gaofan Zhou, Jinzhen Yao, Jianlin Zhang, Qiliang Bao, Qintao Hu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-02-01 |
Series: | Remote Sensing |
Subjects: | transformer, UAV tracking, prompting |
Online Access: | https://www.mdpi.com/2072-4292/16/5/748 |
_version_ | 1797263951621259264 |
---|---|
author | Zhixing Wang, Gaofan Zhou, Jinzhen Yao, Jianlin Zhang, Qiliang Bao, Qintao Hu |
collection | DOAJ |
description | In visual tracking, remote sensing videos captured by Unmanned Aerial Vehicles (UAVs) have seen significant advances and found wide application. However, conventional Transformer-based trackers still struggle to balance tracking accuracy and inference speed, and the problem worsens as Transformers are scaled to larger models. To address this challenge, we present a fast and efficient UAV tracking framework, SiamPT, which aims to reduce the number of Transformer layers without losing the model's discriminative ability. To this end, we transfer conventional prompting theories from multi-modal tracking to UAV tracking and propose a novel self-prompting method that uses the target's inherent characteristics in the search branch to discriminate the target from the background. Specifically, a self-distribution strategy is introduced to capture feature-level relationships by segmenting tokens into distinct smaller patches. Salient tokens within the full attention map are then identified as foreground targets, enabling the fusion of local region information. These fused tokens serve as prompters that improve the identification of distractors, avoiding the need for model expansion. On the UAV123 benchmark, SiamPT achieves success and precision rates of 0.694 and 0.890, respectively, while running at 91.0 FPS. |
first_indexed | 2024-04-25T00:21:10Z |
format | Article |
id | doaj.art-724a6e51150343a486392aee761ca1ec |
institution | Directory of Open Access Journals |
issn | 2072-4292 |
language | English |
last_indexed | 2024-04-25T00:21:10Z |
publishDate | 2024-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-724a6e51150343a486392aee761ca1ec (2024-03-12T16:53:53Z); English; MDPI AG; Remote Sensing; ISSN 2072-4292; 2024-02-01; vol. 16, no. 5, art. 748; DOI 10.3390/rs16050748; Zhixing Wang, Gaofan Zhou, Jinzhen Yao, Jianlin Zhang, Qiliang Bao, Qintao Hu (all: Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China); https://www.mdpi.com/2072-4292/16/5/748; keywords: transformer, UAV tracking, prompting |
title | Self-Prompting Tracking: A Fast and Efficient Tracking Pipeline for UAV Videos |
topic | transformer, UAV tracking, prompting |
url | https://www.mdpi.com/2072-4292/16/5/748 |
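The description says that salient tokens in the full attention map are treated as foreground and that their local region information is fused into prompt tokens. The NumPy sketch below illustrates that general idea under stated assumptions. It is not SiamPT's implementation. The function names (`select_salient_tokens`, `build_self_prompt`), the choice of column-mean attention as the saliency score, the top-k selection, the square neighbourhood of radius `radius`, and prepending a single averaged prompt token are all hypothetical illustration choices. The self-distribution strategy is not modelled.

```python
import numpy as np


def select_salient_tokens(attn: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k tokens that receive the most attention.

    attn: (N, N) attention map over N search-region tokens (rows = queries).
    Assumption: saliency of token j is the mean attention it receives from all
    queries, i.e. the column mean of attn.
    """
    saliency = attn.mean(axis=0)
    # argsort is ascending: take the last k indices, then reverse to descending
    return np.argsort(saliency)[-k:][::-1]


def build_self_prompt(tokens: np.ndarray, attn: np.ndarray,
                      grid_hw: tuple[int, int], k: int = 4,
                      radius: int = 1) -> np.ndarray:
    """Fuse the local neighbourhoods of the top-k salient tokens into one prompt.

    tokens: (N, C) features laid out row-major on an (H, W) grid, N = H * W.
    Returns a (C,) vector: each salient token's (2r+1)x(2r+1) neighbourhood is
    averaged (clipped at the grid border), then the k results are averaged.
    """
    h, w = grid_hw
    grid = tokens.reshape(h, w, -1)
    fused = []
    for idx in select_salient_tokens(attn, k):
        r, c = divmod(int(idx), w)  # grid position of the salient token
        patch = grid[max(r - radius, 0):r + radius + 1,
                     max(c - radius, 0):c + radius + 1]
        fused.append(patch.reshape(-1, grid.shape[-1]).mean(axis=0))
    return np.mean(fused, axis=0)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, c = 8, 8, 16
    tokens = rng.standard_normal((h * w, c))
    # Toy single-head self-attention over the search tokens
    attn = softmax(tokens @ tokens.T / np.sqrt(c), axis=-1)
    prompt = build_self_prompt(tokens, attn, (h, w), k=4)
    # Prepend the prompt so later layers can attend to it like any other token
    prompted = np.vstack([prompt[None, :], tokens])
    print(prompted.shape)  # (65, 16)
```

In this sketch the prompt is derived entirely from the search-branch tokens themselves, which is the "self" in self-prompting as the description frames it; the extra token adds only one row to the sequence rather than additional Transformer layers.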