Long-term tracking with transformer and template update

Abstract To address tracking failures caused by the disappearance of the target during long-term tracking, this paper proposes a long-term tracking network based on the visual transformer and template update. First of all, we construct a feature extraction network based on the t...

Bibliographic Details
Main Authors: Hongying Zhang, Xiaowen Peng, Xuyong Wang
Format: Article
Language: English
Published: SpringerOpen 2022-12-01
Series: EURASIP Journal on Advances in Signal Processing
Subjects: Transformer; Long-term tracking; Template update
Online Access: https://doi.org/10.1186/s13634-022-00954-4
_version_ 1797973406020272128
author Hongying Zhang
Xiaowen Peng
Xuyong Wang
author_sort Hongying Zhang
collection DOAJ
description Abstract To address tracking failures caused by the disappearance of the target during long-term tracking, this paper proposes a long-term tracking network based on the visual transformer and template update. First, we construct a transformer-based feature extraction network and adopt a knowledge distillation strategy to improve its global feature extraction. Second, in the modeling transformer, the encoder fully fuses the target features with the search-area features, and the decoder learns the position information in the target query. Target prediction is then performed on the encoder-decoder output to obtain the tracking result. In addition, we design a score head that judges the validity of the current frame's dynamic template before the next frame is tracked, and we select the dynamic template for the next frame according to the resulting score. Extensive experiments on the LaSOT, VOT2021-LT, TrackingNet, TLP, and UAV123 datasets demonstrate the effectiveness of our method. In particular, it exceeds STARK by 0.8% (F score) on VOT2021-LT, by 1.0% (S score) on LaSOT, and by 1.1% (NP score) on TrackingNet, which further demonstrates the advantage of the proposed method.
first_indexed 2024-04-11T04:03:45Z
format Article
id doaj.art-2299711249ec4dae942d9efc07a3143b
institution Directory of Open Access Journals
issn 1687-6180
language English
last_indexed 2024-04-11T04:03:45Z
publishDate 2022-12-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Advances in Signal Processing
spelling doaj.art-2299711249ec4dae942d9efc07a3143b | 2023-01-01T12:30:11Z | eng | SpringerOpen | EURASIP Journal on Advances in Signal Processing | 1687-6180 | 2022-12-01 | 2022 | 1 | 1 | 17 | 10.1186/s13634-022-00954-4 | Long-term tracking with transformer and template update | Hongying Zhang (Civil Aviation University of China) | Xiaowen Peng (Civil Aviation University of China) | Xuyong Wang (Civil Aviation University of China) | https://doi.org/10.1186/s13634-022-00954-4 | Transformer | Long-term tracking | Template update
title Long-term tracking with transformer and template update
title_sort long term tracking with transformer and template update
topic Transformer
Long-term tracking
Template update
url https://doi.org/10.1186/s13634-022-00954-4
work_keys_str_mv AT hongyingzhang longtermtrackingwithtransformerandtemplateupdate
AT xiaowenpeng longtermtrackingwithtransformerandtemplateupdate
AT xuyongwang longtermtrackingwithtransformerandtemplateupdate