D-TransT: Deformable Transformer Tracking
Trackers based on the Siamese network cast the object-tracking task as a similarity-matching problem. The Siamese network is the current mainstream model; it learns similarity by applying correlation filters to the convolutional features of the target and search branches...
Main Authors: | Jiahang Zhou, Yuanzhe Yao, Rong Yang, Yuheng Xia |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | Electronics |
Subjects: | Transformer Tracking; Siamese network; transformer |
Online Access: | https://www.mdpi.com/2079-9292/11/23/3843 |
_version_ | 1797463391701303296 |
author | Jiahang Zhou; Yuanzhe Yao; Rong Yang; Yuheng Xia |
author_facet | Jiahang Zhou; Yuanzhe Yao; Rong Yang; Yuheng Xia |
author_sort | Jiahang Zhou |
collection | DOAJ |
description | Trackers based on the Siamese network cast the object-tracking task as a similarity-matching problem. The Siamese network is the current mainstream model; it learns similarity by applying correlation filters to the convolutional features of the target and search branches. However, because the correlation operation is a local linear matching process, semantic information is lost and the tracker easily falls into local optima. Transformer Tracking was recently proposed, replacing the correlation operation with an attention-based feature fusion network and achieving excellent results. However, it uses only limited feature-space resolution, and, owing to the limitations of the Transformer module, the network also converges very slowly. We propose Deformable Transformer Tracking (D-TransT), which employs a deformable attention module that pre-filters prominent key elements from all feature-map pixels using a small set of sampling locations; this module also extends naturally to aggregating multi-scale features. D-TransT converges faster and predicts more accurately than Transformer Tracking. D-TransT improves the convergence speed by 29.4% and achieves 65.6%, 73.3%, and 69.1% in AUC, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msub><mi>P</mi><mrow><mi>N</mi><mi>o</mi><mi>r</mi><mi>m</mi></mrow></msub></semantics></math></inline-formula> and P, respectively. The experimental results demonstrate that the proposed tracker outperforms most state-of-the-art algorithms. |
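The core idea the abstract describes (a deformable attention module that attends to a small set of sampled locations around a reference point, rather than to all feature-map pixels) can be sketched as follows. This is a minimal single-query NumPy illustration of the general deformable-attention sampling scheme, not the paper's implementation; all function names, shapes, and parameters are assumptions for illustration only.

```python
import numpy as np

def deformable_attention(feature_map, ref_xy, offsets, weights):
    """Illustrative single-query deformable attention.

    Instead of attending to all H*W pixels, the query attends to K
    sampled locations around its reference point (ref_xy + offsets).
    Values are read with bilinear interpolation and combined using
    softmax-normalized sampling weights.

    feature_map : (H, W, C) array of features
    ref_xy      : (x, y) reference point of the query
    offsets     : (K, 2) sampling offsets (in this sketch, given;
                  in a real module they are predicted from the query)
    weights     : (K,) unnormalized sampling weights
    """
    H, W, C = feature_map.shape
    K = offsets.shape[0]
    # Softmax over the K sampling weights.
    w = np.exp(weights - weights.max())
    w /= w.sum()
    out = np.zeros(C)
    for k in range(K):
        # Sampling location, clamped to the feature map.
        x = float(np.clip(ref_xy[0] + offsets[k, 0], 0, W - 1))
        y = float(np.clip(ref_xy[1] + offsets[k, 1], 0, H - 1))
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
        ax, ay = x - x0, y - y0
        # Bilinear interpolation of the feature map at (x, y).
        v = ((1 - ax) * (1 - ay) * feature_map[y0, x0]
             + ax * (1 - ay) * feature_map[y0, x1]
             + (1 - ax) * ay * feature_map[y1, x0]
             + ax * ay * feature_map[y1, x1])
        out += w[k] * v
    return out

# Example: one query sampling K=4 locations on an 8x8, 4-channel map.
rng = np.random.default_rng(0)
fm = rng.standard_normal((8, 8, 4))
result = deformable_attention(fm, (3.5, 3.5),
                              rng.standard_normal((4, 2)),
                              rng.standard_normal(4))
```

Because each query touches only K locations instead of all H*W pixels, the cost of attention no longer grows with the feature-map resolution, which is what makes the module cheap enough to aggregate multi-scale feature maps.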
first_indexed | 2024-03-09T17:50:00Z |
format | Article |
id | doaj.art-0c5937aa3f1f4035a31e8f699548519b |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-09T17:50:00Z |
publishDate | 2022-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-0c5937aa3f1f4035a31e8f699548519b; indexed 2023-11-24T10:46:30Z; eng; MDPI AG; Electronics; ISSN 2079-9292; published 2022-11-01; Vol. 11, No. 23, Article 3843; DOI 10.3390/electronics11233843; D-TransT: Deformable Transformer Tracking; Jiahang Zhou (School of Software Engineering, University of Electronic Science and Technology of China, Chengdu 610056, China), Yuanzhe Yao (School of Software Engineering, University of Electronic Science and Technology of China, Chengdu 610056, China), Rong Yang (Basic Frontier Research Institute, University of Electronic Science and Technology of China, Chengdu 610056, China), Yuheng Xia (School of Software Engineering, University of Electronic Science and Technology of China, Chengdu 610056, China); keywords: Transformer Tracking, Siamese network, transformer; https://www.mdpi.com/2079-9292/11/23/3843 |
spellingShingle | Jiahang Zhou Yuanzhe Yao Rong Yang Yuheng Xia D-TransT: Deformable Transformer Tracking Electronics Transformer Tracking Siamese network transformer |
title | D-TransT: Deformable Transformer Tracking |
title_full | D-TransT: Deformable Transformer Tracking |
title_fullStr | D-TransT: Deformable Transformer Tracking |
title_full_unstemmed | D-TransT: Deformable Transformer Tracking |
title_short | D-TransT: Deformable Transformer Tracking |
title_sort | d transt deformable transformer tracking |
topic | Transformer Tracking Siamese network transformer |
url | https://www.mdpi.com/2079-9292/11/23/3843 |
work_keys_str_mv | AT jiahangzhou dtranstdeformabletransformertracking AT yuanzheyao dtranstdeformabletransformertracking AT rongyang dtranstdeformabletransformertracking AT yuhengxia dtranstdeformabletransformertracking |