D-TransT: Deformable Transformer Tracking

Bibliographic Details
Main Authors: Jiahang Zhou, Yuanzhe Yao, Rong Yang, Yuheng Xia
Format: Article
Language: English
Published: MDPI AG 2022-11-01
Series: Electronics
Subjects: Transformer Tracking; Siamese network; transformer
Online Access: https://www.mdpi.com/2079-9292/11/23/3843
_version_ 1797463391701303296
author Jiahang Zhou
Yuanzhe Yao
Rong Yang
Yuheng Xia
author_facet Jiahang Zhou
Yuanzhe Yao
Rong Yang
Yuheng Xia
author_sort Jiahang Zhou
collection DOAJ
description Trackers based on Siamese networks formulate object tracking as a similarity-matching problem. The Siamese network is the current mainstream model: it achieves similarity learning by applying correlation filters to the convolution features of the target and search branches. However, because the correlation operation is a local linear matching process, semantic information is lost and the tracker easily falls into local optima. Transformer Tracking was recently proposed, replacing the correlation operation with an attention-based feature-fusion network, and achieves excellent results. However, it uses only limited feature-map resolution, and, owing to limitations of the Transformer module, the network also converges very slowly. We propose Deformable Transformer Tracking (D-TransT), which employs a deformable attention module that attends to a small set of sampled key locations instead of all feature-map pixels; this module extends naturally to aggregating multi-scale features. D-TransT converges faster and predicts more accurately than Transformer Tracking: it improves convergence speed by 29.4% and achieves 65.6%, 73.3%, and 69.1% in AUC, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msub><mi>P</mi><mrow><mi>N</mi><mi>o</mi><mi>r</mi><mi>m</mi></mrow></msub></semantics></math></inline-formula>, and P, respectively. The experimental results demonstrate that the proposed tracker outperforms most state-of-the-art algorithms.
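The deformable attention module summarized in the description can be sketched as follows: instead of attending densely over all H×W feature-map pixels, each query predicts a small set of K sampling offsets around a reference point, bilinearly samples the value map there, and combines the samples with softmax weights. This is a minimal single-head, single-scale NumPy illustration; all shapes and the random linear projections are assumptions for illustration, not the authors' actual D-TransT configuration.

```python
import numpy as np

def bilinear_sample(value, y, x):
    """Bilinearly interpolate value[H, W, C] at a fractional (y, x)."""
    H, W, _ = value.shape
    y = float(np.clip(y, 0, H - 1))
    x = float(np.clip(x, 0, W - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * value[y0, x0]
            + (1 - wy) * wx * value[y0, x1]
            + wy * (1 - wx) * value[y1, x0]
            + wy * wx * value[y1, x1])

def deformable_attention(queries, ref_points, value, K=4, seed=0):
    """queries: [N, C]; ref_points: [N, 2] as (y, x) on the value map;
    value: [H, W, C]. Returns an [N, C] aggregated feature per query."""
    rng = np.random.default_rng(seed)
    N, C = queries.shape
    # Sampling offsets and attention logits are linear in the query
    # (random projections stand in for learned weights here).
    W_off = rng.normal(scale=0.1, size=(C, K * 2))
    W_att = rng.normal(scale=0.1, size=(C, K))
    offsets = (queries @ W_off).reshape(N, K, 2)
    logits = queries @ W_att
    att = np.exp(logits - logits.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)  # K-way softmax per query
    out = np.zeros((N, C))
    for n in range(N):
        for k in range(K):
            # Sample the value map at reference point + learned offset,
            # weighted by the query's attention weight for this sample.
            y, x = ref_points[n] + offsets[n, k]
            out[n] += att[n, k] * bilinear_sample(value, y, x)
    return out

# Toy usage: 5 queries over an 8x8 feature map with 16 channels.
H, W, C, N = 8, 8, 16, 5
rng = np.random.default_rng(42)
value = rng.normal(size=(H, W, C))
queries = rng.normal(size=(N, C))
ref_points = rng.uniform(0, [H - 1, W - 1], size=(N, 2))
out = deformable_attention(queries, ref_points, value, K=4)
print(out.shape)  # (5, 16)
```

Because each query touches only K sampled locations instead of all H×W keys, the cost per query is O(K) rather than O(HW), which is the source of the faster convergence the abstract reports; the same sampling scheme can be repeated per feature-pyramid level to aggregate multi-scale features.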
first_indexed 2024-03-09T17:50:00Z
format Article
id doaj.art-0c5937aa3f1f4035a31e8f699548519b
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-09T17:50:00Z
publishDate 2022-11-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-0c5937aa3f1f4035a31e8f699548519b 2023-11-24T10:46:30Z eng MDPI AG Electronics 2079-9292 2022-11-01 11(23) 3843 10.3390/electronics11233843
D-TransT: Deformable Transformer Tracking
Jiahang Zhou (School of Software Engineering, University of Electronic Science and Technology of China, Chengdu 610056, China)
Yuanzhe Yao (School of Software Engineering, University of Electronic Science and Technology of China, Chengdu 610056, China)
Rong Yang (Basic Frontier Research Institute, University of Electronic Science and Technology of China, Chengdu 610056, China)
Yuheng Xia (School of Software Engineering, University of Electronic Science and Technology of China, Chengdu 610056, China)
https://www.mdpi.com/2079-9292/11/23/3843
Transformer Tracking; Siamese network; transformer
spellingShingle Jiahang Zhou
Yuanzhe Yao
Rong Yang
Yuheng Xia
D-TransT: Deformable Transformer Tracking
Electronics
Transformer Tracking
Siamese network
transformer
title D-TransT: Deformable Transformer Tracking
title_full D-TransT: Deformable Transformer Tracking
title_fullStr D-TransT: Deformable Transformer Tracking
title_full_unstemmed D-TransT: Deformable Transformer Tracking
title_short D-TransT: Deformable Transformer Tracking
title_sort d transt deformable transformer tracking
topic Transformer Tracking
Siamese network
transformer
url https://www.mdpi.com/2079-9292/11/23/3843
work_keys_str_mv AT jiahangzhou dtranstdeformabletransformertracking
AT yuanzheyao dtranstdeformabletransformertracking
AT rongyang dtranstdeformabletransformertracking
AT yuhengxia dtranstdeformabletransformertracking