D-TransT: Deformable Transformer Tracking

Bibliographic Details
Main Authors: Jiahang Zhou, Yuanzhe Yao, Rong Yang, Yuheng Xia
Format: Article
Language: English
Published: MDPI AG 2022-11-01
Series: Electronics
Subjects: Transformer Tracking; Siamese network; transformer
Online Access: https://www.mdpi.com/2079-9292/11/23/3843
_version_ 1797463391701303296
author Jiahang Zhou
Yuanzhe Yao
Rong Yang
Yuheng Xia
author_facet Jiahang Zhou
Yuanzhe Yao
Rong Yang
Yuheng Xia
author_sort Jiahang Zhou
collection DOAJ
description Trackers based on Siamese networks formulate object tracking as a similarity-matching problem. The Siamese network is the current mainstream model: it achieves similarity learning by applying correlation filters to the convolution features of the target and search branches. However, because the correlation operation is a local linear matching process, semantic information is lost and the tracker easily falls into local optima. Transformer Tracking was recently proposed, replacing the correlation operation with an attention-based feature-fusion network, and achieves excellent results. However, it uses only limited feature-map resolution, and, owing to limitations of the Transformer module, the network also converges very slowly. We propose Deformable Transformer Tracking (D-TransT), which employs a deformable attention module that attends to a small set of sampled key locations instead of all feature-map pixels; this module extends naturally to aggregating multi-scale features. D-TransT converges faster and predicts more accurately than Transformer Tracking: it improves convergence speed by 29.4% and achieves 65.6%, 73.3%, and 69.1% in AUC, <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msub><mi>P</mi><mrow><mi>N</mi><mi>o</mi><mi>r</mi><mi>m</mi></mrow></msub></semantics></math></inline-formula>, and P, respectively. The experimental results demonstrate that the proposed tracker outperforms most state-of-the-art algorithms.
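The deformable attention module summarized in the description can be sketched as follows: instead of attending densely over all H×W feature-map pixels, each query predicts a small set of K sampling offsets around a reference point, bilinearly samples the value map there, and combines the samples with softmax weights. This is a minimal single-head, single-scale NumPy illustration; all shapes and the random linear projections are assumptions for illustration, not the authors' actual D-TransT configuration.

```python
import numpy as np

def bilinear_sample(value, y, x):
    """Bilinearly interpolate value[H, W, C] at a fractional (y, x)."""
    H, W, _ = value.shape
    y = float(np.clip(y, 0, H - 1))
    x = float(np.clip(x, 0, W - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * value[y0, x0]
            + (1 - wy) * wx * value[y0, x1]
            + wy * (1 - wx) * value[y1, x0]
            + wy * wx * value[y1, x1])

def deformable_attention(queries, ref_points, value, K=4, seed=0):
    """queries: [N, C]; ref_points: [N, 2] as (y, x) on the value map;
    value: [H, W, C]. Returns an [N, C] aggregated feature per query."""
    rng = np.random.default_rng(seed)
    N, C = queries.shape
    # Sampling offsets and attention logits are linear in the query
    # (random projections stand in for learned weights here).
    W_off = rng.normal(scale=0.1, size=(C, K * 2))
    W_att = rng.normal(scale=0.1, size=(C, K))
    offsets = (queries @ W_off).reshape(N, K, 2)
    logits = queries @ W_att
    att = np.exp(logits - logits.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)  # K-way softmax per query
    out = np.zeros((N, C))
    for n in range(N):
        for k in range(K):
            # Sample the value map at reference point + learned offset,
            # weighted by the query's attention weight for this sample.
            y, x = ref_points[n] + offsets[n, k]
            out[n] += att[n, k] * bilinear_sample(value, y, x)
    return out

# Toy usage: 5 queries over an 8x8 feature map with 16 channels.
H, W, C, N = 8, 8, 16, 5
rng = np.random.default_rng(42)
value = rng.normal(size=(H, W, C))
queries = rng.normal(size=(N, C))
ref_points = rng.uniform(0, [H - 1, W - 1], size=(N, 2))
out = deformable_attention(queries, ref_points, value, K=4)
print(out.shape)  # (5, 16)
```

Because each query touches only K sampled locations instead of all H×W keys, the cost per query is O(K) rather than O(HW), which is the source of the faster convergence the abstract reports; the same sampling scheme can be repeated per feature-pyramid level to aggregate multi-scale features.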
first_indexed 2024-03-09T17:50:00Z
format Article
id doaj.art-0c5937aa3f1f4035a31e8f699548519b
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-09T17:50:00Z
publishDate 2022-11-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-0c5937aa3f1f4035a31e8f699548519b 2023-11-24T10:46:30Z eng MDPI AG Electronics 2079-9292 2022-11-01 11(23) 3843 10.3390/electronics11233843
D-TransT: Deformable Transformer Tracking
Jiahang Zhou (School of Software Engineering, University of Electronic Science and Technology of China, Chengdu 610056, China)
Yuanzhe Yao (School of Software Engineering, University of Electronic Science and Technology of China, Chengdu 610056, China)
Rong Yang (Basic Frontier Research Institute, University of Electronic Science and Technology of China, Chengdu 610056, China)
Yuheng Xia (School of Software Engineering, University of Electronic Science and Technology of China, Chengdu 610056, China)
https://www.mdpi.com/2079-9292/11/23/3843
Transformer Tracking; Siamese network; transformer
spellingShingle Jiahang Zhou
Yuanzhe Yao
Rong Yang
Yuheng Xia
D-TransT: Deformable Transformer Tracking
Electronics
Transformer Tracking
Siamese network
transformer
title D-TransT: Deformable Transformer Tracking
title_full D-TransT: Deformable Transformer Tracking
title_fullStr D-TransT: Deformable Transformer Tracking
title_full_unstemmed D-TransT: Deformable Transformer Tracking
title_short D-TransT: Deformable Transformer Tracking
title_sort d transt deformable transformer tracking
topic Transformer Tracking
Siamese network
transformer
url https://www.mdpi.com/2079-9292/11/23/3843
work_keys_str_mv AT jiahangzhou dtranstdeformabletransformertracking
AT yuanzheyao dtranstdeformabletransformertracking
AT rongyang dtranstdeformabletransformertracking
AT yuhengxia dtranstdeformabletransformertracking