ConvFormer: Tracking by Fusing Convolution and Transformer Features


Bibliographic Details
Main Author: Chao Zhang
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10176344/
Collection: DOAJ (Directory of Open Access Journals)
DOAJ record ID: doaj.art-03d959c77adc4ebb8e48e8b31b6d61e6
ISSN: 2169-3536
Volume: 11
Pages: 74855-74864
DOI: 10.1109/ACCESS.2023.3293592
Author affiliation: Department of Computer Science, Beihang University, Beijing, China
Author ORCID: https://orcid.org/0000-0001-6076-836X
Keywords: ConvFormer; single-object tracking; transformer; mixed net module

Abstract
Current mainstream single-object trackers adopt the Transformer as the backbone for target tracking. However, the Transformer has limitations in acquiring local information and in position encoding. To address them, we propose a new tracking framework called ConvFormer, which improves feature extraction by combining the local feature extraction ability of convolutional neural networks (CNNs) with the global feature extraction ability of the Transformer. To extract and fuse the features of the template and search region synchronously, we propose the Mix Net Module (MNM), which performs both global and local feature extraction and fusion for the template and search regions. ConvFormer is built by stacking MNMs and adding a location head. Moreover, we design a post-processing module that reduces the impact of mistracking and improves the model’s robustness against interference from similar objects. Our framework achieves state-of-the-art performance on six benchmarks: OTB2015, VOT2018, GOT-10k, LaSOT, TrackingNet, and UAV123. Notably, on TrackingNet our tracker achieves 83.2% precision, outperforming OSTrack by 1.4%, and on LaSOT it achieves 77.4% precision, surpassing OSTrack by 2.6%. Finally, extensive ablation experiments validate the efficacy of our approach.
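The abstract describes MNM only at a high level: a global (Transformer-style attention) branch and a local (convolutional) branch applied to the template and search region together, with several such modules stacked. The pure-Python toy below is a minimal sketch of that general idea under stated assumptions, not ConvFormer's actual MNM. The function names (`mix_block`, `global_attention`, `local_conv`) are hypothetical; attention uses no learned projections and a single head; the convolution is a fixed 3x3 mean filter; and the residual, global, and local outputs are fused by a simple equal-weight average.

```python
import math
from typing import List, Tuple

Vec = List[float]        # one token: a feature vector with d channels
Grid = List[List[Vec]]   # one region: rows x cols x channels


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(tokens: List[Vec]) -> List[Vec]:
    """Parameter-free self-attention over every token (template + search jointly).

    Hypothetical simplification: queries, keys and values are the tokens
    themselves, with no learned projections and a single head.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def local_conv(grid: Grid) -> Grid:
    """Fixed 3x3 depthwise mean filter over one region's spatial grid.

    Hypothetical stand-in for a learned convolution; border positions average
    only the neighbors that exist (no zero padding).
    """
    h, w, d = len(grid), len(grid[0]), len(grid[0][0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            neigh = [grid[y][x]
                     for y in range(max(0, i - 1), min(h, i + 2))
                     for x in range(max(0, j - 1), min(w, j + 2))]
            row.append([sum(v[c] for v in neigh) / len(neigh) for c in range(d)])
        out.append(row)
    return out


def flatten(grid: Grid) -> List[Vec]:
    return [v for row in grid for v in row]


def unflatten(tokens: List[Vec], h: int, w: int) -> Grid:
    return [tokens[i * w:(i + 1) * w] for i in range(h)]


def mix_block(template: Grid, search: Grid) -> Tuple[Grid, Grid]:
    """One hypothetical mixing step: average of residual, global and local branches."""
    ht, wt = len(template), len(template[0])
    hs, ws = len(search), len(search[0])
    tokens = flatten(template) + flatten(search)
    glob = global_attention(tokens)  # template and search exchange information here
    loc = flatten(local_conv(template)) + flatten(local_conv(search))  # stays within each region
    mixed = [[(x + g + l) / 3.0 for x, g, l in zip(xv, gv, lv)]
             for xv, gv, lv in zip(tokens, glob, loc)]
    n_t = ht * wt
    return unflatten(mixed[:n_t], ht, wt), unflatten(mixed[n_t:], hs, ws)


if __name__ == "__main__":
    # Toy input: 2x2 template tokens and 4x4 search tokens with d = 2 channels.
    template = [[[1.0, 0.0] for _ in range(2)] for _ in range(2)]
    search = [[[0.0, 1.0] for _ in range(4)] for _ in range(4)]
    for _ in range(3):  # "stacking" here just means applying the block repeatedly
        template, search = mix_block(template, search)
    print(template[0][0], search[0][0])
```

Because the attention runs over the concatenated template and search tokens, information flows between the two regions in every block, while the convolution branch only mixes spatial neighbors within each region. A trainable model would replace the fixed averaging with learned attention projections, convolution kernels, and fusion weights.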
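The headline results in the abstract are precision scores. On OTB-style benchmarks, including LaSOT and TrackingNet, precision is conventionally the fraction of frames whose predicted box center lies within 20 pixels of the ground-truth center. The sketch below computes that quantity in plain Python under that assumption. The function names and the (x, y, w, h) box layout are illustrative choices, the official evaluation toolkits may differ in details such as how frames without a valid ground-truth box are handled, and the example data is made up rather than taken from the paper.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): top-left corner, width, height


def center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def precision(pred: Sequence[Box], gt: Sequence[Box], threshold: float = 20.0) -> float:
    """Fraction of frames whose center location error is at most `threshold` pixels."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    hits = 0
    for p, g in zip(pred, gt):
        (px, py), (gx, gy) = center(p), center(g)
        if math.hypot(px - gx, py - gy) <= threshold:  # Euclidean center error
            hits += 1
    return hits / len(gt)


if __name__ == "__main__":
    gt = [(100.0, 100.0, 40.0, 40.0)] * 4
    pred = [(100.0, 100.0, 40.0, 40.0),   # center error 0 px
            (110.0, 100.0, 40.0, 40.0),   # center error 10 px
            (130.0, 100.0, 40.0, 40.0),   # center error 30 px
            (100.0, 125.0, 40.0, 40.0)]   # center error 25 px
    print(precision(pred, gt))  # 2 of 4 frames within 20 px -> 0.5
```

Reported benchmark precision is this per-sequence score averaged over a test set; the threshold is a parameter so that the full precision curve can be swept if needed.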