CVTrack: Combined Convolutional Neural Network and Vision Transformer Fusion Model for Visual Tracking

Most single-object trackers currently employ either a convolutional neural network (CNN) or a vision transformer as the backbone for object tracking. In CNNs, convolutional operations excel at extracting local features but struggle to capture global representations. On the other hand, vision transfo...

Full description

Bibliographic Details
Main Authors: Jian Wang, Yueming Song, Ce Song, Haonan Tian, Shuai Zhang, Jinghui Sun
Format: Article
Language:English
Published: MDPI AG 2024-01-01
Series:Sensors
Subjects:
Online Access:https://www.mdpi.com/1424-8220/24/1/274