CVTrack: Combined Convolutional Neural Network and Vision Transformer Fusion Model for Visual Tracking
Most single-object trackers currently employ either a convolutional neural network (CNN) or a vision transformer as the backbone for object tracking. In CNNs, convolutional operations excel at extracting local features but struggle to capture global representations. On the other hand, vision transfo...
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2024-01-01
|
Series: | Sensors |
Subjects: | |
Online Access: | https://www.mdpi.com/1424-8220/24/1/274 |