Siamese Network Tracker Based on Multi-Scale Feature Fusion

The main task in visual object tracking is to track a moving object in an image sequence. In this process, the object’s trajectory and behavior can be described by calculating the object’s position, velocity, acceleration, and other parameters or by memorizing the position of the object in each fram...

Bibliographic Details
Main Authors: Jiaxu Zhao, Dapeng Niu
Format: Article
Language: English
Published: MDPI AG 2023-08-01
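Series: Systems
Subjects: visual object tracking; automated driving; deep learning; artificial intelligence; computer vision
Online Access: https://www.mdpi.com/2079-8954/11/8/434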
_version_ 1797583146109108224
author Jiaxu Zhao
Dapeng Niu
collection DOAJ
description The main task in visual object tracking is to track a moving object through an image sequence. The object’s trajectory and behavior can be described by calculating its position, velocity, acceleration, and other parameters, or by recording its position in each frame of the video. Visual object tracking therefore supports many higher-level tasks, performs well in real scenes, and is widely used in automated driving, traffic monitoring, human–computer interaction, and other applications. Siamese-network-based trackers have received a great deal of attention from the tracking community, but they have many drawbacks. This paper analyzes the shortcomings of the Siamese network tracker in detail, improves it with multi-scale feature fusion, and proposes a new target-tracking framework to address those shortcomings. A feature map with low resolution but strong semantic information is integrated with a feature map with high resolution and rich spatial information to improve the model’s ability to represent an object, and the problem of scale change is addressed by fusing features at different scales. Furthermore, a 3D Max Filtering module is used to suppress repeated predictions of features at different scales. Finally, experiments on four tracking benchmarks (OTB2015, VOT2016, VOT2018, and GOT10K) show that the proposed algorithm effectively improves tracking accuracy and robustness.
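The fusion and filtering steps named in the description can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the record does not say how the maps are fused, so element-wise addition after nearest-neighbour upsampling is an assumption, and the function names (upsample_nearest, fuse, max_filter_3d) are hypothetical. The filter also assumes the per-scale score maps have already been resized to a common resolution.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse(high_res, low_res):
    """Fuse two (C, H, W) maps by upsampling the low-resolution one and adding.

    Assumption: addition is used here only for illustration; the record does
    not state which fusion operator the paper uses.
    """
    factor = high_res.shape[1] // low_res.shape[1]
    up = upsample_nearest(low_res, factor)
    if up.shape != high_res.shape:
        raise ValueError("shapes must match after integer upsampling")
    return high_res + up

def max_filter_3d(scores):
    """Keep a score only if it is the maximum of its 3x3x3 neighbourhood.

    scores: float array of shape (S, H, W), one score map per scale, assumed
    to be at a common resolution. Non-maximal entries are set to 0, which
    suppresses repeated responses across neighbouring scales and positions.
    """
    s, h, w = scores.shape
    padded = np.pad(scores, 1, mode="constant", constant_values=-np.inf)
    local_max = np.full_like(scores, -np.inf)
    # Slide over the 27 offsets of the 3x3x3 window and take the running max.
    for ds in range(3):
        for dy in range(3):
            for dx in range(3):
                window = padded[ds:ds + s, dy:dy + h, dx:dx + w]
                local_max = np.maximum(local_max, window)
    return np.where(scores >= local_max, scores, 0.0)
```

For example, fusing a (2, 4, 4) map with a (2, 2, 2) map yields a (2, 4, 4) map, and a response that appears at the same location on two adjacent scales is kept only on the scale where it is stronger.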
first_indexed 2024-03-10T23:32:41Z
format Article
id doaj.art-79ae715b22af435f92ca68e4af17c9d1
institution Directory Open Access Journal
issn 2079-8954
language English
last_indexed 2024-03-10T23:32:41Z
publishDate 2023-08-01
publisher MDPI AG
record_format Article
series Systems
spelling doaj.art-79ae715b22af435f92ca68e4af17c9d1
2023-11-19T03:13:08Z
eng
MDPI AG
Systems
2079-8954
2023-08-01
Volume 11, Issue 8, Article 434
10.3390/systems11080434
Siamese Network Tracker Based on Multi-Scale Feature Fusion
Jiaxu Zhao, College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Dapeng Niu, College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
https://www.mdpi.com/2079-8954/11/8/434
title Siamese Network Tracker Based on Multi-Scale Feature Fusion
topic visual object tracking
automated driving
deep learning
artificial intelligence
computer vision
url https://www.mdpi.com/2079-8954/11/8/434