End‐to‐end feature fusion Siamese network for adaptive visual tracking
Abstract Different visual objects exhibit different salient features in different scenarios. Even for the same object, the salient shape and appearance features may change greatly over time during a long‐term tracking task. Motivated by these observations, an end‐to‐end feature fusion...
Main Authors: | Dongyan Guo, Jun Wang, Weixuan Zhao, Ying Cui, Zhenhua Wang, Shengyong Chen
---|---
Format: | Article
Language: | English
Published: | Wiley, 2021-01-01
Series: | IET Image Processing
Subjects: | Image recognition; Computer vision and image processing techniques; Neural nets
Online Access: | https://doi.org/10.1049/ipr2.12009
_version_ | 1797986086501220352 |
author | Dongyan Guo, Jun Wang, Weixuan Zhao, Ying Cui, Zhenhua Wang, Shengyong Chen |
author_facet | Dongyan Guo, Jun Wang, Weixuan Zhao, Ying Cui, Zhenhua Wang, Shengyong Chen |
author_sort | Dongyan Guo |
collection | DOAJ |
description | Abstract Different visual objects exhibit different salient features in different scenarios. Even for the same object, the salient shape and appearance features may change greatly over time during a long‐term tracking task. Motivated by these observations, an end‐to‐end feature fusion framework based on the Siamese network, named FF‐Siam, is proposed; it can effectively fuse different features for adaptive visual tracking. The framework consists of four layers. A feature extraction layer extracts different features of the target region and the search region. The extracted features are then fed into a weight generation layer to obtain channel weights, which indicate the importance of the different feature channels. Both the features and the channel weights are utilised by a template generation layer to produce a discriminative template. Finally, the response maps created by convolving the search‐region features with the template are combined by a fusion layer to obtain the final response map for locating the target. Experimental results demonstrate that the proposed framework achieves state‐of‐the‐art performance on the popular Temple‐Colour, OTB50 and UAV123 benchmarks. |
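The description above walks through a concrete four-stage pipeline: feature extraction, channel-weight generation, template generation, and response-map fusion. The following PyTorch sketch is a minimal illustration of that flow under stated assumptions, not the authors' released implementation: the two toy convolutional branches standing in for the different feature types, the squeeze-and-excitation-style weight generator, the learnable fusion scalars, and all layer sizes are invented for the example.

```python
# Minimal sketch of the four-layer FF-Siam pipeline from the abstract.
# NOT the authors' code: branches, weight module and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFSiamSketch(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # 1) Feature extraction layer: each branch is applied to both the
        #    target region and the search region (Siamese weight sharing).
        #    Two tiny conv stacks stand in for two different feature types.
        self.branch_a = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 3), nn.ReLU())
        self.branch_b = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 3), nn.ReLU())
        # 2) Weight generation layer: per-channel importance weights derived
        #    from the target features (squeeze-and-excitation-style, assumed).
        self.weight_gen = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())
        # 4) Fusion layer: learnable scalars that combine the response maps.
        self.fusion = nn.Parameter(torch.ones(2) / 2)

    @staticmethod
    def xcorr(search_feat, template):
        # Response map: cross-correlate search features with the template
        # (single-sample case; the template acts as the conv kernel).
        return F.conv2d(search_feat, template)

    def forward(self, target, search):
        responses = []
        for branch in (self.branch_a, self.branch_b):
            t_feat = branch(target)   # target-region features
            s_feat = branch(search)   # search-region features
            # 2) channel weights indicating per-channel importance
            w = self.weight_gen(t_feat).view(1, -1, 1, 1)
            # 3) Template generation layer: reweight the target features
            #    to obtain a discriminative template.
            template = t_feat * w
            responses.append(self.xcorr(s_feat, template))
        # 4) Fusion layer: weighted sum gives the final response map.
        alpha = torch.softmax(self.fusion, dim=0)
        return alpha[0] * responses[0] + alpha[1] * responses[1]


if __name__ == "__main__":
    net = FFSiamSketch()
    z = torch.randn(1, 3, 127, 127)   # target (exemplar) crop
    x = torch.randn(1, 3, 255, 255)   # search-region crop
    r = net(z, x)
    print(r.shape)                    # peak of r locates the target
```

In a tracker assembled this way, the peak of the fused response map indicates the target position within the search region, and because the fusion weights are trained end to end, the relative influence of each feature type is learned rather than hand-tuned.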
first_indexed | 2024-04-11T07:28:34Z |
format | Article |
id | doaj.art-39f951463ceb43d09637caf32807d831 |
institution | Directory Open Access Journal |
issn | 1751-9659; 1751-9667 |
language | English |
last_indexed | 2024-04-11T07:28:34Z |
publishDate | 2021-01-01 |
publisher | Wiley |
record_format | Article |
series | IET Image Processing |
spelling | doaj.art-39f951463ceb43d09637caf32807d831 (2022-12-22T04:36:59Z) | English | Wiley | IET Image Processing, ISSN 1751-9659, 1751-9667, 2021-01-01, vol. 15, no. 1, pp. 91-100, DOI 10.1049/ipr2.12009 | End‐to‐end feature fusion Siamese network for adaptive visual tracking | Dongyan Guo, Jun Wang, Weixuan Zhao, Ying Cui and Zhenhua Wang (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang, China); Shengyong Chen (School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China) | https://doi.org/10.1049/ipr2.12009 | Image recognition; Computer vision and image processing techniques; Neural nets |
spellingShingle | Dongyan Guo; Jun Wang; Weixuan Zhao; Ying Cui; Zhenhua Wang; Shengyong Chen. End‐to‐end feature fusion Siamese network for adaptive visual tracking. IET Image Processing. Image recognition; Computer vision and image processing techniques; Neural nets |
title | End‐to‐end feature fusion Siamese network for adaptive visual tracking |
title_full | End‐to‐end feature fusion Siamese network for adaptive visual tracking |
title_fullStr | End‐to‐end feature fusion Siamese network for adaptive visual tracking |
title_full_unstemmed | End‐to‐end feature fusion Siamese network for adaptive visual tracking |
title_short | End‐to‐end feature fusion Siamese network for adaptive visual tracking |
title_sort | end to end feature fusion siamese network for adaptive visual tracking |
topic | Image recognition; Computer vision and image processing techniques; Neural nets |
url | https://doi.org/10.1049/ipr2.12009 |
work_keys_str_mv | AT dongyanguo endtoendfeaturefusionsiamesenetworkforadaptivevisualtracking AT junwang endtoendfeaturefusionsiamesenetworkforadaptivevisualtracking AT weixuanzhao endtoendfeaturefusionsiamesenetworkforadaptivevisualtracking AT yingcui endtoendfeaturefusionsiamesenetworkforadaptivevisualtracking AT zhenhuawang endtoendfeaturefusionsiamesenetworkforadaptivevisualtracking AT shengyongchen endtoendfeaturefusionsiamesenetworkforadaptivevisualtracking |