ACSiamRPN: Adaptive Context Sampling for Visual Object Tracking

In the field of visual object tracking, the Siamese network tracker based on the region proposal network (SiamRPN) has achieved promising results in both speed and accuracy. However, it does not consider the relationships and differences between the long-range context information of various obje...

Bibliographic Details
Main Authors:	Xiaofei Qin, Yipeng Zhang, Hang Chang, Hao Lu, Xuedian Zhang
Format:	Article
Language:	English
Published:	MDPI AG 2020-09-01
Series:	Electronics
Subjects:	visual object tracking; SiamRPN; global context; selective kernel convolution
Online Access:	https://www.mdpi.com/2079-9292/9/9/1528

Collection:	DOAJ
Description:	In the field of visual object tracking, the Siamese network tracker based on the region proposal network (SiamRPN) has achieved promising results in both speed and accuracy. However, it does not consider the relationships and differences between the long-range context information of various objects. In this paper, we add a global context block (GC block), which is lightweight and can effectively model long-range dependencies, to the Siamese network part of SiamRPN so that the object tracker can better understand the tracking scene. We also propose a novel convolution module, the cropping-inside selective kernel block (CiSK block), which is based on selective kernel convolution (SK convolution, a module proposed in selective kernel networks) and is used in the region proposal network (RPN) part of SiamRPN, where it adaptively adjusts the size of the receptive field for different types of objects. The CiSK block makes two improvements to SK convolution. First, in the fusion step of SK convolution, we use both global average pooling (GAP) and global maximum pooling (GMP) to enhance global information embedding. Second, after the selection step of SK convolution, we crop out the outermost pixels of the features to reduce the impact of padding operations. Experimental results show that on the OTB100 benchmark, we achieve an accuracy of 0.857 and a success rate of 0.643. On the VOT2016 and VOT2019 benchmarks, we achieve expected average overlap (EAO) scores of 0.394 and 0.240, respectively.
ISSN:	2079-9292
DOI:	10.3390/electronics9091528
Volume/Issue:	9(9), article 1528
Author Affiliations:	Xiaofei Qin (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China); Yipeng Zhang (School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China); Hang Chang (Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA); Hao Lu (Guangxi Yuchai Machinery Co., Ltd., Nanning 530007, China); Xuedian Zhang (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
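
The fuse, select, and crop flow that the description attributes to the CiSK block can be illustrated with a small, weight-free sketch in plain Python. This is not the authors' implementation. The function names (`global_pool`, `softmax`, `cisk_block`), the choice to add the GAP and GMP descriptors, and the per-channel scalar `weights` standing in for the learned fully connected layers of SK convolution are all assumptions made for illustration; the record does not specify any of them.

```python
import math


def global_pool(channel):
    """Return (GAP, GMP) for one 2-D channel given as a list of rows."""
    flat = [v for row in channel for v in row]
    return sum(flat) / len(flat), max(flat)


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cisk_block(branches, weights, crop=1):
    """Hypothetical sketch of fuse -> select -> crop.

    branches: one feature map per kernel size, each shaped [C][H][W].
    weights:  weights[k][c] is an assumed scalar replacing the learned
              fully connected layers of SK convolution.
    crop:     number of border pixels removed on every side.
    """
    n_branches = len(branches)
    n_channels = len(branches[0])
    height = len(branches[0][0])
    width = len(branches[0][0][0])
    output = []
    for c in range(n_channels):
        # Fuse: element-wise sum of all branches for this channel.
        fused = [[sum(branches[k][c][i][j] for k in range(n_branches))
                  for j in range(width)] for i in range(height)]
        # Global embedding from both GAP and GMP (adding them is an assumption).
        gap, gmp = global_pool(fused)
        descriptor = gap + gmp
        # Select: softmax attention across branches for this channel.
        attn = softmax([weights[k][c] * descriptor for k in range(n_branches)])
        selected = [[sum(attn[k] * branches[k][c][i][j] for k in range(n_branches))
                     for j in range(width)] for i in range(height)]
        # Crop: drop the outermost pixels, which are most affected by padding.
        output.append([row[crop:width - crop]
                       for row in selected[crop:height - crop]])
    return output


# Two branches, one channel, 4x4 maps filled with constants.
branch_small = [[[2.0] * 4 for _ in range(4)]]
branch_large = [[[4.0] * 4 for _ in range(4)]]
result = cisk_block([branch_small, branch_large], weights=[[0.0], [0.0]])
```

With zero weights the two branches receive equal attention, so `result` is a single 2x2 channel whose values are the mean of the two inputs; non-zero weights shift the mix toward one kernel size.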
