Joint visual template and natural language for robust visual tracking

Abstract At present, the target of interest in visual tracking is given in the form of a bounding box. Due to the randomness of the target shape, the bounding box may contain a lot of non‐target information. When encountering complex tracking scenarios, the performance of the tracker reduces severel...


Bibliographic Details
Main Authors: Jingchao Wang, Huanlong Zhang, Jianwei Zhang
Format: Article
Language: English
Published: Wiley 2022-10-01
Series: Electronics Letters
Online Access: https://doi.org/10.1049/ell2.12610
author Jingchao Wang
Huanlong Zhang
Jianwei Zhang
collection DOAJ
description Abstract At present, the target of interest in visual tracking is given in the form of a bounding box. Due to the randomness of the target shape, the bounding box may contain a lot of non‐target information. When encountering complex tracking scenarios, the performance of the tracker reduces severely. To address this problem, in this letter, the authors propose a novel tracking framework based on the joint of the visual template and natural language (VNTrack) to alleviate the impact of bounding box ambiguity. Specifically, the authors first use a pre‐trained language model to extract the features of the language description of the target. Then, a feature alignment module is designed to align and enhance the visual template feature and natural language feature. In addition, the authors design a multimodal query module to fuse the visual template, natural language, and search region information. Experimental results over tracking benchmarks with language annotations show that the proposed VNTrack is competitive among the state‐of‐the‐art trackers.
first_indexed 2024-04-10T08:48:09Z
format Article
id doaj.art-a1931a4a678b49c0af203425fd70ee9d
institution Directory of Open Access Journals
issn 0013-5194
1350-911X
language English
last_indexed 2024-04-10T08:48:09Z
publishDate 2022-10-01
publisher Wiley
record_format Article
series Electronics Letters
spelling doaj.art-a1931a4a678b49c0af203425fd70ee9d
2023-02-22T06:31:08Z
eng
Wiley
Electronics Letters
0013-5194
1350-911X
2022-10-01
58
21
798
800
10.1049/ell2.12610
Joint visual template and natural language for robust visual tracking
Jingchao Wang (0)
Huanlong Zhang (1)
Jianwei Zhang (2)
(0) College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou, China
(1) College of Electric and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China
(2) College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou, China
Abstract At present, the target of interest in visual tracking is given in the form of a bounding box. Due to the randomness of the target shape, the bounding box may contain a lot of non‐target information. When encountering complex tracking scenarios, the performance of the tracker reduces severely. To address this problem, in this letter, the authors propose a novel tracking framework based on the joint of the visual template and natural language (VNTrack) to alleviate the impact of bounding box ambiguity. Specifically, the authors first use a pre‐trained language model to extract the features of the language description of the target. Then, a feature alignment module is designed to align and enhance the visual template feature and natural language feature. In addition, the authors design a multimodal query module to fuse the visual template, natural language, and search region information. Experimental results over tracking benchmarks with language annotations show that the proposed VNTrack is competitive among the state‐of‐the‐art trackers.
https://doi.org/10.1049/ell2.12610
title Joint visual template and natural language for robust visual tracking
url https://doi.org/10.1049/ell2.12610