Joint visual template and natural language for robust visual tracking
Abstract At present, the target of interest in visual tracking is given in the form of a bounding box. Due to the randomness of the target shape, the bounding box may contain a lot of non‐target information. When encountering complex tracking scenarios, the performance of the tracker reduces severel...
Main Authors: | Jingchao Wang, Huanlong Zhang, Jianwei Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2022-10-01 |
Series: | Electronics Letters |
Online Access: | https://doi.org/10.1049/ell2.12610 |
_version_ | 1797900579961307136 |
---|---|
author | Jingchao Wang, Huanlong Zhang, Jianwei Zhang |
author_facet | Jingchao Wang, Huanlong Zhang, Jianwei Zhang |
author_sort | Jingchao Wang |
collection | DOAJ |
description | Abstract At present, the target of interest in visual tracking is given in the form of a bounding box. Due to the randomness of the target shape, the bounding box may contain a lot of non‐target information. When encountering complex tracking scenarios, the performance of the tracker reduces severely. To address this problem, in this letter, the authors propose a novel tracking framework based on the joint of the visual template and natural language (VNTrack) to alleviate the impact of bounding box ambiguity. Specifically, the authors first use a pre‐trained language model to extract the features of the language description of the target. Then, a feature alignment module is designed to align and enhance the visual template feature and natural language feature. In addition, the authors design a multimodal query module to fuse the visual template, natural language, and search region information. Experimental results over tracking benchmarks with language annotations show that the proposed VNTrack is competitive among the state‐of‐the‐art trackers. |
first_indexed | 2024-04-10T08:48:09Z |
format | Article |
id | doaj.art-a1931a4a678b49c0af203425fd70ee9d |
institution | Directory Open Access Journal |
issn | 0013-5194 1350-911X |
language | English |
last_indexed | 2024-04-10T08:48:09Z |
publishDate | 2022-10-01 |
publisher | Wiley |
record_format | Article |
series | Electronics Letters |
spelling | doaj.art-a1931a4a678b49c0af203425fd70ee9d 2023-02-22T06:31:08Z eng Wiley Electronics Letters 0013-5194 1350-911X 2022-10-01 58 21 798 800 10.1049/ell2.12610 Joint visual template and natural language for robust visual tracking Jingchao Wang 0 Huanlong Zhang 1 Jianwei Zhang 2 College of Software Engineering Zhengzhou University of Light Industry Zhengzhou China College of Electric and Information Engineering Zhengzhou University of Light Industry Zhengzhou China College of Software Engineering Zhengzhou University of Light Industry Zhengzhou China Abstract At present, the target of interest in visual tracking is given in the form of a bounding box. Due to the randomness of the target shape, the bounding box may contain a lot of non‐target information. When encountering complex tracking scenarios, the performance of the tracker reduces severely. To address this problem, in this letter, the authors propose a novel tracking framework based on the joint of the visual template and natural language (VNTrack) to alleviate the impact of bounding box ambiguity. Specifically, the authors first use a pre‐trained language model to extract the features of the language description of the target. Then, a feature alignment module is designed to align and enhance the visual template feature and natural language feature. In addition, the authors design a multimodal query module to fuse the visual template, natural language, and search region information. Experimental results over tracking benchmarks with language annotations show that the proposed VNTrack is competitive among the state‐of‐the‐art trackers. https://doi.org/10.1049/ell2.12610 |
spellingShingle | Jingchao Wang Huanlong Zhang Jianwei Zhang Joint visual template and natural language for robust visual tracking Electronics Letters |
title | Joint visual template and natural language for robust visual tracking |
title_full | Joint visual template and natural language for robust visual tracking |
title_fullStr | Joint visual template and natural language for robust visual tracking |
title_full_unstemmed | Joint visual template and natural language for robust visual tracking |
title_short | Joint visual template and natural language for robust visual tracking |
title_sort | joint visual template and natural language for robust visual tracking |
url | https://doi.org/10.1049/ell2.12610 |
work_keys_str_mv | AT jingchaowang jointvisualtemplateandnaturallanguageforrobustvisualtracking AT huanlongzhang jointvisualtemplateandnaturallanguageforrobustvisualtracking AT jianweizhang jointvisualtemplateandnaturallanguageforrobustvisualtracking |