End‐to‐end visual grounding via region proposal networks and bilinear pooling

Phrase‐based visual grounding aims to localise the object in an image that is referred to by a textual query phrase. Most existing approaches adopt a two‐stage mechanism to address this problem: first, an off‐the‐shelf proposal generation model is used to extract region‐based visual features, and then a de...

Bibliographic Details
Main Authors: Chenchao Xiang, Zhou Yu, Suguo Zhu, Jun Yu, Xiaokang Yang
Format: Article
Language: English
Published: Wiley 2019-03-01
Series: IET Computer Vision
Subjects:
Online Access: https://doi.org/10.1049/iet-cvi.2018.5104

Similar Items