End‐to‐end visual grounding via region proposal networks and bilinear pooling

Phrase‐based visual grounding aims to localise the object in the image referred to by a textual query phrase. Most existing approaches adopt a two‐stage mechanism to address this problem: first, an off‐the‐shelf proposal generation model is adopted to extract region‐based visual features, and then a de...

Bibliographic Details
Main Authors:	Chenchao Xiang, Zhou Yu, Suguo Zhu, Jun Yu, Xiaokang Yang
Format:	Article
Language:	English
Published:	Wiley 2019-03-01
Series:	IET Computer Vision
Subjects:	multimodal features; real-world visual grounding datasets; end-to-end approach; phrase-based visual grounding; region proposal networks; textual query phrase
Online Access:	https://doi.org/10.1049/iet-cvi.2018.5104

Similar Items