End‐to‐end visual grounding via region proposal networks and bilinear pooling
Phrase‐based visual grounding aims to localise the object in an image that is referred to by a textual query phrase. Most existing approaches address this problem with a two‐stage mechanism: first, an off‐the‐shelf proposal generation model is used to extract region‐based visual features, and then a de...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2019-03-01 |
Series: | IET Computer Vision |
Online Access: | https://doi.org/10.1049/iet-cvi.2018.5104 |