Grounding referring expressions in images by variational context
We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., 'largest elephant standing behind baby elephant'. This is a general yet challenging vision-language task since it requires not only the localization of objects, but also the multimodal comprehens...
Main Authors: | Zhang, Hanwang, Niu, Yulei, Chang, Shih-Fu |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference Paper |
Language: | English |
Published: | 2020 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/143054 |
Similar Items
- Kesan fizikal tanah bagi pengesanan utiliti bawah tanah melalui analisis imej ground penetratin radar /
  by: Asrul Zakaria, 1983- 639256, et al.
  Published: (2016)
- Kesan fizikal tanah bagi pengesanan utiliti bawah tanah melalui analisis imej ground penetrating radar [electronic resource] /
  by: Asrul Zakaria, 1983-, author 639256, et al.
  Published: (2016)
- Predicting political sentiments of voters from Twitter in multi-party contexts
  by: Khatua, Aparup, et al.
  Published: (2022)
- Investigation on energy output structure of explosives near-ground explosion
  by: Xu, Wen-long, et al.
  Published: (2020)
- Molecular architecture of the Chikungunya virus replication complex
  by: Tan, Yaw Bia, et al.
  Published: (2023)