LAVT: Language-Aware Vision Transformer for referring image segmentation

Referring image segmentation is a fundamental vision-language task that aims to segment the object in an image referred to by a natural language expression. One of the key challenges behind this task is leveraging the referring expression to highlight relevant positions in the image. A para...

Bibliographic Details
Main Authors: Yang, Z; Wang, J; Tang, Y; Chen, K; Zhao, H; Torr, PHS
Format: Conference item
Language: English
Published: IEEE 2022