LAVT: Language-Aware Vision Transformer for referring image segmentation

Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image. One of the key challenges behind this task is leveraging the referring expression for highlighting relevant positions in the image. A para...

Bibliographic Details
Main Authors:	Yang, Z, Wang, J, Tang, Y, Chen, K, Zhao, H, Torr, PHS
Format:	Conference item
Language:	English
Published:	IEEE 2022
