LAVT: Language-Aware Vision Transformer for referring image segmentation
Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image. One of the key challenges behind this task is leveraging the referring expression for highlighting relevant positions in the image. A para...
Main Authors: | , , , , , |
---|---|
Format: | Conference item |
Language: | English |
Published: |
IEEE
2022
|