Language-aware vision transformer for referring segmentation
Referring segmentation is a fundamental vision-language task that aims to segment out an object from an image or video in accordance with a natural language description. One of the key challenges behind this task is leveraging the referring expression for highlighting relevant positions in the image...
Main creators: | , , , , , , |
---|---|
Format: | Journal article |
Language: | English |
Published / Created: | IEEE, 2024 |