Language-aware vision transformer for referring segmentation

Referring segmentation is a fundamental vision-language task that aims to segment an object from an image or video according to a natural language description. One of the key challenges of this task is leveraging the referring expression to highlight relevant positions in the image...

Bibliographic details
Main authors: Yang, Z, Wang, J, Ye, X, Tang, Y, Chen, K, Zhao, H, Torr, PHS
Format: Journal article
Language: English
Published: IEEE 2024