Language-aware vision transformer for referring segmentation
Referring segmentation is a fundamental vision-language task that aims to segment out an object from an image or video in accordance with a natural language description. One of the key challenges behind this task is leveraging the referring expression for highlighting relevant positions in the image...
Main authors: | Yang, Z, Wang, J, Ye, X, Tang, Y, Chen, K, Zhao, H, Torr, PHS |
---|---|
Format: | Journal article |
Language: | English |
Published: | IEEE, 2024 |
Similar items
- LAVT: Language-Aware Vision Transformer for referring image segmentation
  by: Yang, Z, et al.
  Published: (2022)
- Semantics-aware dynamic localization and refinement for referring image segmentation
  by: Yang, Z, et al.
  Published: (2023)
- Vision transformers: from semantic segmentation to dense prediction
  by: Zhang, L, et al.
  Published: (2024)
- Hierarchical interaction network for video object segmentation from referring expressions
  by: Yang, Z, et al.
  Published: (2021)
- Behind every domain there is a shift: adapting distortion-aware vision transformers for panoramic semantic segmentation
  by: Zhang, J, et al.
  Published: (2024)