Language-aware vision transformer for referring segmentation

Referring segmentation is a fundamental vision-language task that aims to segment out an object from an image or video in accordance with a natural language description. One of the key challenges behind this task is leveraging the referring expression for highlighting relevant positions in the image...

Bibliographic details
Main authors:	Yang, Z, Wang, J, Ye, X, Tang, Y, Chen, K, Zhao, H, Torr, PHS
Format:	Journal article
Language:	English
Published:	IEEE 2024

Similar items

LAVT: Language-Aware Vision Transformer for referring image segmentation
By: Yang, Z, et al.
Published: (2022)

Semantics-aware dynamic localization and refinement for referring image segmentation
By: Yang, Z, et al.
Published: (2023)

Vision transformers: from semantic segmentation to dense prediction
By: Zhang, L, et al.
Published: (2024)

Hierarchical interaction network for video object segmentation from referring expressions
By: Yang, Z, et al.
Published: (2021)

Behind every domain there is a shift: adapting distortion-aware vision transformers for panoramic semantic segmentation
By: Zhang, J, et al.
Published: (2024)

LUNA: language as continuing anchors for referring expression comprehension
By: Liang, Y, et al.
Published: (2023)

An empirical study of detection-based video instance segmentation
By: Wang, Q, et al.
Published: (2020)

Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers
By: Zheng, S, et al.
Published: (2021)

Geometric motion segmentation and model selection
By: Torr, PHS
Published: (1998)

Reference-aware language models
By: Yang, Z, et al.
Published: (2017)

Unifying training and inference for panoptic segmentation
By: Li, Q, et al.
Published: (2020)

Dynamic graph cuts and their applications in computer vision
By: Kohli, P, et al.
Published: (2010)

Outlier detection and motion segmentation
By: Torr, PHS, et al.
Published: (1993)

Patch-based separable transformer for visual recognition
By: Sun, S, et al.
Published: (2022)

Semantic-aware auto-encoders for self-supervised representation learning
By: Wang, G, et al.
Published: (2022)

Vision transformer with progressive sampling
By: Yue, X, et al.
Published: (2022)

Improving few-shot learning by spatially-aware matching and crosstransformer
By: Zhang, H, et al.
Published: (2023)

Concerning Bayesian motion segmentation, model averaging, matching and the trifocal tensor
By: Torr, PHS, et al.
Published: (2006)

Bottom-up Instance Segmentation using Deep Higher-Order CRFs
By: Arnab, A, et al.
Published: (2016)

An object category specific MRF for segmentation
By: Kumar, MP, et al.
Published: (2007)

Learning layered motion segmentations of video
By: Kumar, MP, et al.
Published: (2005)

On the robustness of semantic segmentation models to adversarial attacks
By: Arnab, A, et al.
Published: (2019)

SegPGD: an effective and efficient adversarial attack for evaluating and boosting segmentation robustness
By: Gu, J, et al.
Published: (2022)

Target identity-aware network flow for online multiple target tracking
By: Dehghan, A, et al.
Published: (2015)

Learning layered motion segmentations of video
By: Pawan Kumar, M, et al.
Published: (2007)

Urban 3D semantic modelling using stereo vision
By: Sengupta, S, et al.
Published: (2013)

Object-aware vision and language navigation for domestic robots
By: Zhao, Weiyi
Published: (2022)

Discovering class-specific pixels for weakly-supervised semantic segmentation
By: Chaudhry, A, et al.
Published: (2017)

OBJCUT: efficient segmentation using top-down and bottom-up cues
By: Kumar, MP, et al.
Published: (2009)

Practical Techniques for Vision-Language Segmentation Model in Remote Sensing
By: Lin, Y, et al.
Published: (2024-06-01)

Benchmarking robustness of adaptation methods on pre-trained vision-language models
By: Chen, S, et al.
Published: (2024)

Occluded video instance segmentation: A benchmark
By: Qi, J, et al.
Published: (2022)

GeoNet++: Iterative geometric neural network with edge-aware refinement for joint depth and surface normal estimation
By: Qi, X, et al.
Published: (2020)

Scalable cascade inference for semantic image segmentation
By: Sturgess, P, et al.
Published: (2012)

POSECUT: simultaneous segmentation and 3d pose estimation of humans using dynamic graph-cuts
By: Bray, M, et al.
Published: (2006)

Deep FusionNet for point cloud semantic segmentation
By: Zhang, F, et al.
Published: (2020)

Associative hierarchical CRFs for object class image segmentation
By: Ladický, L, et al.
Published: (2009)

Prompting a pretrained transformer can be a universal approximator
By: Petrov, A, et al.
Published: (2024)

Spatio-temporal action instance segmentation and localisation
By: Saha, S, et al.
Published: (2020)