Open vocabulary semantic segmentation with Patch Aligned Contrastive Learning

We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss, intending to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder. With such an alignment, a model can identify regions of an...

Bibliographic Details
Main Authors: Mukhoti, J, Lin, T-Y, Poursaeed, O, Wang, R, Shah, A, Torr, PHS, Lim, S-N
Format: Conference item
Language: English
Published: IEEE 2023