Open vocabulary semantic segmentation with Patch Aligned Contrastive Learning

We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss, intending to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder. With such an alignment, a model can identify regions of an...

Bibliographic Details
Main authors: Mukhoti, J; Lin, T-Y; Poursaeed, O; Wang, R; Shah, A; Torr, PHS; Lim, S-N
Type: Conference item
Language: English
Published: IEEE 2023