Open vocabulary semantic segmentation with Patch Aligned Contrastive Learning

We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss that aims to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder. With such an alignment, a model can identify regions of an...
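To make the idea of a patch-level compatibility function concrete, the following is a minimal sketch of one way a score between image patch tokens and a text CLS token might be computed. The function names, the toy 2-D embeddings, and the specific choice of a softmax-weighted sum of cosine similarities are assumptions made for illustration; the summary above does not spell out the exact formula, and no learned projection or training loop is shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def patch_text_compatibility(patch_embeddings, text_cls_embedding):
    """Hypothetical patch-level compatibility score (illustrative only).

    Assumption: the score is a weighted sum of patch-to-text cosine
    similarities, with weights given by a softmax over those similarities.
    """
    sims = [cosine(p, text_cls_embedding) for p in patch_embeddings]
    weights = softmax(sims)
    score = sum(w * s for w, s in zip(weights, sims))
    return score, sims


# Toy example: three patch embeddings and one text embedding (made-up values).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [1.0, 0.0]

score, sims = patch_text_compatibility(patches, text)

# The per-patch similarities indicate which region best matches the text.
best_patch = max(range(len(sims)), key=sims.__getitem__)
print(best_patch, round(score, 3))
```

In this sketch the per-patch similarities double as a coarse localisation signal: the patch with the highest similarity is the region most aligned with the text query, which is the property an open-vocabulary segmentation model would rely on.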

Bibliographic details
Main authors: Mukhoti, J, Lin, T-Y, Poursaeed, O, Wang, R, Shah, A, Torr, PHS, Lim, S-N
Format: Conference item
Language: English
Published: IEEE 2023