Open vocabulary semantic segmentation with Patch Aligned Contrastive Learning

We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss, intending to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder. With such an alignment, a model can identify regions of an...

Bibliographic details
Main authors: Mukhoti, J, Lin, T-Y, Poursaeed, O, Wang, R, Shah, A, Torr, PHS, Lim, S-N
Format: Conference item
Language: English
Published: IEEE 2023