Open vocabulary semantic segmentation with Patch Aligned Contrastive Learning

We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss, intending to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder. With such an alignment, a model can identify regions of an...

Bibliographic details
Main authors: Mukhoti, J, Lin, T-Y, Poursaeed, O, Wang, R, Shah, A, Torr, PHS, Lim, S-N
Format: Conference item
Language: English
Published: IEEE 2023
author Mukhoti, J
Lin, T-Y
Poursaeed, O
Wang, R
Shah, A
Torr, PHS
Lim, S-N
collection OXFORD
description We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss, intending to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder. With such an alignment, a model can identify regions of an image corresponding to a given text input, and therefore transfer seamlessly to the task of open vocabulary semantic segmentation without requiring any segmentation annotations during training. Using pre-trained CLIP encoders with PACL, we are able to set the state-of-the-art on the task of open vocabulary zero-shot segmentation on 4 different segmentation benchmarks: Pascal VOC, Pascal Context, COCO Stuff and ADE20K. Furthermore, we show that PACL is also applicable to image-level predictions and when used with a CLIP backbone, provides a general improvement in zero-shot classification accuracy compared to CLIP, across a suite of 12 image classification datasets.
first_indexed 2024-03-07T08:11:35Z
format Conference item
id oxford-uuid:c359c40a-e93f-4303-b3c0-eb50e1c69a39
institution University of Oxford
language English
last_indexed 2024-03-07T08:11:35Z
publishDate 2023
publisher IEEE
record_format dspace
title Open vocabulary semantic segmentation with Patch Aligned Contrastive Learning