Transductive meta-learning with enhanced feature ensemble for few-shot semantic segmentation

Abstract This paper addresses few-shot semantic segmentation and proposes a novel transductive end-to-end method that overcomes three key problems affecting performance. First, we present a novel ensemble of visual features learned from pretrained classification and semantic segmentation networks wi...

Bibliographic Details
Main Authors: Amin Karimi, Charalambos Poullis
Format: Article
Language: English
Published: Nature Portfolio 2024-02-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-54640-6
_version_ 1827327821282803712
author Amin Karimi
Charalambos Poullis
collection DOAJ
description This paper addresses few-shot semantic segmentation and proposes a novel transductive end-to-end method that overcomes three key problems affecting performance. First, we present a novel ensemble of visual features learned from pretrained classification and semantic segmentation networks with the same architecture. Our approach leverages the varying discriminative power of these networks, resulting in rich and diverse visual features that are more informative than those of the pretrained classification backbone used in most state-of-the-art methods, which is not optimized for dense pixel-wise classification. Second, the pretrained semantic segmentation network serves as a base class extractor, which effectively mitigates false positives that occur at inference time and are caused by base objects other than the object of interest. Third, a two-step segmentation approach using transductive meta-learning is presented to address episodes with poor similarity between the support and query images. The proposed transductive meta-learning method first learns the relationship between labeled and unlabeled data points by matching support foreground to query features (intra-class similarity) and then applies this knowledge to predict on the unlabeled query image (intra-object similarity), which simultaneously learns propagation and false positive suppression. To evaluate our method, we performed experiments on benchmark datasets, and the results demonstrate significant improvement with only 2.98M trainable parameters. Specifically, using ResNet-101, we achieve state-of-the-art performance for both 1-shot and 5-shot Pascal-$5^{i}$, as well as for 1-shot and 5-shot COCO-$20^{i}$.
first_indexed 2024-03-07T15:07:58Z
format Article
id doaj.art-2af665a644b34959aafae83436627e6a
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-03-07T15:07:58Z
publishDate 2024-02-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-2af665a644b34959aafae83436627e6a | 2024-03-05T18:47:17Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2024-02-01 | 14 | 1 | 1 | 13 | 10.1038/s41598-024-54640-6 | Transductive meta-learning with enhanced feature ensemble for few-shot semantic segmentation | Amin Karimi (0), Charalambos Poullis (1) | (0, 1) Immersive and Creative Technologies Lab, Department of Computer Science and Software Engineering, Concordia University | https://doi.org/10.1038/s41598-024-54640-6
title Transductive meta-learning with enhanced feature ensemble for few-shot semantic segmentation
url https://doi.org/10.1038/s41598-024-54640-6