Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation

Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is one of the well-known transformer-based methods which unifies common image segmentation into a universal model. However, it performs relatively poorly in obtaining local features and s...

Bibliographic Details
Main Authors: Zhengyu Xia, Joohee Kim
Format: Article
Language: English
Published: MDPI AG 2023-01-01
Series: Sensors
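Subjects: deep learning; semantic segmentation; image segmentation; transformer; convolutional neural networks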
Online Access: https://www.mdpi.com/1424-8220/23/2/581
_version_ 1797437373975363584
author Zhengyu Xia
Joohee Kim
author_facet Zhengyu Xia
Joohee Kim
author_sort Zhengyu Xia
collection DOAJ
description Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is a well-known transformer-based method that unifies common image segmentation tasks into a universal model. However, it performs relatively poorly at capturing local features and segmenting small objects because it relies heavily on transformers. To address this, we propose a simple yet effective architecture that adds auxiliary branches to Mask2Former during training to capture dense local features on the encoder side. The resulting features improve the learning of local information and the segmentation of small objects. Because the proposed auxiliary convolution layers are needed only for training and can be removed at inference, the performance gain comes with no additional computation at inference. Experimental results show that our model achieves state-of-the-art performance on the ADE20K (57.6% mIoU) and Cityscapes (84.8% mIoU) datasets.
first_indexed 2024-03-09T11:19:16Z
format Article
id doaj.art-8453e566be3d4d6284f47ff78a5f2b21
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-09T11:19:16Z
publishDate 2023-01-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-8453e566be3d4d6284f47ff78a5f2b21
2023-12-01T00:24:04Z
eng
MDPI AG
Sensors
1424-8220
2023-01-01
Volume 23, Issue 2, Article 581
10.3390/s23020581
Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation
Zhengyu Xia, Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA
Joohee Kim, Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA
https://www.mdpi.com/1424-8220/23/2/581
deep learning
semantic segmentation
image segmentation
transformer
convolutional neural networks
title Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation
title_sort enhancing mask transformer with auxiliary convolution layers for semantic segmentation
topic deep learning
semantic segmentation
image segmentation
transformer
convolutional neural networks
url https://www.mdpi.com/1424-8220/23/2/581