Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation

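Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is one of the well-known transformer-based methods which unifies common image segmentation into a universal model. However, it performs relatively poorly in obtaining local features and segmenting small objects due to relying heavily on transformers. To this end, we propose a simple yet effective architecture that introduces auxiliary branches to Mask2Former during training to capture dense local features on the encoder side. The obtained features help improve the performance of learning local information and segmenting small objects. Since the proposed auxiliary convolution layers are required only for training and can be removed during inference, the performance gain can be obtained without additional computation at inference. Experimental results show that our model can achieve state-of-the-art performance (57.6% mIoU) on the ADE20K and (84.8% mIoU) on the Cityscapes datasets.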

Bibliographic Details
Main Authors:	Zhengyu Xia, Joohee Kim
Format:	Article
Language:	English
Published:	MDPI AG 2023-01-01
Series:	Sensors
Subjects:	deep learning, semantic segmentation, image segmentation, transformer, convolutional neural networks
Online Access:	https://www.mdpi.com/1424-8220/23/2/581

author	Zhengyu Xia, Joohee Kim
collection	DOAJ
description	Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is one of the well-known transformer-based methods which unifies common image segmentation into a universal model. However, it performs relatively poorly in obtaining local features and segmenting small objects due to relying heavily on transformers. To this end, we propose a simple yet effective architecture that introduces auxiliary branches to Mask2Former during training to capture dense local features on the encoder side. The obtained features help improve the performance of learning local information and segmenting small objects. Since the proposed auxiliary convolution layers are required only for training and can be removed during inference, the performance gain can be obtained without additional computation at inference. Experimental results show that our model can achieve state-of-the-art performance (57.6% mIoU) on the ADE20K and (84.8% mIoU) on the Cityscapes datasets.
first_indexed	2024-03-09T11:19:16Z
format	Article
id	doaj.art-8453e566be3d4d6284f47ff78a5f2b21
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-03-09T11:19:16Z
publishDate	2023-01-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-8453e566be3d4d6284f47ff78a5f2b21 | 2023-12-01T00:24:04Z | eng | MDPI AG | Sensors | 1424-8220 | 2023-01-01 | 23 | 2 | 581 | 10.3390/s23020581 | Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation | Zhengyu Xia (0) | Joohee Kim (1) | Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA | Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA | Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is one of the well-known transformer-based methods which unifies common image segmentation into a universal model. However, it performs relatively poorly in obtaining local features and segmenting small objects due to relying heavily on transformers. To this end, we propose a simple yet effective architecture that introduces auxiliary branches to Mask2Former during training to capture dense local features on the encoder side. The obtained features help improve the performance of learning local information and segmenting small objects. Since the proposed auxiliary convolution layers are required only for training and can be removed during inference, the performance gain can be obtained without additional computation at inference. Experimental results show that our model can achieve state-of-the-art performance (57.6% mIoU) on the ADE20K and (84.8% mIoU) on the Cityscapes datasets. | https://www.mdpi.com/1424-8220/23/2/581 | deep learning | semantic segmentation | image segmentation | transformer | convolutional neural networks
title	Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation
topic	deep learning, semantic segmentation, image segmentation, transformer, convolutional neural networks
url	https://www.mdpi.com/1424-8220/23/2/581

Similar Items