Enhancing feature fusion with spatial aggregation and channel fusion for semantic segmentation

Abstract: Semantic segmentation is crucial to autonomous driving, since it provides accurate recognition and localisation of the surrounding scene for the street-scene understanding task. Many existing segmentation networks fuse high-level and low-level features to boost segmentation performance. However, simple fusion may bring only a limited improvement because of the gap between high-level and low-level features. To alleviate this limitation, we propose spatial aggregation and channel fusion to bridge this gap. Our implementation, inspired by the attention mechanism, consists of two steps: (1) spatial aggregation relies on the proposed pyramid spatial context aggregation module, which captures spatial similarities to enhance the spatial representation of high-level features and make them more effective for the subsequent fusion; (2) channel fusion relies on the proposed attention-based channel fusion module, which weights channel maps at different levels to strengthen the fusion. In addition, a complete network with a U-shaped structure is constructed. A series of ablation experiments demonstrates the effectiveness of our designs, and the network achieves an mIoU of 81.4% on the Cityscapes test set and 84.6% on the PASCAL VOC 2012 test set.
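
To make the two ideas in the abstract concrete, here is a minimal, illustrative sketch; it is not the authors' code. It assumes a PyTorch implementation, and the module names, channel sizes, pyramid pooling scales, and SE-style gating are guesses based only on the description above: a pyramid module that aggregates spatial context for high-level features, and an attention module that reweights channels when low-level and high-level features are fused.

import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidSpatialContextAggregation(nn.Module):
    """Hypothetical pyramid spatial context aggregation: pool the high-level
    feature map at several scales, project each pooled map, and add the
    upsampled context back to strengthen the spatial representation."""

    def __init__(self, channels, pool_sizes=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),
                nn.Conv2d(channels, channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for size in pool_sizes
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        out = x
        for branch in self.branches:
            context = branch(x)
            out = out + F.interpolate(context, size=(h, w), mode="bilinear",
                                      align_corners=False)
        return out


class AttentionChannelFusion(nn.Module):
    """Hypothetical attention-based channel fusion: squeeze the concatenated
    low-/high-level features into per-channel weights (an SE-style gate) and
    use them to reweight the fused map before a 1x1 projection."""

    def __init__(self, low_channels, high_channels, out_channels):
        super().__init__()
        fused = low_channels + high_channels
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // 4, fused, kernel_size=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(fused, out_channels, kernel_size=1, bias=False)

    def forward(self, low, high):
        # Bring the high-level map to the low-level resolution before fusing.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        fused = torch.cat([low, high], dim=1)
        return self.project(fused * self.gate(fused))


if __name__ == "__main__":
    low = torch.randn(1, 256, 64, 64)   # low-level, high-resolution feature
    high = torch.randn(1, 512, 16, 16)  # high-level, low-resolution feature
    psa = PyramidSpatialContextAggregation(512).eval()
    acf = AttentionChannelFusion(256, 512, 256).eval()
    with torch.no_grad():
        high = psa(high)
        out = acf(low, high)
    print(out.shape)  # torch.Size([1, 256, 64, 64])

In the paper these modules sit inside a U-shaped encoder-decoder; the toy tensor shapes above only illustrate the data flow between one low-level and one high-level feature map.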

Bibliographic Details
Main Authors: Jie Hu, Huifang Kong, Lei Fan, Jun Zhou
Format: Article
Language: English
Published: Wiley, 2021-09-01
Series: IET Computer Vision
Subjects: image fusion; image segmentation; mobile robots; object recognition; robot vision
Online Access: https://doi.org/10.1049/cvi2.12026
Author Affiliations: School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, China (all four authors)
ISSN: 1751-9632, 1751-9640
Volume/Issue/Pages: vol. 15, no. 6, pp. 418–427