Rethinking 1D convolution for lightweight semantic segmentation

Lightweight semantic segmentation promotes the application of semantic segmentation in tiny devices. The existing lightweight semantic segmentation network (LSNet) has the problems of low precision and a large number of parameters. In response to the above problems, we designed a full 1D convolution...

Bibliographic Details
Main Authors:	Chunyu Zhang, Fang Xu, Chengdong Wu, Chenglong Xu
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2023-02-01
Series:	Frontiers in Neurorobotics
Subjects:	semantic segmentation; lightweight network; 1D convolution; encoder-decoder; feature alignment
Online Access:	https://www.frontiersin.org/articles/10.3389/fnbot.2023.1119231/full

_version_	1811168261106040832
author	Chunyu Zhang Fang Xu Chengdong Wu Chenglong Xu
author_facet	Chunyu Zhang Fang Xu Chengdong Wu Chenglong Xu
author_sort	Chunyu Zhang
collection	DOAJ
description	Lightweight semantic segmentation promotes the application of semantic segmentation on tiny devices. Existing lightweight semantic segmentation networks (LSNets) suffer from low precision and a large number of parameters. In response, we designed a fully 1D convolutional LSNet. The network's performance is attributed to three modules: the 1D multi-layer space module (1D-MS), the 1D multi-layer channel module (1D-MC), and the flow alignment module (FA). The 1D-MS and 1D-MC add global feature extraction operations based on the multi-layer perceptron (MLP) idea. These modules use 1D convolutional coding, which is more flexible than an MLP, and the added global information operations improve the features' coding ability. The FA module fuses high-level and low-level semantic information, which addresses the precision loss caused by feature misalignment. We designed a 1D-mixer encoder based on the transformer structure. It performs fusion encoding of the spatial feature information extracted by the 1D-MS module and the channel information extracted by the 1D-MC module. The 1D-mixer obtains high-quality encoded features with very few parameters, which is the key to the network's success. The attention pyramid with FA (AP-FA) uses an attention pyramid (AP) to decode features and adds an FA module to address feature misalignment. Our network requires no pre-training and needs only a 1080Ti GPU for training. It achieved 72.6 mIoU and 95.6 FPS on the Cityscapes dataset and 70.5 mIoU and 122 FPS on the CamVid dataset. We ported the network trained on the ADE2K dataset to mobile devices, where a latency of 224 ms demonstrates its practical value. The results on the three datasets show that the network generalizes well. Compared with state-of-the-art lightweight semantic segmentation algorithms, our network achieves the best balance between segmentation accuracy and parameter count. The LSNet has only 0.62 M parameters and is currently the network with the highest segmentation accuracy under 1 M parameters.
first_indexed	2024-04-10T16:24:12Z
format	Article
id	doaj.art-0abf946b232742ef8d0a27a44f877888
institution	Directory Open Access Journal
issn	1662-5218
language	English
last_indexed	2024-04-10T16:24:12Z
publishDate	2023-02-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Neurorobotics
spelling	doaj.art-0abf946b232742ef8d0a27a44f877888 | 2023-02-09T08:20:14Z | eng | Frontiers Media S.A. | Frontiers in Neurorobotics | 1662-5218 | 2023-02-01 | 17 | 10.3389/fnbot.2023.1119231 | 1119231 | Rethinking 1D convolution for lightweight semantic segmentation | Chunyu Zhang (0); Fang Xu (1); Chengdong Wu (2); Chenglong Xu (3) | (0) Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China; (1) Shenyang Siasun Robot & Automation Company Ltd., Shenyang, China; (2) Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China; (3) College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China | https://www.frontiersin.org/articles/10.3389/fnbot.2023.1119231/full | semantic segmentation; lightweight network; 1D convolution; encoder-decoder; feature alignment
spellingShingle	Chunyu Zhang Fang Xu Chengdong Wu Chenglong Xu Rethinking 1D convolution for lightweight semantic segmentation Frontiers in Neurorobotics semantic segmentation lightweight network 1D convolution encoder-decoder feature alignment
title	Rethinking 1D convolution for lightweight semantic segmentation
topic	semantic segmentation; lightweight network; 1D convolution; encoder-decoder; feature alignment
url	https://www.frontiersin.org/articles/10.3389/fnbot.2023.1119231/full
work_keys_str_mv	AT chunyuzhang rethinking1dconvolutionforlightweightsemanticsegmentation AT fangxu rethinking1dconvolutionforlightweightsemanticsegmentation AT chengdongwu rethinking1dconvolutionforlightweightsemanticsegmentation AT chenglongxu rethinking1dconvolutionforlightweightsemanticsegmentation
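
The description stresses a very small parameter budget (0.62 M) for a network built from 1D convolutions. As a rough illustration of why 1D kernels can save parameters, the sketch below compares the weight count of one k x k 2D convolution with a k x 1 convolution followed by a 1 x k convolution. The record does not say how the paper arranges its 1D convolutions, so this factorization, the function names, and the channel and kernel sizes are all assumptions chosen for illustration, not the authors' architecture.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Parameter count of a standard k x k 2D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def conv1d_pair_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Parameter count of a k x 1 convolution followed by a 1 x k convolution.

    Hypothetical factorization used only for this comparison; it is not
    taken from the paper.
    """
    first = c_in * c_out * k + (c_out if bias else 0)    # k x 1 kernel
    second = c_out * c_out * k + (c_out if bias else 0)  # 1 x k kernel
    return first + second


if __name__ == "__main__":
    channels, kernel = 64, 3  # illustrative sizes, not from the paper
    full = conv2d_params(channels, channels, kernel)
    pair = conv1d_pair_params(channels, channels, kernel)
    print(f"2D conv: {full} parameters")
    print(f"1D pair: {pair} parameters")
    print(f"ratio:   {pair / full:.3f}")
```

With equal input and output channels the ratio is 2/k, so the saving grows with kernel size: about one third fewer weights at k = 3 and about 60% fewer at k = 5.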

Similar Items