TCANet: three-stream coordinate attention network for RGB-D indoor semantic segmentation


Bibliographic Details
Main Authors: Weikuan Jia, Xingchao Yan, Qiaolian Liu, Ting Zhang, Xishang Dong
Format: Article
Language: English
Published: Springer, 2023-08-01
Series: Complex & Intelligent Systems
Subjects: Depth images; Indoor semantic segmentation; Three-stream; Coordinate attention
Online Access: https://doi.org/10.1007/s40747-023-01210-4
author Weikuan Jia
Xingchao Yan
Qiaolian Liu
Ting Zhang
Xishang Dong
collection DOAJ
description Abstract Semantic segmentation plays a vital role in indoor scene analysis, but its accuracy is still limited by the complex conditions of varied indoor scenes, and the task is difficult to complete relying on RGB images alone. Since depth images provide additional 3D geometric information that complements RGB images, researchers have incorporated depth images to improve the accuracy of indoor semantic segmentation. However, effectively fusing depth information with RGB images remains a challenge. To address this issue, a three-stream coordinate attention network is proposed. The presented network reconstructs a multi-modal feature fusion module for RGB-D features, which aggregates the information of the two modalities along the spatial and channel dimensions. Meanwhile, three convolutional neural network branches form a parallel three-stream structure that processes the RGB features, the depth features, and the fused features, respectively. On one hand, the proposed network preserves the original RGB and depth feature streams simultaneously; on the other hand, it helps to better utilize and propagate the fused feature stream. An embedded ASPP module refines the semantic information in the network, aggregating feature information at different scales to obtain more accurate features. Experimental results show that the proposed model achieves a state-of-the-art mIoU of 50.2% on the NYUDv2 dataset and state-of-the-art performance on the more complex SUN-RGBD dataset.
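The fusion idea summarized in the abstract, aggregating RGB and depth features along the spatial and channel dimensions with coordinate attention before handing the result to a third stream, can be illustrated with a minimal PyTorch sketch. This is not the authors' released implementation: the module name, the element-wise addition used to combine the two modalities, and the reduction ratio are assumptions made for illustration only.

```python
# Minimal sketch of a coordinate-attention-style RGB-D fusion block, loosely
# following the idea described in the abstract (NOT the authors' code).
# Assumption: the two modal features are combined by element-wise addition
# before the attention weights are computed.
import torch
import torch.nn as nn


class CoordinateAttentionFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (B, C, 1, W)
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        fused = rgb + depth                            # simple additive combination (assumption)
        b, c, h, w = fused.shape
        x_h = self.pool_h(fused)                       # (B, C, H, 1): per-row context
        x_w = self.pool_w(fused).permute(0, 1, 3, 2)   # (B, C, W, 1): per-column context
        y = self.reduce(torch.cat([x_h, x_w], dim=2))  # shared 1x1 conv over both axes
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                       # (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))   # (B, C, 1, W)
        return fused * a_h * a_w                       # re-weight fused features along both axes


if __name__ == "__main__":
    rgb_feat = torch.randn(2, 64, 60, 80)
    depth_feat = torch.randn(2, 64, 60, 80)
    fusion = CoordinateAttentionFusion(64)
    print(fusion(rgb_feat, depth_feat).shape)  # torch.Size([2, 64, 60, 80])
```

In a three-stream layout like the one described, a block of this kind would sit between the RGB and depth encoder branches at each stage, and its output would feed the third (fused) branch while the original RGB and depth streams continue unchanged.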
format Article
id doaj.art-ac6003e4396e4fb28e64b07f93093fc7
institution Directory Open Access Journal
issn 2199-4536
2198-6053
language English
publishDate 2023-08-01
publisher Springer
series Complex & Intelligent Systems
spelling TCANet: three-stream coordinate attention network for RGB-D indoor semantic segmentation. Complex & Intelligent Systems, vol. 10, no. 1, pp. 1219-1230, Springer, 2023-08-01, ISSN 2199-4536 / 2198-6053. https://doi.org/10.1007/s40747-023-01210-4
Author affiliations: Weikuan Jia (School of Information Science and Engineering, Zaozhuang University); Xingchao Yan (School of Information Science and Engineering, Shandong Normal University); Qiaolian Liu (School of Information Science and Engineering, Zaozhuang University); Ting Zhang (School of Information Science and Engineering, Zaozhuang University); Xishang Dong (School of Information Science and Engineering, Zaozhuang University)
title TCANet: three-stream coordinate attention network for RGB-D indoor semantic segmentation
topic Depth images
Indoor semantic segmentation
Three-stream
Coordinate attention
url https://doi.org/10.1007/s40747-023-01210-4