Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery

The acquisition of global context and boundary information is crucial for the semantic segmentation of remote sensing (RS) images. In contrast to convolutional neural networks (CNNs), transformers exhibit superior performance in global modeling and shape feature encoding, which provides a novel aven...

Bibliographic Details
Main Authors:	Xuanyu Zhou, Lifan Zhou, Shengrong Gong, Shan Zhong, Wei Yan, Yizhou Huang
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:	Dual-stream; remote sensing (RS); semantic segmentation; Swin transformer
Online Access:	https://ieeexplore.ieee.org/document/10294282/

_version_	1797437687169286144
author	Xuanyu Zhou Lifan Zhou Shengrong Gong Shan Zhong Wei Yan Yizhou Huang
author_facet	Xuanyu Zhou Lifan Zhou Shengrong Gong Shan Zhong Wei Yan Yizhou Huang
author_sort	Xuanyu Zhou
collection	DOAJ
description	The acquisition of global context and boundary information is crucial for the semantic segmentation of remote sensing (RS) images. In contrast to convolutional neural networks (CNNs), transformers exhibit superior performance in global modeling and shape feature encoding, which provides a novel avenue for obtaining global context and boundary information. However, current methods fail to effectively leverage these distinctive advantages of transformers. To address this issue, we propose a novel single encoder and dual decoders architecture called STDSNet, which embeds the Swin transformer into the dual-stream network for semantic segmentation of RS imagery. The proposed STDSNet employs the Swin transformer as the network backbone in the encoder to address the limitations of CNNs in global modeling and encoding shape features. The dual decoder comprises two parallel streams, namely the global stream (GS) and the shape stream (SS). The GS utilizes the global context fusion module (GCFM) to address the loss of global context during upsampling. It further integrates GCFMs with skip connections and a multiscale fusion strategy to mitigate large-scale regional object classification errors resulting from similar features or shadow occlusion in RS images. The SS introduces the gate convolution module (GCM) to filter out irrelevant features, allowing it to focus on processing boundary information, which improves the semantic segmentation performance of small targets and their boundaries in RS images. Extensive experiments demonstrate that STDSNet outperforms other state-of-the-art methods on the ISPRS Vaihingen and Potsdam benchmarks.
first_indexed	2024-03-09T11:26:12Z
format	Article
id	doaj.art-0ea6dd2779db466fb3c11f5ec7744423
institution	Directory Open Access Journal
issn	2151-1535
language	English
last_indexed	2024-03-09T11:26:12Z
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling	doaj.art-0ea6dd2779db466fb3c11f5ec7744423 2023-12-01T00:00:34Z eng IEEE IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2151-1535 2024-01-01 17 175 189 10.1109/JSTARS.2023.3326967 10294282 Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery Xuanyu Zhou 0 https://orcid.org/0009-0002-0633-694X Lifan Zhou 1 https://orcid.org/0000-0001-7665-413X Shengrong Gong 2 https://orcid.org/0000-0003-0266-2422 Shan Zhong 3 https://orcid.org/0000-0003-0034-6952 Wei Yan 4 https://orcid.org/0000-0002-3412-4015 Yizhou Huang 5 https://orcid.org/0009-0000-1584-233X School of Computer and Information Technology, Northeast Petroleum University, Daqing, China School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China School of Electrical and Automation Engineering, Changshu Institute of Technology, Suzhou, China The acquisition of global context and boundary information is crucial for the semantic segmentation of remote sensing (RS) images. In contrast to convolutional neural networks (CNNs), transformers exhibit superior performance in global modeling and shape feature encoding, which provides a novel avenue for obtaining global context and boundary information. However, current methods fail to effectively leverage these distinctive advantages of transformers. To address this issue, we propose a novel single encoder and dual decoders architecture called STDSNet, which embeds the Swin transformer into the dual-stream network for semantic segmentation of RS imagery. The proposed STDSNet employs the Swin transformer as the network backbone in the encoder to address the limitations of CNNs in global modeling and encoding shape features. The dual decoder comprises two parallel streams, namely the global stream (GS) and the shape stream (SS). The GS utilizes the global context fusion module (GCFM) to address the loss of global context during upsampling. It further integrates GCFMs with skip connections and a multiscale fusion strategy to mitigate large-scale regional object classification errors resulting from similar features or shadow occlusion in RS images. The SS introduces the gate convolution module (GCM) to filter out irrelevant features, allowing it to focus on processing boundary information, which improves the semantic segmentation performance of small targets and their boundaries in RS images. Extensive experiments demonstrate that STDSNet outperforms other state-of-the-art methods on the ISPRS Vaihingen and Potsdam benchmarks. https://ieeexplore.ieee.org/document/10294282/ Dual-stream remote sensing (RS) semantic segmentation Swin transformer
spellingShingle	Xuanyu Zhou Lifan Zhou Shengrong Gong Shan Zhong Wei Yan Yizhou Huang Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Dual-stream remote sensing (RS) semantic segmentation Swin transformer
title	Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery
title_full	Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery
title_fullStr	Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery
title_full_unstemmed	Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery
title_short	Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery
title_sort	swin transformer embedding dual stream for semantic segmentation of remote sensing imagery
topic	Dual-stream remote sensing (RS) semantic segmentation Swin transformer
url	https://ieeexplore.ieee.org/document/10294282/

Similar Items