Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery
Main Authors: | Xuanyu Zhou, Lifan Zhou, Shengrong Gong, Shan Zhong, Wei Yan, Yizhou Huang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
Subjects: | Dual-stream; remote sensing (RS); semantic segmentation; Swin transformer |
Online Access: | https://ieeexplore.ieee.org/document/10294282/ |
author | Xuanyu Zhou, Lifan Zhou, Shengrong Gong, Shan Zhong, Wei Yan, Yizhou Huang |
collection | DOAJ |
description | The acquisition of global context and boundary information is crucial for the semantic segmentation of remote sensing (RS) images. Compared with convolutional neural networks (CNNs), transformers perform better at global modeling and at encoding shape features, which offers a new way to obtain global context and boundary information. However, current methods fail to exploit these distinctive advantages of transformers effectively. To address this issue, we propose STDSNet, a novel single-encoder, dual-decoder architecture that embeds the Swin transformer into a dual-stream network for semantic segmentation of RS imagery. STDSNet uses the Swin transformer as the encoder backbone to overcome the limitations of CNNs in global modeling and shape feature encoding. The dual decoder comprises two parallel streams: the global stream (GS) and the shape stream (SS). The GS uses the global context fusion module (GCFM) to counter the loss of global context during upsampling. It further combines GCFMs with skip connections and a multiscale fusion strategy to reduce classification errors for large regional objects caused by similar features or shadow occlusion in RS images. The SS introduces the gate convolution module (GCM) to filter out irrelevant features so that the stream can focus on boundary information, which improves the segmentation of small targets and their boundaries in RS images. Extensive experiments demonstrate that STDSNet outperforms other state-of-the-art methods on the ISPRS Vaihingen and Potsdam benchmarks. |
first_indexed | 2024-03-09T11:26:12Z |
id | doaj.art-0ea6dd2779db466fb3c11f5ec7744423 |
institution | Directory of Open Access Journals |
issn | 2151-1535 |
last_indexed | 2024-03-09T11:26:12Z |
volume | 17 |
pages | 175-189 |
doi | 10.1109/JSTARS.2023.3326967 |
ieee_document_number | 10294282 |
author_orcid | Xuanyu Zhou: 0009-0002-0633-694X; Lifan Zhou: 0000-0001-7665-413X; Shengrong Gong: 0000-0003-0266-2422; Shan Zhong: 0000-0003-0034-6952; Wei Yan: 0000-0002-3412-4015; Yizhou Huang: 0009-0000-1584-233X |
affiliation | Xuanyu Zhou: School of Computer and Information Technology, Northeast Petroleum University, Daqing, China; Lifan Zhou, Shengrong Gong, Shan Zhong, Wei Yan: School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, China; Yizhou Huang: School of Electrical and Automation Engineering, Changshu Institute of Technology, Suzhou, China |
title | Swin Transformer Embedding Dual-Stream for Semantic Segmentation of Remote Sensing Imagery |
topic | Dual-stream; remote sensing (RS); semantic segmentation; Swin transformer |
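
The description states that the shape stream's gate convolution module (GCM) filters out irrelevant features so the stream can concentrate on boundaries, but the record gives no internals. The sketch below shows one generic way such gating can work, modeled loosely on the gated convolutional layer of Gated-SCNN; this is an assumption, not STDSNet's published design. A 1×1 convolution over the concatenated shape-stream and global-stream features yields a per-pixel gate α in (0, 1), and the shape features are reweighted as s·α + s. Feature maps are plain nested lists (channels × height × width) so the example runs without a deep learning framework; the helper names `gate_map` and `gated_shape_features` are hypothetical, and any convolution that would follow the gate is omitted.

```python
import math
from typing import List

# Hypothetical sketch, not the paper's GCM: a feature map is a list of
# channels, and each channel is an H x W grid of floats.
FeatureMap = List[List[List[float]]]
Grid = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate_map(shape_feats: FeatureMap, global_feats: FeatureMap,
             w_shape: List[float], w_global: List[float], bias: float) -> Grid:
    """1x1 convolution over the concatenated channels, squashed to a per-pixel gate."""
    if len(w_shape) != len(shape_feats) or len(w_global) != len(global_feats):
        raise ValueError("one 1x1 weight per input channel is required")
    height, width = len(shape_feats[0]), len(shape_feats[0][0])
    alpha = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Weighted sum across all shape and global channels at pixel (i, j).
            z = bias
            z += sum(w * ch[i][j] for w, ch in zip(w_shape, shape_feats))
            z += sum(w * ch[i][j] for w, ch in zip(w_global, global_feats))
            alpha[i][j] = sigmoid(z)
    return alpha


def gated_shape_features(shape_feats: FeatureMap, global_feats: FeatureMap,
                         w_shape: List[float], w_global: List[float],
                         bias: float) -> FeatureMap:
    """Reweight shape features pixel-wise as s * alpha + s (the residual keeps the original signal)."""
    alpha = gate_map(shape_feats, global_feats, w_shape, w_global, bias)
    return [
        [[v * alpha[i][j] + v for j, v in enumerate(row)] for i, row in enumerate(ch)]
        for ch in shape_feats
    ]


if __name__ == "__main__":
    shape = [[[1.0, 1.0]]]     # one shape channel, 1 x 2 pixels
    glob = [[[10.0, -10.0]]]   # one global channel: strong vs. weak evidence
    print(gated_shape_features(shape, glob, w_shape=[0.0], w_global=[1.0], bias=0.0))
```

With one shape channel of ones and a global channel of [10.0, -10.0], the gate is close to 1 at the first pixel and close to 0 at the second, so the first shape response roughly doubles while the second stays near its original value. The residual term means the gate can emphasize responses but never erase them entirely.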