Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation

Compared with traditional methods based on hand-crafted features, deep neural networks have achieved a certain degree of success in remote sensing (RS) image semantic segmentation. However, there are still serious holes, rough edge segmentation, and false detection or even missed detection due to t...


Bibliographic Details
Main Authors:	Youda Mo, Huihui Li, Xiangling Xiao, Huimin Zhao, Xiaoyong Liu, Jin Zhan
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:	Global local transformer block (GLTB); remote sensing (RS) image; semantic segmentation; Swin transformer; Swin-Conv-Dspp (SCD)
Online Access:	https://ieeexplore.ieee.org/document/10137390/

_version_	1797800011668389888
author	Youda Mo; Huihui Li; Xiangling Xiao; Huimin Zhao; Xiaoyong Liu; Jin Zhan
author_facet	Youda Mo; Huihui Li; Xiangling Xiao; Huimin Zhao; Xiaoyong Liu; Jin Zhan
author_sort	Youda Mo
collection	DOAJ
description	Compared with traditional methods based on hand-crafted features, deep neural networks have achieved a certain degree of success in remote sensing (RS) image semantic segmentation. However, segmentation results still suffer from serious holes, rough edges, and false or even missed detections caused by light and shadow. To address these problems, this article proposes SCG-TransNet, an RS semantic segmentation model that is a hybrid of Swin transformer and DeepLabv3+ and includes the Swin-Conv-Dspp (SCD) module and the global local transformer block (GLTB). First, the SCD module, which can efficiently extract feature information from objects at different scales, is used to mitigate the hole phenomenon and reduce the loss of detailed information. Second, we construct a GLTB with a spatial pyramid pooling shuffle module to extract critical detail information from the limited visible pixels of occluded objects, which effectively alleviates the difficulty of recognizing objects under occlusion. Finally, the experimental results show that SCG-TransNet achieves a mean intersection over union of 70.29% on the Vaihingen dataset, which is 3% higher than the baseline model. It also achieves good results on the Potsdam dataset. These results demonstrate the effectiveness, robustness, and superiority of the proposed method compared with existing state-of-the-art methods.
first_indexed	2024-03-13T04:27:37Z
format	Article
id	doaj.art-2862bffbee1149c6b879455e2b172836
institution	Directory of Open Access Journals
issn	2151-1535
language	English
last_indexed	2024-03-13T04:27:37Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling	doaj.art-2862bffbee1149c6b879455e2b172836; 2023-06-19T23:00:25Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 2151-1535; 2023-01-01; 16; 5284; 5296; 10.1109/JSTARS.2023.3280365; 10137390; Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation; Youda Mo [0] https://orcid.org/0009-0001-7573-6667; Huihui Li [1] https://orcid.org/0000-0003-0463-8178; Xiangling Xiao [2] https://orcid.org/0000-0001-6226-5459; Huimin Zhao [3] https://orcid.org/0000-0002-6877-2002; Xiaoyong Liu [4] https://orcid.org/0000-0002-0795-841X; Jin Zhan [5] https://orcid.org/0000-0002-7070-7031; School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China; School of Computer Science and Guangdong Provincial Key Laboratory of Intellectual Property and Big Data, Guangdong Polytechnic Normal University, Guangzhou, China; School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China; School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China; School of Data Science and Engineering, Guangdong Polytechnic Normal University, Guangzhou, China; School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China; Compared with traditional methods based on hand-crafted features, deep neural networks have achieved a certain degree of success in remote sensing (RS) image semantic segmentation. However, segmentation results still suffer from serious holes, rough edges, and false or even missed detections caused by light and shadow. To address these problems, this article proposes SCG-TransNet, an RS semantic segmentation model that is a hybrid of Swin transformer and DeepLabv3+ and includes the Swin-Conv-Dspp (SCD) module and the global local transformer block (GLTB). First, the SCD module, which can efficiently extract feature information from objects at different scales, is used to mitigate the hole phenomenon and reduce the loss of detailed information. Second, we construct a GLTB with a spatial pyramid pooling shuffle module to extract critical detail information from the limited visible pixels of occluded objects, which effectively alleviates the difficulty of recognizing objects under occlusion. Finally, the experimental results show that SCG-TransNet achieves a mean intersection over union of 70.29% on the Vaihingen dataset, which is 3% higher than the baseline model. It also achieves good results on the Potsdam dataset. These results demonstrate the effectiveness, robustness, and superiority of the proposed method compared with existing state-of-the-art methods.; https://ieeexplore.ieee.org/document/10137390/; Global local transformer block (GLTB); remote sensing (RS) image; semantic segmentation; Swin transformer; Swin-Conv-Dspp (SCD)
title	Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation
title_full	Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation
title_fullStr	Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation
title_full_unstemmed	Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation
title_short	Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation
title_sort	swin conv dspp and global local transformer for remote sensing image semantic segmentation
topic	Global local transformer block (GLTB); remote sensing (RS) image; semantic segmentation; Swin transformer; Swin-Conv-Dspp (SCD)
url	https://ieeexplore.ieee.org/document/10137390/


Similar Items