Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation


Bibliographic Details
Main Authors: Yan Chen, Quan Dong, Xiaofeng Wang, Qianchuan Zhang, Menglei Kang, Wenxiang Jiang, Mengyuan Wang, Lixiang Xu, Chen Zhang
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10416333/
_version_ 1827349058955509760
author Yan Chen
Quan Dong
Xiaofeng Wang
Qianchuan Zhang
Menglei Kang
Wenxiang Jiang
Mengyuan Wang
Lixiang Xu
Chen Zhang
author_facet Yan Chen
Quan Dong
Xiaofeng Wang
Qianchuan Zhang
Menglei Kang
Wenxiang Jiang
Mengyuan Wang
Lixiang Xu
Chen Zhang
author_sort Yan Chen
collection DOAJ
description Amid the rapid progress of deep learning, convolutional neural networks have been widely applied to the semantic segmentation of remote sensing images and have achieved significant results. However, the local nature of convolution limits their ability to capture global contextual information. Recently, the Transformer has become a focus of computer vision research and has shown great potential for extracting global context, further advancing semantic segmentation. In this article, we use ResNet50 as the encoder, embed a hybrid attention mechanism into the Transformer, and propose a Transformer-based decoder. The Channel-Spatial Transformer Block further aggregates features by integrating the local feature maps extracted by the encoder with their associated global dependencies. At the same time, an adaptive approach reweights the interdependent channel maps to enhance feature fusion. The global cross-fusion module combines the extracted complementary features to obtain more comprehensive semantic information. Extensive comparative experiments on the ISPRS Potsdam and Vaihingen datasets reached mIoU of 78.06% and 76.37%, respectively. Multiple ablation experiments further validate the effectiveness of the proposed method.
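The adaptive reweighting of interdependent channel maps mentioned in the abstract can be sketched compactly. The following is a minimal NumPy illustration, not the authors' implementation: it builds a channel-to-channel affinity matrix from the flattened feature map, softmax-normalizes it, mixes channels by those weights, and adds a residual connection. Function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_reweight(feat):
    # feat: (C, H, W) local feature map, e.g. from a ResNet50 encoder stage
    C, H, W = feat.shape
    flat = feat.reshape(C, -1)                  # (C, H*W)
    # channel-to-channel affinity captures channel interdependencies
    affinity = softmax(flat @ flat.T, axis=-1)  # (C, C), rows sum to 1
    reweighted = affinity @ flat                # mix channels by affinity weights
    # residual fusion keeps the original local features
    return reweighted.reshape(C, H, W) + feat

feat = np.random.default_rng(0).standard_normal((8, 4, 4))
out = channel_reweight(feat)
print(out.shape)  # (8, 4, 4)
```

In a trained network the affinity would typically be produced by learned projections; this sketch uses the raw feature similarities only to make the reweight-and-fuse pattern concrete.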
first_indexed 2024-03-08T00:24:41Z
format Article
id doaj.art-36cfce9bdedb437b8cffeb6136bd9eda
institution Directory Open Access Journal
issn 2151-1535
language English
last_indexed 2024-03-08T00:24:41Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj.art-36cfce9bdedb437b8cffeb6136bd9eda 2024-02-16T00:00:32Z
eng, IEEE
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, ISSN 2151-1535
2024-01-01, Vol. 17, pp. 4421-4435
doi: 10.1109/JSTARS.2024.3358851 (IEEE document 10416333)
Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation
Yan Chen (https://orcid.org/0000-0001-9294-0128), Quan Dong (https://orcid.org/0009-0002-2521-7081), Xiaofeng Wang (https://orcid.org/0000-0001-7592-277X), Qianchuan Zhang (https://orcid.org/0009-0008-0937-2296), Menglei Kang (https://orcid.org/0000-0002-1077-0813), Wenxiang Jiang (https://orcid.org/0000-0001-9450-3415), Mengyuan Wang (https://orcid.org/0000-0002-1337-6862), Lixiang Xu (https://orcid.org/0000-0001-8946-620X), Chen Zhang (https://orcid.org/0000-0003-3870-8744)
All authors: School of Artificial Intelligence and Big Data, Hefei University, Hefei, China
https://ieeexplore.ieee.org/document/10416333/
Keywords: Global cross fusion; hybrid attention; remote sensing image; semantic segmentation; Transformer
spellingShingle Yan Chen
Quan Dong
Xiaofeng Wang
Qianchuan Zhang
Menglei Kang
Wenxiang Jiang
Mengyuan Wang
Lixiang Xu
Chen Zhang
Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Global cross fusion
hybrid attention
remote sensing image
semantic segmentation
Transformer
title Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation
title_full Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation
title_fullStr Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation
title_full_unstemmed Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation
title_short Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation
title_sort hybrid attention fusion embedded in transformer for remote sensing image semantic segmentation
topic Global cross fusion
hybrid attention
remote sensing image
semantic segmentation
Transformer
url https://ieeexplore.ieee.org/document/10416333/
work_keys_str_mv AT yanchen hybridattentionfusionembeddedintransformerforremotesensingimagesemanticsegmentation
AT quandong hybridattentionfusionembeddedintransformerforremotesensingimagesemanticsegmentation
AT xiaofengwang hybridattentionfusionembeddedintransformerforremotesensingimagesemanticsegmentation
AT qianchuanzhang hybridattentionfusionembeddedintransformerforremotesensingimagesemanticsegmentation
AT mengleikang hybridattentionfusionembeddedintransformerforremotesensingimagesemanticsegmentation
AT wenxiangjiang hybridattentionfusionembeddedintransformerforremotesensingimagesemanticsegmentation
AT mengyuanwang hybridattentionfusionembeddedintransformerforremotesensingimagesemanticsegmentation
AT lixiangxu hybridattentionfusionembeddedintransformerforremotesensingimagesemanticsegmentation
AT chenzhang hybridattentionfusionembeddedintransformerforremotesensingimagesemanticsegmentation