ACTNet: A Dual-Attention Adapter with a CNN-Transformer Network for the Semantic Segmentation of Remote Sensing Imagery

In recent years, semantic segmentation of remote sensing images has become increasingly prevalent across a diverse range of domains, including forest detection, water body detection, urban rail transportation planning, and building extraction. The incorporation of the Transformer model into computer vision has significantly enhanced the efficacy and accuracy of these algorithms. Nevertheless, the Transformer's high computational complexity and its dependence on pre-trained weights from large datasets lead to slow convergence when training for remote sensing segmentation tasks. Motivated by the success of adapter modules in natural language processing, this paper presents a novel adapter module (ResAttn) that improves training speed for remote sensing segmentation. ResAttn adopts a dual-attention structure to capture the interdependencies between sets of features, improving its global modeling capability, and introduces a Swin Transformer-like down-sampling method that reduces resolution while limiting information loss and preserving the original architecture. In addition, existing Transformer models are limited in their ability to capture local high-frequency information, which can lead to inadequate extraction of edge and texture features. To address this, the paper proposes a Local Feature Extractor (LFE) module based on a convolutional neural network (CNN) that incorporates multi-scale feature extraction and a residual structure. Further, a mask-based segmentation method is employed, and a residual-enhanced deformable attention block (Deformer Block) is incorporated to improve small-target segmentation accuracy. Finally, extensive experiments were performed on the ISPRS Potsdam dataset, and the results demonstrate the superior performance of the proposed model.
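
To make the adapter idea concrete, the sketch below shows one plausible form of a dual-attention adapter block in the spirit of ResAttn: squeeze-and-excitation-style channel attention and CBAM-style spatial attention on a residual path, followed by a Swin-like patch-merging downsample. The class names, layer choices, and hyperparameters here are illustrative assumptions, not the authors' published implementation.

```python
# Minimal illustrative sketch of a dual-attention adapter block (ResAttn-like).
# All design details below are assumptions made for illustration only.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style attention over channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(self.pool(x))


class SpatialAttention(nn.Module):
    """CBAM-style attention over spatial positions."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn


class DualAttentionAdapter(nn.Module):
    """Residual dual-attention adapter followed by a Swin-like downsample."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_attn = ChannelAttention(channels)
        self.spatial_attn = SpatialAttention()
        # Patch merging: gather 2x2 neighbourhoods (4C channels) and project
        # to 2C, halving the spatial resolution as in the Swin Transformer.
        self.merge = nn.Conv2d(4 * channels, 2 * channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.spatial_attn(self.channel_attn(x)) + x  # residual adapter path
        # Fold 2x2 neighbourhoods into the channel dimension, then project.
        out = nn.functional.pixel_unshuffle(out, downscale_factor=2)
        return self.merge(out)


if __name__ == "__main__":
    feats = torch.randn(1, 64, 128, 128)          # dummy backbone features
    print(DualAttentionAdapter(64)(feats).shape)  # torch.Size([1, 128, 64, 64])
```
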

Bibliographic Details
Main Authors: Zheng Zhang, Fanchen Liu, Changan Liu, Qing Tian, Hongquan Qu
Format: Article
Language: English
Published: MDPI AG, 2023-04-01
Series: Remote Sensing, volume 15, issue 9, article 2363
ISSN: 2072-4292
DOI: 10.3390/rs15092363
Author Affiliation: School of Information, North China University of Technology, Beijing 100144, China (all authors)
Subjects: remote sensing; semantic segmentation; transformer; adapter
Online Access: https://www.mdpi.com/2072-4292/15/9/2363
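
The abstract above also describes a CNN-based Local Feature Extractor (LFE) that combines multi-scale feature extraction with a residual structure to recover local high-frequency detail. The sketch below illustrates one plausible form of such a block; the branch kernel sizes and the 1x1 fusion layer are assumptions for illustration, not the published design.

```python
# Minimal illustrative sketch of a multi-scale residual block (LFE-like).
# Branch kernel sizes and fusion layer are assumptions, not the paper's design.
import torch
import torch.nn as nn


class LocalFeatureExtractor(nn.Module):
    """Parallel convolutions at several receptive fields, fused with a residual."""

    def __init__(self, channels: int):
        super().__init__()
        # Three branches capture local edge/texture detail at different scales.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        ])
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multi_scale) + x  # residual keeps the original signal


if __name__ == "__main__":
    feats = torch.randn(1, 64, 128, 128)
    print(LocalFeatureExtractor(64)(feats).shape)  # torch.Size([1, 64, 128, 128])
```
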