RCCT-ASPPNet: Dual-Encoder Remote Image Segmentation Based on Transformer and ASPP

Bibliographic Details
Main Authors: Yazhou Li, Zhiyou Cheng, Chuanjian Wang, Jinling Zhao, Linsheng Huang
Format: Article
Language: English
Published: MDPI AG 2023-01-01
Series: Remote Sensing
Online Access: https://www.mdpi.com/2072-4292/15/2/379
Description
Summary: Semantic segmentation of remote images is a core research topic in computer vision and has a wide range of applications in production and daily life. Most remote image semantic segmentation methods are based on convolutional neural networks (CNNs). Recently, the Transformer has offered a way to model long-distance dependencies in images. In this paper, we propose RCCT-ASPPNet, a dual-encoder network that combines Residual Multiscale Channel Cross-Fusion with Transformer (RCCT) and Atrous Spatial Pyramid Pooling (ASPP). RCCT uses a Transformer to cross-fuse global multiscale semantic information, and a residual structure then connects its inputs and outputs. The CNN-based ASPP extracts contextual information of high-level semantics from different perspectives and uses the Convolutional Block Attention Module (CBAM) to extract spatial and channel information, which further improves the model's segmentation ability. Experimental results show that our method achieves an mIoU of 94.14% and 61.30% on the Farmland and AeroScapes datasets, respectively, and an mPA of 97.12% and 84.36%, respectively, outperforming both DeepLabV3+ and UCTransNet.
ISSN: 2072-4292
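
Note on the reported metrics: the summary quotes mIoU and mPA without defining them. The sketch below shows how these two scores are commonly computed from a per-class confusion matrix, assuming the usual definitions (mIoU as the mean of per-class intersection-over-union, mPA as the mean of per-class pixel accuracy). The function names, the toy labels, and the choice to skip classes that never occur are illustrative assumptions, not taken from the paper, which may handle such details differently.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: cm[t][p] = number of pixels with true class t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def miou_and_mpa(cm):
    """Return (mIoU, mPA) for a square confusion matrix.

    Assumption: classes with no pixels in the relevant denominator are
    skipped rather than counted as zero.
    """
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]                               # correctly labelled pixels of class c
        gt = sum(cm[c])                             # pixels whose true class is c
        pred = sum(cm[r][c] for r in range(n))      # pixels predicted as class c
        union = gt + pred - tp
        if union > 0:
            ious.append(tp / union)
        if gt > 0:
            accs.append(tp / gt)
    miou = sum(ious) / len(ious) if ious else 0.0
    mpa = sum(accs) / len(accs) if accs else 0.0
    return miou, mpa


# Toy example with flattened label maps (hypothetical data, 3 classes).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
miou, mpa = miou_and_mpa(confusion_matrix(y_true, y_pred, 3))
print(f"mIoU = {miou:.4f}, mPA = {mpa:.4f}")
```

In practice the label maps would be the flattened ground-truth and predicted segmentation masks for a whole test set, with the confusion matrix accumulated over all images before the averages are taken.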