MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer

The segmentation of remote sensing images by deep learning technology is the main method for remote sensing image interpretation. However, the segmentation model based on a convolutional neural network cannot capture the global features very well. A transformer, whose self-attention mechanism can su...

Bibliographic Details
Main Authors: Wei Yuan, Wenbo Xu
Format: Article
Language: English
Published: MDPI AG 2021-11-01
Series: Remote Sensing
Subjects: deep learning, remote sensing, transformer, semantic segmentation, multi-scale adaptive
Online Access: https://www.mdpi.com/2072-4292/13/23/4743
_version_ 1797507313991417856
author Wei Yuan
Wenbo Xu
author_facet Wei Yuan
Wenbo Xu
author_sort Wei Yuan
collection DOAJ
description Deep learning-based segmentation is the main method for interpreting remote sensing images. However, segmentation models based on convolutional neural networks do not capture global features well. A transformer, whose self-attention mechanism supplies each pixel with global features, makes up for this deficiency. This paper therefore proposes MSST-Net, a multi-scale adaptive segmentation network model based on a Swin Transformer. First, a Swin Transformer is used as the backbone to encode the input image. Second, the feature maps from the different levels are decoded separately. Third, the decoding results are fused by convolution, so that the network automatically learns the weight of each level's decoding result. Finally, a convolution with a 1 × 1 kernel adjusts the channels to produce the final prediction map. Compared with other segmentation network models on the WHU building dataset, MSST-Net improves the evaluation metrics mIoU, F1-score, and accuracy. The proposed model is a multi-scale adaptive network that pays more attention to global features for remote sensing segmentation.
first_indexed 2024-03-10T04:46:47Z
format Article
id doaj.art-c5bcd25ad3f44bd78695709532f03b90
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-10T04:46:47Z
publishDate 2021-11-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-c5bcd25ad3f44bd78695709532f03b90
2023-11-23T02:55:42Z
eng
MDPI AG
Remote Sensing
2072-4292
2021-11-01
13
23
4743
10.3390/rs13234743
MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer
Wei Yuan 0
Wenbo Xu 1
School of Architecture and Civil Engineering, Chengdu University, Chengdu 610106, China
School of Resources and Environment, University of Electronic Science and Technology of China, Chengdu 611731, China
The segmentation of remote sensing images by deep learning technology is the main method for remote sensing image interpretation. However, the segmentation model based on a convolutional neural network cannot capture the global features very well. A transformer, whose self-attention mechanism can supply each pixel with a global feature, makes up for the deficiency of the convolutional neural network. Therefore, a multi-scale adaptive segmentation network model (MSST-Net) based on a Swin Transformer is proposed in this paper. Firstly, a Swin Transformer is used as the backbone to encode the input image. Then, the feature maps of different levels are decoded separately. Thirdly, the convolution is used for fusion, so that the network can automatically learn the weight of the decoding results of each level. Finally, we adjust the channels to obtain the final prediction map by using the convolution with a kernel of 1 × 1. By comparing this with other segmentation network models on a WHU building data set, the evaluation metrics, mIoU, F1-score and accuracy are all improved. The network model proposed in this paper is a multi-scale adaptive network model that pays more attention to the global features for remote sensing segmentation.
https://www.mdpi.com/2072-4292/13/23/4743
deep learning
remote sensing
transformer
semantic segmentation
multi-scale adaptive
spellingShingle Wei Yuan
Wenbo Xu
MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer
Remote Sensing
deep learning
remote sensing
transformer
semantic segmentation
multi-scale adaptive
title MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer
title_sort msst net a multi scale adaptive network for building extraction from remote sensing images based on swin transformer
topic deep learning
remote sensing
transformer
semantic segmentation
multi-scale adaptive
url https://www.mdpi.com/2072-4292/13/23/4743
work_keys_str_mv AT weiyuan msstnetamultiscaleadaptivenetworkforbuildingextractionfromremotesensingimagesbasedonswintransformer
AT wenboxu msstnetamultiscaleadaptivenetworkforbuildingextractionfromremotesensingimagesbasedonswintransformer