MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer
The segmentation of remote sensing images by deep learning technology is the main method for remote sensing image interpretation. However, the segmentation model based on a convolutional neural network cannot capture the global features very well. A transformer, whose self-attention mechanism can su...
Main Authors: | Wei Yuan, Wenbo Xu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-11-01 |
Series: | Remote Sensing |
Subjects: | deep learning; remote sensing; transformer; semantic segmentation; multi-scale adaptive |
Online Access: | https://www.mdpi.com/2072-4292/13/23/4743 |
_version_ | 1797507313991417856 |
---|---|
author | Wei Yuan; Wenbo Xu |
author_facet | Wei Yuan; Wenbo Xu |
author_sort | Wei Yuan |
collection | DOAJ |
description | Deep learning segmentation is the main method for interpreting remote sensing images. However, segmentation models based on convolutional neural networks do not capture global features well. A transformer, whose self-attention mechanism can supply each pixel with a global feature, makes up for this deficiency of the convolutional neural network. Therefore, a multi-scale adaptive segmentation network model (MSST-Net) based on a Swin Transformer is proposed in this paper. First, a Swin Transformer is used as the backbone to encode the input image. Second, the feature maps of the different levels are decoded separately. Third, convolution is used to fuse the decoded results, so that the network can automatically learn the weight of each level's decoding result. Finally, a convolution with a 1 × 1 kernel adjusts the channels to obtain the final prediction map. Compared with other segmentation network models on the WHU building data set, MSST-Net improves all three evaluation metrics: mIoU, F1-score and accuracy. The network model proposed in this paper is a multi-scale adaptive network model that pays more attention to global features for remote sensing segmentation. |
first_indexed | 2024-03-10T04:46:47Z |
format | Article |
id | doaj.art-c5bcd25ad3f44bd78695709532f03b90 |
institution | Directory Open Access Journal |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-10T04:46:47Z |
publishDate | 2021-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-c5bcd25ad3f44bd78695709532f03b90; 2023-11-23T02:55:42Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2021-11-01; 13; 23; 4743; 10.3390/rs13234743; MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer; Wei Yuan (0); Wenbo Xu (1); School of Architecture and Civil Engineering, Chengdu University, Chengdu 610106, China; School of Resources and Environment, University of Electronic Science and Technology of China, Chengdu 611731, China; https://www.mdpi.com/2072-4292/13/23/4743; deep learning; remote sensing; transformer; semantic segmentation; multi-scale adaptive |
spellingShingle | Wei Yuan; Wenbo Xu; MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer; Remote Sensing; deep learning; remote sensing; transformer; semantic segmentation; multi-scale adaptive |
title | MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer |
title_sort | msst net a multi scale adaptive network for building extraction from remote sensing images based on swin transformer |
topic | deep learning; remote sensing; transformer; semantic segmentation; multi-scale adaptive |
url | https://www.mdpi.com/2072-4292/13/23/4743 |
work_keys_str_mv | AT weiyuan msstnetamultiscaleadaptivenetworkforbuildingextractionfromremotesensingimagesbasedonswintransformer AT wenboxu msstnetamultiscaleadaptivenetworkforbuildingextractionfromremotesensingimagesbasedonswintransformer |
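
The description field outlines a decode-then-fuse design: each backbone level is decoded separately, a convolution fuses the results with learnable per-level weights, and a 1 × 1 convolution produces the prediction map. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. It assumes (none of this is stated in the record) that each level's decoded output has already been resized to a common resolution with one channel per level, that the fusion convolution is itself 1 × 1, and that the output has two classes (background, building). The helper names `conv1x1` and `argmax_mask` and all weight values are hypothetical and hand-picked for illustration; in a trained network they would be learned.

```python
from typing import List

Map2D = List[List[float]]


def conv1x1(feature_maps: List[Map2D],
            weights: List[List[float]],
            bias: List[float]) -> List[Map2D]:
    """1x1 convolution: each output channel is a per-pixel weighted
    sum of the input channels plus a bias."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    out = []
    for w_row, b in zip(weights, bias):
        out.append([
            [b + sum(w * fm[y][x] for w, fm in zip(w_row, feature_maps))
             for x in range(width)]
            for y in range(height)
        ])
    return out


def argmax_mask(logits: List[Map2D]) -> List[List[int]]:
    """Pick, per pixel, the index of the channel with the largest logit."""
    height = len(logits[0])
    width = len(logits[0][0])
    return [
        [max(range(len(logits)), key=lambda c: logits[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


# Hypothetical decoded outputs from three backbone levels (2x2 pixels each).
level_a = [[1.0, 0.0], [0.0, 1.0]]
level_b = [[0.5, 0.5], [0.5, 0.5]]
level_c = [[0.0, 1.0], [1.0, 0.0]]

# Fusion step: one weight per level (assumed values; learned in practice).
fused = conv1x1([level_a, level_b, level_c],
                weights=[[0.5, 0.25, 0.25]], bias=[0.0])

# Final 1x1 convolution: adjust channels from 1 to 2 classes
# (channel 0 = background, channel 1 = building; assumed layout).
logits = conv1x1(fused, weights=[[-1.0], [1.0]], bias=[0.5, -0.5])

mask = argmax_mask(logits)
print(mask)  # [[1, 0], [0, 1]]
```

Because the fusion weights multiply whole levels, raising the weight of one level lets it dominate the fused map, which is one way to read "the network can automatically learn the weight of the decoding results of each level".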