Research on the Applicability of Transformer Model in Remote-Sensing Image Segmentation


Bibliographic Details
Main Authors: Minmin Yu, Fen Qin
Format: Article
Language: English
Published: MDPI AG, 2023-02-01
Series: Applied Sciences
Subjects: transformer; multihead attention; remote-sensing image segmentation; deep learning; SwinUnet; TransUnet
Online Access: https://www.mdpi.com/2076-3417/13/4/2261
collection DOAJ
description Transformer models have achieved impressive results in computer vision over the past two years, drawing attention from the remote-sensing community. However, studies of these models in remote sensing remain relatively scarce. Which method is better suited to remote-sensing segmentation? In particular, how do different transformer models perform on the high spatial and multispectral resolution of remote-sensing images? To explore these questions, this paper presents a comprehensive comparative analysis of three mainstream transformer models, the segmentation transformer (SETRnet), SwinUnet, and TransUnet, evaluated on three aspects: a visual analysis of feature-segmentation results, accuracy, and training time. The experimental results show that the transformer structure has clear advantages in feature extraction on large-scale remote-sensing data sets and ground objects, but the segmentation performance of different transformer structures varies considerably across data sets of different scales. SwinUnet exhibits better global semantic interaction and pixel-level segmentation prediction on the large-scale Potsdam data set, where it achieves the highest accuracy metrics for KAPPA, MIoU, and OA, at 76.47%, 63.62%, and 85.01%, respectively.
TransUnet yields better segmentation results on the small-scale Vaihingen data set, where its KAPPA, MIoU, and OA are the highest, at 80.54%, 56.25%, and 85.55%, respectively. TransUnet handles the edges and details of feature segmentation better thanks to a network structure built jointly from transformer and convolutional neural network (CNN) components, which is why its segmentation accuracy is higher on the small-scale Vaihingen data set. Compared with SwinUnet and TransUnet, the segmentation performance of SETRnet is unsatisfactory at both data-set scales, so SETRnet is not suitable for remote-sensing image-segmentation tasks. In addition, this paper discusses the reasons for the performance differences between transformer models, as well as the differences between transformer models and CNNs. This study further promotes the application of transformer models in remote-sensing image segmentation, improves the understanding of these models, and helps researchers select a more appropriate transformer model, or model-improvement method, for remote-sensing image segmentation.
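The accuracy metrics reported in the abstract (KAPPA, MIoU, and OA) follow standard definitions for semantic segmentation. As a minimal sketch, assuming NumPy and a hypothetical `segmentation_metrics` helper (not from the paper), they can all be derived from a single confusion matrix:

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute OA, MIoU, and Cohen's Kappa from a confusion matrix.

    conf[i, j] = number of pixels of true class i predicted as class j.
    """
    conf = conf.astype(float)
    total = conf.sum()
    # Overall accuracy (OA): fraction of correctly classified pixels.
    oa = np.trace(conf) / total
    # Per-class IoU = TP / (TP + FP + FN); MIoU is their mean.
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp   # predicted as class i but actually another class
    fn = conf.sum(axis=1) - tp   # actually class i but predicted as another class
    iou = tp / (tp + fp + fn)
    miou = np.nanmean(iou)
    # Cohen's Kappa: observed agreement corrected for chance agreement.
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, miou, kappa
```

The exact evaluation protocol (class weighting, handling of ignore labels, clutter classes) is dataset-specific, so the paper's reported numbers may use variants of these formulas.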
id doaj.art-871a279fc4d54c858f206efeac2c01b3
institution Directory Open Access Journal
issn 2076-3417
doi 10.3390/app13042261
affiliation Minmin Yu: The College of Geography and Environment Science, Henan University, Kaifeng 475004, China
affiliation Fen Qin: The College of Geography and Environment Science, Henan University, Kaifeng 475004, China
topic transformer
multihead attention
remote-sensing image segmentation
deep learning
SwinUnet
TransUnet
url https://www.mdpi.com/2076-3417/13/4/2261