Multi-Category Segmentation of Sentinel-2 Images Based on the Swin UNet Method

Bibliographic Details
Main Authors: Junyuan Yao, Shuanggen Jin
Format: Article
Language: English
Published: MDPI AG 2022-07-01
Series: Remote Sensing
Subjects:
Online Access: https://www.mdpi.com/2072-4292/14/14/3382
collection DOAJ
description Medium-resolution remote sensing satellites provide large volumes of long time series, full-coverage data for Earth surface monitoring. However, different objects may have similar spectral values, and the same object may have different spectral values, which makes it difficult to improve classification accuracy. Deep learning methods have greatly facilitated semantic segmentation of remote sensing images. For medium-resolution remote sensing images, convolutional neural network-based models do not achieve good results because of their limited receptive field. The fast-emerging vision transformer, which captures global features well through self-attention, provides a new solution for medium-resolution remote sensing image segmentation. In this paper, a new multi-class segmentation method for medium-resolution remote sensing images is proposed, based on an improved Swin UNet model (a pure transformer model) and a new pre-processing scheme; an image enhancement method and a spectral selection module are designed to achieve better accuracy. Finally, 10-category segmentation is conducted on 10-m resolution Sentinel-2 MSI (Multi-Spectral Imager) images and compared, using the same sample data, with traditional convolutional neural network-based models (DeepLabV3+ and U-Net with different backbone networks, including VGG, ResNet50, MobileNet, and Xception). The results show a higher mean Intersection over Union (MIoU, 72.06%) and better accuracy (89.77%). The vision transformer method has great potential for medium-resolution remote sensing image segmentation tasks.
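The description reports two evaluation metrics, mean Intersection over Union and accuracy, without giving their formulas. The sketch below shows how these are commonly computed from a confusion matrix of per-pixel labels; it assumes the standard definitions (per-class IoU = TP / (TP + FP + FN), accuracy = correct pixels / all pixels), which may differ in detail from the paper's evaluation code. The function names and the toy labels are hypothetical and are not taken from the article.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the reference class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def miou_and_accuracy(matrix: List[List[int]]) -> Tuple[float, float]:
    """Return (mean IoU, overall pixel accuracy) for a square confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(n))
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # reference c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, reference other
        union = tp + fp + fn
        if union > 0:  # assumption: classes absent from both maps are skipped
            ious.append(tp / union)
    miou = sum(ious) / len(ious) if ious else 0.0
    accuracy = correct / total if total else 0.0
    return miou, accuracy


# Toy data: six flattened pixel labels over three hypothetical classes.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(reference, predicted, num_classes=3)
miou, acc = miou_and_accuracy(cm)
print(f"MIoU = {miou:.4f}, accuracy = {acc:.4f}")
```

For a 10-category Sentinel-2 label map, the same functions would be called with `num_classes=10` on the flattened reference and predicted rasters.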
first_indexed 2024-03-09T13:06:00Z
id doaj.art-bb1c8409b7834952ad17ec19308c68f7
institution Directory of Open Access Journals
issn 2072-4292
last_indexed 2024-03-09T13:06:00Z
spelling Junyuan Yao (School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China); Shuanggen Jin (Shanghai Astronomical Observatory, Chinese Academy of Sciences, Shanghai 200030, China). Multi-Category Segmentation of Sentinel-2 Images Based on the Swin UNet Method. Remote Sensing 14(14), 3382. MDPI AG, 2022-07-01. ISSN 2072-4292. doi:10.3390/rs14143382. https://www.mdpi.com/2072-4292/14/14/3382
topic Swin UNet
Swin Transformer
remote sensing
semantic segmentation
Sentinel-2