A Hybrid Algorithm with Swin Transformer and Convolution for Cloud Detection

Cloud detection is critical in remote sensing image processing, and convolutional neural networks (CNNs) have significantly advanced this field. However, traditional CNNs primarily focus on extracting local features, which can be challenging for cloud detection due to the variability in the size, sh...

Bibliographic Details
Main Authors: Chengjuan Gong, Tengfei Long, Ranyu Yin, Weili Jiao, Guizhou Wang
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Series: Remote Sensing
Subjects: Swin transformer; cloud detection; image segmentation; attention; convolution
Online Access: https://www.mdpi.com/2072-4292/15/21/5264
author Chengjuan Gong
Tengfei Long
Ranyu Yin
Weili Jiao
Guizhou Wang
collection DOAJ
description Cloud detection is critical in remote sensing image processing, and convolutional neural networks (CNNs) have significantly advanced this field. However, traditional CNNs primarily extract local features, which can be a limitation for cloud detection given the variability in the size, shape, and boundaries of clouds. To address this limitation, we propose a hybrid Swin transformer–CNN cloud detection (STCCD) network that combines the strengths of both architectures. The STCCD network employs a novel dual-stream encoder that integrates Swin transformer and CNN blocks. Swin transformers capture global context features more effectively than traditional CNNs, while CNNs excel at extracting local features. The two streams are fused via a fusion coupling module (FCM) to produce a richer representation of the input image. To further enhance the network's ability to extract cloud features, we incorporate a feature fusion module based on the attention mechanism (FFMAM) and an aggregation multiscale feature module (AMSFM). The FFMAM selectively merges global and local features based on their importance, while the AMSFM aggregates feature maps from different spatial scales to obtain a more comprehensive representation of the cloud mask. We evaluated the STCCD network on three challenging cloud detection datasets (GF1-WHU, SPARCS, and AIR-CD) and additionally on the L8-Biome dataset to assess its generalization capability. The results show that the STCCD network outperformed other state-of-the-art methods on all datasets. Notably, the STCCD model, trained on only four bands (visible and near-infrared) of the GF1-WHU dataset, outperformed the official Landsat-8 Fmask algorithm, which uses additional bands (shortwave infrared, cirrus, and thermal), on the L8-Biome dataset.
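The record gives no implementation details for the FFMAM or AMSFM, so the following is only a toy sketch of the two general ideas the description names: importance-weighted merging of a global and a local feature map, and aggregation of feature maps from different spatial scales. It is not the authors' method. The function names, the sigmoid gate with fixed scalar weights, nearest-neighbour upsampling, and plain averaging are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(global_map, local_map, w_global=1.0, w_local=1.0, bias=0.0):
    """Blend two equally sized 2-D maps with a per-pixel gate.

    Assumption: the gate weights are fixed scalars here; a trained
    network would learn them (and would use far richer attention).
    """
    fused = []
    for g_row, l_row in zip(global_map, local_map):
        fused_row = []
        for g, l in zip(g_row, l_row):
            gate = sigmoid(w_global * g + w_local * l + bias)
            # gate near 1 favours the global value, near 0 the local one
            fused_row.append(gate * g + (1.0 - gate) * l)
        fused.append(fused_row)
    return fused


def upsample_nearest(fmap, factor):
    """Repeat every value `factor` times along both axes."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def aggregate_multiscale(maps):
    """Average maps of different resolutions at the resolution of maps[0].

    Assumption: every map is square-scaled so that its height divides
    the height of maps[0] evenly, with the same factor for the width.
    """
    target = len(maps[0])
    resized = [upsample_nearest(m, target // len(m)) for m in maps]
    n = len(resized)
    width = len(resized[0][0])
    return [
        [sum(m[i][j] for m in resized) / n for j in range(width)]
        for i in range(target)
    ]


if __name__ == "__main__":
    fine = [[0.0, 2.0], [4.0, 6.0]]   # hypothetical fine-scale map
    coarse = [[2.0]]                  # hypothetical coarse-scale map
    print(aggregate_multiscale([fine, coarse]))
    print(gated_fusion([[1.0, 0.0]], [[1.0, 4.0]]))
```

Each fused value always lies between the two inputs it blends, which is the property that makes the gate act as a soft selector rather than a sum.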
first_indexed 2024-03-11T11:22:16Z
format Article
id doaj.art-62e5a11d283a4c6f84bc455281f3b05f
institution Directory of Open Access Journals
issn 2072-4292
language English
last_indexed 2024-03-11T11:22:16Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-62e5a11d283a4c6f84bc455281f3b05f
2023-11-10T15:11:34Z
eng
MDPI AG
Remote Sensing
2072-4292
2023-11-01
15(21), 5264
10.3390/rs15215264
A Hybrid Algorithm with Swin Transformer and Convolution for Cloud Detection
Chengjuan Gong, Tengfei Long, Ranyu Yin, Weili Jiao, Guizhou Wang (all: Aerospace Information Research Institute (AIR), Chinese Academy of Sciences (CAS), Beijing 100094, China)
https://www.mdpi.com/2072-4292/15/21/5264
Swin transformer; cloud detection; image segmentation; attention; convolution
title A Hybrid Algorithm with Swin Transformer and Convolution for Cloud Detection
topic Swin transformer
cloud detection
image segmentation
attention
convolution
url https://www.mdpi.com/2072-4292/15/21/5264