A Hybrid Algorithm with Swin Transformer and Convolution for Cloud Detection
Cloud detection is critical in remote sensing image processing, and convolutional neural networks (CNNs) have significantly advanced this field. However, traditional CNNs primarily focus on extracting local features, which can be challenging for cloud detection due to the variability in the size, sh...
Main Authors: | Chengjuan Gong, Tengfei Long, Ranyu Yin, Weili Jiao, Guizhou Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-11-01 |
Series: | Remote Sensing |
Subjects: | Swin transformer; cloud detection; image segmentation; attention; convolution |
Online Access: | https://www.mdpi.com/2072-4292/15/21/5264 |
_version_ | 1827765398283485184 |
---|---|
author | Chengjuan Gong, Tengfei Long, Ranyu Yin, Weili Jiao, Guizhou Wang |
collection | DOAJ |
description | Cloud detection is critical in remote sensing image processing, and convolutional neural networks (CNNs) have significantly advanced this field. However, traditional CNNs primarily focus on extracting local features, which is a limitation for cloud detection because clouds vary widely in size, shape, and boundary. To address this limitation, we propose a hybrid Swin transformer–CNN cloud detection (STCCD) network that combines the strengths of both architectures. The STCCD network employs a novel dual-stream encoder that integrates Swin transformer and CNN blocks. Swin transformers can capture global context features more effectively than traditional CNNs, while CNNs excel at extracting local features. The two streams are fused via a fusion coupling module (FCM) to produce a richer representation of the input image. To further enhance the network’s ability to extract cloud features, we incorporate a feature fusion module based on the attention mechanism (FFMAM) and an aggregation multiscale feature module (AMSFM). The FFMAM selectively merges global and local features based on their importance, while the AMSFM aggregates feature maps from different spatial scales to obtain a more comprehensive representation of the cloud mask. We evaluated the STCCD network on three challenging cloud detection datasets (GF1-WHU, SPARCS, and AIR-CD), as well as the L8-Biome dataset to assess its generalization capability. The results show that the STCCD network outperformed other state-of-the-art methods on all datasets. Notably, on the L8-Biome dataset, the STCCD model trained on only four bands (visible and near-infrared) of the GF1-WHU dataset outperformed the official Landsat-8 Fmask algorithm, which uses additional bands (shortwave infrared, cirrus, and thermal). |
first_indexed | 2024-03-11T11:22:16Z |
format | Article |
id | doaj.art-62e5a11d283a4c6f84bc455281f3b05f |
institution | Directory of Open Access Journals |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-11T11:22:16Z |
publishDate | 2023-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-62e5a11d283a4c6f84bc455281f3b05f; 2023-11-10T15:11:34Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2023-11-01; vol. 15, no. 21, art. 5264; DOI 10.3390/rs15215264; A Hybrid Algorithm with Swin Transformer and Convolution for Cloud Detection; Chengjuan Gong, Tengfei Long, Ranyu Yin, Weili Jiao, Guizhou Wang (all: Aerospace Information Research Institute (AIR), Chinese Academy of Sciences (CAS), Beijing 100094, China); https://www.mdpi.com/2072-4292/15/21/5264; Swin transformer; cloud detection; image segmentation; attention; convolution |
title | A Hybrid Algorithm with Swin Transformer and Convolution for Cloud Detection |
topic | Swin transformer; cloud detection; image segmentation; attention; convolution |
url | https://www.mdpi.com/2072-4292/15/21/5264 |
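
The description in this record says the FFMAM "selectively merges global and local features based on their importance". The record gives no implementation details, so the following is only a minimal illustrative sketch of one generic way to do attention-weighted fusion of two feature maps, not the authors' module. The function name `attention_fuse`, the use of global average pooling, and the parameter-free softmax weighting are all assumptions made for illustration; a trained network would normally learn these weights.

```python
import numpy as np


def attention_fuse(global_feat, local_feat):
    """Fuse two (C, H, W) feature maps with per-channel softmax weights.

    Hypothetical helper: a generic, parameter-free stand-in for
    attention-based fusion, NOT the FFMAM described in the paper.
    """
    if global_feat.shape != local_feat.shape:
        raise ValueError("feature maps must share the same shape")

    # Global average pooling gives one descriptor per stream and channel: (2, C).
    desc = np.stack([
        global_feat.mean(axis=(1, 2)),
        local_feat.mean(axis=(1, 2)),
    ])

    # Softmax across the two streams, shifted by the max for numerical stability.
    exp = np.exp(desc - desc.max(axis=0, keepdims=True))
    weights = exp / exp.sum(axis=0, keepdims=True)

    # Broadcast the per-channel weights over the spatial dimensions and blend.
    fused = (weights[0][:, None, None] * global_feat
             + weights[1][:, None, None] * local_feat)
    return fused, weights
```

Calling `attention_fuse(g, l)` on two arrays of shape `(C, H, W)` returns the fused map and a `(2, C)` weight array whose columns sum to 1, so each output channel is a convex combination of the corresponding global and local channels.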