Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion

Bibliographic Details
Main Authors: Weihua Pu, Zhipan Wang, Di Liu, Qingling Zhang
Format: Article
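Language: English
Published: MDPI AG 2022-09-01
Series: Remote Sensing
Subjects: cloud detection; self-attention; pyramid pooling module; semantic segmentation; optical remote sensing image
Online Access: https://www.mdpi.com/2072-4292/14/17/4312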
_version_ 1797493331351044096
author Weihua Pu
Zhipan Wang
Di Liu
Qingling Zhang
author_sort Weihua Pu
collection DOAJ
description Cloud detection is a key step in optical remote sensing image processing, and cloud-free images are essential for land use classification, change detection, and long time-series land cover monitoring. Traditional cloud detection methods based on spectral and texture features achieve some success in complex scenarios such as cloud–snow mixing, but their generalization ability leaves considerable room for improvement. In recent years, deep learning methods have significantly improved cloud detection accuracy in complex regions such as areas mixed with high-brightness features. However, existing deep learning-based methods still have limitations; for instance, some omission and commission errors remain at cloud edges. Deep learning-based cloud detection is also gradually shifting from purely convolutional structures toward global feature extraction, for example with attention modules, but this increases the computational burden and makes it hard to meet the demands of rapidly developing time-sensitive tasks such as onboard real-time cloud detection in optical remote sensing imagery. To address these problems, this manuscript proposes a high-precision cloud detection network that fuses a self-attention module with spatial pyramid pooling. First, DenseNet is used as the backbone, and deep semantic features are extracted by combining a global self-attention module with a spatial pyramid pooling module. Second, to address the imbalance in training samples, a weighted cross-entropy loss function is designed to optimize the network. Finally, cloud detection accuracy is assessed.
Quantitative comparison experiments on images from different sensors, such as Landsat8, Landsat9, GF-2, and Beijing-2, indicate that, compared with feature-based methods, the deep learning network can effectively separate cloud from snow in confusion-prone regions using only three-channel visible images, which significantly reduces the number of image bands required. Compared with other deep learning methods, the network achieves higher accuracy at cloud edges and comparatively favorable overall computational efficiency.
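The description mentions a weighted cross-entropy loss for unbalanced training samples but does not give its form. The following is a minimal sketch of one common variant, assuming inverse-frequency class weights and a two-class (clear/cloud) labeling; the function names `class_weights` and `weighted_cross_entropy` and the toy data are hypothetical and are not taken from the article.

```python
import math


def class_weights(labels, num_classes=2):
    """Inverse-frequency weights (an assumed scheme): rarer classes weigh more."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    # weight_c = total / (num_classes * count_c); absent classes get weight 0.
    return [total / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -weights[y] * log(p[y]) over all pixels.

    probs:  per-pixel class probabilities (each row sums to 1)
    labels: per-pixel integer class ids (here 0 = clear, 1 = cloud)
    """
    eps = 1e-12  # guards against log(0)
    num = sum(-weights[y] * math.log(max(p[y], eps))
              for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


# Toy, imbalanced example: one cloud pixel among four, poorly predicted.
labels = [0, 0, 0, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1], [0.6, 0.4]]

w = class_weights(labels)
loss_weighted = weighted_cross_entropy(probs, labels, w)
loss_plain = weighted_cross_entropy(probs, labels, [1.0, 1.0])
```

In this toy case the single misclassified cloud pixel carries three times the weight of each clear pixel, so the weighted loss is larger than the unweighted one and the minority class contributes more to the gradient.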
first_indexed 2024-03-10T01:18:30Z
format Article
id doaj.art-d6beb90e9e0e4982bc919901d7e6dc03
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-10T01:18:30Z
publishDate 2022-09-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-d6beb90e9e0e4982bc919901d7e6dc03 2023-11-23T14:04:34Z eng
MDPI AG, Remote Sensing, ISSN 2072-4292, 2022-09-01, vol. 14, no. 17, art. 4312, doi 10.3390/rs14174312
Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion
Weihua Pu (0), Zhipan Wang (1), Di Liu (2), Qingling Zhang (3)
(0) Shenzhen Aerospace Dongfanghong Satellite Co., Ltd., Shenzhen 518061, China
(1) School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen Campus, Shenzhen 518100, China
(2) School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen Campus, Shenzhen 518100, China
(3) School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen Campus, Shenzhen 518100, China
https://www.mdpi.com/2072-4292/14/17/4312
cloud detection; self-attention; pyramid pooling module; semantic segmentation; optical remote sensing image
title Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion
topic cloud detection
self-attention
pyramid pooling module
semantic segmentation
optical remote sensing image
url https://www.mdpi.com/2072-4292/14/17/4312