Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion
Cloud detection is a key step in optical remote sensing image processing, and cloud-free images are of great significance for land use classification, change detection, and long time-series land cover monitoring. Traditional cloud detection methods based on spectral and texture features have acquir...
Main Authors: | Weihua Pu, Zhipan Wang, Di Liu, Qingling Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-09-01 |
Series: | Remote Sensing |
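Subjects: | cloud detection; self-attention; pyramid pooling module; semantic segmentation; optical remote sensing image |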
Online Access: | https://www.mdpi.com/2072-4292/14/17/4312 |
_version_ | 1797493331351044096 |
---|---|
author | Weihua Pu; Zhipan Wang; Di Liu; Qingling Zhang |
author_facet | Weihua Pu; Zhipan Wang; Di Liu; Qingling Zhang |
author_sort | Weihua Pu |
collection | DOAJ |
description | Cloud detection is a key step in optical remote sensing image processing, and cloud-free images are of great significance for land use classification, change detection, and long time-series land cover monitoring. Traditional cloud detection methods based on spectral and texture features have achieved some success in complex scenarios, such as cloud–snow mixing, but there is still considerable room for improvement in generalization ability. In recent years, deep-learning methods have significantly improved cloud detection accuracy in complex regions, such as areas where high-brightness features are mixed. However, existing deep learning-based cloud detection methods still have limitations. For instance, some omission and commission errors still occur in cloud edge regions. Deep learning-based cloud detection methods are also gradually shifting from purely convolutional structures toward global feature extraction, such as attention modules, but this increases the computational burden, which makes it difficult to meet the demands of rapidly developing time-sensitive tasks, such as onboard real-time cloud detection in optical remote sensing imagery. To address these problems, this manuscript proposes a high-precision cloud detection network that fuses a self-attention module and spatial pyramid pooling. First, we use DenseNet as the backbone and extract deep semantic features by combining a global self-attention module with a spatial pyramid pooling module. Second, to address the imbalance of training samples, we design a weighted cross-entropy loss function for optimizing the network. Finally, cloud detection accuracy is assessed. Quantitative comparison experiments on different images, such as Landsat 8, Landsat 9, GF-2, and Beijing-2, indicate that, compared with feature-based methods, the deep learning network can effectively distinguish clouds from snow in confusion-prone regions using only three-channel visible images, which significantly reduces the number of required image bands. Compared with other deep learning methods, the accuracy at cloud edges is higher and the overall computational efficiency is comparatively high. |
first_indexed | 2024-03-10T01:18:30Z |
format | Article |
id | doaj.art-d6beb90e9e0e4982bc919901d7e6dc03 |
institution | Directory of Open Access Journals |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-10T01:18:30Z |
publishDate | 2022-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-d6beb90e9e0e4982bc919901d7e6dc03; 2023-11-23T14:04:34Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2022-09-01; 14; 17; 4312; 10.3390/rs14174312; Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion; Weihua Pu (0); Zhipan Wang (1); Di Liu (2); Qingling Zhang (3); (0) Shenzhen Aerospace Dongfanghong Satellite Co., Ltd., Shenzhen 518061, China; (1) School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen Campus, Shenzhen 518100, China; (2) School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen Campus, Shenzhen 518100, China; (3) School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen Campus, Shenzhen 518100, China; Cloud detection is a key step in optical remote sensing image processing, and cloud-free images are of great significance for land use classification, change detection, and long time-series land cover monitoring. Traditional cloud detection methods based on spectral and texture features have achieved some success in complex scenarios, such as cloud–snow mixing, but there is still considerable room for improvement in generalization ability. In recent years, deep-learning methods have significantly improved cloud detection accuracy in complex regions, such as areas where high-brightness features are mixed. However, existing deep learning-based cloud detection methods still have limitations. For instance, some omission and commission errors still occur in cloud edge regions. Deep learning-based cloud detection methods are also gradually shifting from purely convolutional structures toward global feature extraction, such as attention modules, but this increases the computational burden, which makes it difficult to meet the demands of rapidly developing time-sensitive tasks, such as onboard real-time cloud detection in optical remote sensing imagery. To address these problems, this manuscript proposes a high-precision cloud detection network that fuses a self-attention module and spatial pyramid pooling. First, we use DenseNet as the backbone and extract deep semantic features by combining a global self-attention module with a spatial pyramid pooling module. Second, to address the imbalance of training samples, we design a weighted cross-entropy loss function for optimizing the network. Finally, cloud detection accuracy is assessed. Quantitative comparison experiments on different images, such as Landsat 8, Landsat 9, GF-2, and Beijing-2, indicate that, compared with feature-based methods, the deep learning network can effectively distinguish clouds from snow in confusion-prone regions using only three-channel visible images, which significantly reduces the number of required image bands. Compared with other deep learning methods, the accuracy at cloud edges is higher and the overall computational efficiency is comparatively high.; https://www.mdpi.com/2072-4292/14/17/4312; cloud detection; self-attention; pyramid pooling module; semantic segmentation; optical remote sensing image |
spellingShingle | Weihua Pu; Zhipan Wang; Di Liu; Qingling Zhang; Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion; Remote Sensing; cloud detection; self-attention; pyramid pooling module; semantic segmentation; optical remote sensing image |
title | Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion |
title_full | Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion |
title_fullStr | Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion |
title_full_unstemmed | Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion |
title_short | Optical Remote Sensing Image Cloud Detection with Self-Attention and Spatial Pyramid Pooling Fusion |
title_sort | optical remote sensing image cloud detection with self attention and spatial pyramid pooling fusion |
topic | cloud detection; self-attention; pyramid pooling module; semantic segmentation; optical remote sensing image |
url | https://www.mdpi.com/2072-4292/14/17/4312 |
work_keys_str_mv | AT weihuapu opticalremotesensingimageclouddetectionwithselfattentionandspatialpyramidpoolingfusion AT zhipanwang opticalremotesensingimageclouddetectionwithselfattentionandspatialpyramidpoolingfusion AT diliu opticalremotesensingimageclouddetectionwithselfattentionandspatialpyramidpoolingfusion AT qinglingzhang opticalremotesensingimageclouddetectionwithselfattentionandspatialpyramidpoolingfusion |
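
The weighted cross-entropy loss mentioned in the description can be illustrated with a minimal sketch. The record does not give the authors' exact formulation, so everything below is an assumption made for illustration only: the function names `weighted_cross_entropy` and `inverse_frequency_weights`, the two-class (clear/cloud) labelling, the inverse-frequency weighting scheme, and the normalization by the sum of the weights.

```python
import math


def inverse_frequency_weights(labels, num_classes):
    """Hypothetical weighting scheme: weight each class by n / (K * count).

    Rare classes (for example, thin cloud edges) receive larger weights.
    Classes that never occur get weight 0.0.
    """
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c > 0 else 0.0 for c in counts]


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of per-pixel cross-entropy.

    probs:         per-pixel lists of class probabilities (each sums to 1).
    labels:        per-pixel integer class indices (assumed 0 = clear, 1 = cloud).
    class_weights: one non-negative weight per class.
    """
    eps = 1e-12  # guard against log(0)
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(max(p[y], eps))
        weight_sum += w
    # Dividing by the summed weights is one common convention; an
    # unnormalized sum or a plain mean over pixels are alternatives.
    return total / weight_sum


if __name__ == "__main__":
    # Toy example: three clear pixels and one cloud pixel (imbalanced).
    labels = [0, 0, 0, 1]
    probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
    weights = inverse_frequency_weights(labels, num_classes=2)
    print("class weights:", weights)
    print("weighted loss:", weighted_cross_entropy(probs, labels, weights))
    print("unweighted loss:", weighted_cross_entropy(probs, labels, [1.0, 1.0]))
```

In this toy case the single misclassified cloud pixel contributes more to the weighted loss than to the unweighted one, which is the intended effect of class weighting on unbalanced training samples.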