A CNN-Transformer Network Combining CBAM for Change Detection in High-Resolution Remote Sensing Images

Current deep learning-based change detection approaches mostly produce convincing results by introducing attention mechanisms to traditional convolutional networks. However, given the limitation of the receptive field, convolution-based methods fall short of fully modelling global context and captur...

Bibliographic Details
Main Authors: Mengmeng Yin, Zhibo Chen, Chengjian Zhang
Format: Article
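Language: English
Published: MDPI AG 2023-05-01
Series: Remote Sensing
Subjects: change detection; transformer; convolutional neural networks (CNN); convolutional block attention module (CBAM); attention mechanisms
Online Access: https://www.mdpi.com/2072-4292/15/9/2406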
_version_ 1797601776174628864
author Mengmeng Yin
Zhibo Chen
Chengjian Zhang
collection DOAJ
description Current deep learning-based change detection approaches mostly produce convincing results by introducing attention mechanisms into traditional convolutional networks. However, given the limited receptive field, convolution-based methods fall short of fully modelling global context and capturing long-range dependencies, and are thus insufficient for discriminating pseudo changes. Transformers have an efficient global spatio-temporal modelling capability, which benefits the feature representation of changes of interest. However, the lack of detailed information may cause the transformer to locate the boundaries of changed regions inaccurately. Therefore, this article proposes CTCANet, a hybrid CNN-transformer architecture that combines the strengths of convolutional networks, transformers, and attention mechanisms for change detection in high-resolution bi-temporal remote sensing images. To obtain high-level feature representations that reveal changes of interest, CTCANet uses a tokenizer to embed the features that a convolutional network extracts from each image into a sequence of tokens, and a transformer module to model global spatio-temporal context in token space. The optimal bi-temporal information fusion approach is also explored. Subsequently, the reconstructed features carrying deep abstract information are fed to a cascaded decoder, where they are aggregated through skip connections with features containing shallow fine-grained information. This aggregation enables the model to maintain the completeness of changes and accurately locate small targets. Moreover, integrating the convolutional block attention module (CBAM) smooths the semantic gaps between heterogeneous features and accentuates relevant changes in both the channel and spatial domains, further improving the results. Experimental results on two publicly accessible datasets, LEVIR-CD and SYSU-CD, show that the proposed CTCANet outperforms certain recent state-of-the-art methods.
first_indexed 2024-03-11T04:08:34Z
format Article
id doaj.art-f0f1b5b735c64c4ab6281cd1336c38c2
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-11T04:08:34Z
publishDate 2023-05-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-f0f1b5b735c64c4ab6281cd1336c38c2; 2023-11-17T23:39:45Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2023-05-01; vol. 15, no. 9, art. 2406; 10.3390/rs15092406; A CNN-Transformer Network Combining CBAM for Change Detection in High-Resolution Remote Sensing Images; Mengmeng Yin, Zhibo Chen, Chengjian Zhang (all: School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China); https://www.mdpi.com/2072-4292/15/9/2406; change detection; transformer; convolutional neural networks (CNN); convolutional block attention module (CBAM); attention mechanisms
title A CNN-Transformer Network Combining CBAM for Change Detection in High-Resolution Remote Sensing Images
topic change detection
transformer
convolutional neural networks (CNN)
convolutional block attention module (CBAM)
attention mechanisms
url https://www.mdpi.com/2072-4292/15/9/2406