Multi-Attention Multi-Image Super-Resolution Transformer (MAST) for Remote Sensing

Deep-learning-driven multi-image super-resolution (MISR) reconstruction techniques have significant application value in the field of aerospace remote sensing. In particular, Transformer-based models have shown outstanding performance in super-resolution tasks. However, current MISR models have some...

全面介紹

書目詳細資料
Main Authors: Jiaao Li, Qunbo Lv, Wenjian Zhang, Baoyu Zhu, Guiyu Zhang, Zheng Tan
格式: Article
語言:English
出版: MDPI AG 2023-08-01
叢編:Remote Sensing
主題:
在線閱讀:https://www.mdpi.com/2072-4292/15/17/4183
實物特徵
總結:Deep-learning-driven multi-image super-resolution (MISR) reconstruction techniques have significant application value in the field of aerospace remote sensing. In particular, Transformer-based models have shown outstanding performance in super-resolution tasks. However, current MISR models have some deficiencies in the application of multi-scale information and the modeling of the attention mechanism, leading to an insufficient utilization of complementary information in multiple images. In this context, we innovatively propose a Multi-Attention Multi-Image Super-Resolution Transformer (MAST), which involves improvements in two main aspects. Firstly, we present a Multi-Scale and Mixed Attention Block (MMAB). With its multi-scale structure, the network is able to extract image features from different scales to obtain more contextual information. Additionally, the introduction of mixed attention allows the network to fully explore high-frequency features of the images in both channel and spatial dimensions. Secondly, we propose a Collaborative Attention Fusion Block (CAFB). By incorporating channel attention into the self-attention layer of the Transformer, we aim to better establish global correlations between multiple images. To improve the network’s perception ability of local detailed features, we introduce a Residual Local Attention Block (RLAB). With the aforementioned improvements, our model can better extract and utilize non-redundant information, achieving a superior restoration effect that balances the global structure and local details of the image. The results from the comparative experiments reveal that our approach demonstrated a notable enhancement in cPSNR, with improvements of 0.91 dB and 0.81 dB observed in the NIR and RED bands of the PROBA-V dataset, respectively, in comparison to the existing state-of-the-art methods. Extensive experiments demonstrate that the method proposed in this paper can provide a valuable reference for solving multi-image super-resolution tasks for remote sensing.
ISSN:2072-4292