Classification of Hyperspectral and LiDAR Data Using Multi-Modal Transformer Cascaded Fusion Net
With the continuous development of surface observation methods and technologies, we can acquire multiple sources of data more effectively in the same geographic area. The quality and availability of these data have also significantly improved. Consequently, how to better utilize multi-source data to...
Main Authors: | Shuo Wang, Chengchao Hou, Yiming Chen, Zhengjun Liu, Zhenbei Zhang, Geng Zhang |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2023-08-01
|
Series: | Remote Sensing |
Subjects: | deep learning; multi-head self-attention (MHSA); multi-modal transformer cascaded fusion net (MMTCFN); HSI-LiDAR classification |
Online Access: | https://www.mdpi.com/2072-4292/15/17/4142 |
_version_ | 1797581900778307584 |
---|---|
author | Shuo Wang; Chengchao Hou; Yiming Chen; Zhengjun Liu; Zhenbei Zhang; Geng Zhang |
author_facet | Shuo Wang; Chengchao Hou; Yiming Chen; Zhengjun Liu; Zhenbei Zhang; Geng Zhang |
author_sort | Shuo Wang |
collection | DOAJ |
description | With the continuous development of surface observation methods and technologies, we can acquire multiple sources of data more effectively over the same geographic area, and the quality and availability of these data have also significantly improved. Consequently, how to better utilize multi-source data to represent ground information has become an important research question in the field of geoscience. In this paper, a novel model called multi-modal transformer cascaded fusion net (MMTCFN) is proposed for the fusion and classification of multi-modal remote sensing data: hyperspectral imagery (HSI) and LiDAR data. Feature extraction and feature fusion are the two stages of the model. First, in the feature extraction stage, a three-branch cascaded convolutional neural network (CNN) framework is employed to fully leverage the advantages of convolutional operators in extracting shallow-level local features. Based on this, we generate multi-modal long-range integrated deep features using the transformer-based vectorized pixel group transformer (VPGT) module during the feature fusion stage. In the VPGT block, we design a vectorized pixel group embedding that preserves the global features extracted from the three branches in a non-overlapping multi-space manner. Moreover, we introduce the DropKey mechanism into the multi-head self-attention (MHSA) to alleviate the overfitting caused by insufficient training samples. Finally, we employ a probabilistic decision fusion strategy to integrate multiple class estimations, assigning a specific category to each pixel. The model was evaluated on three HSI-LiDAR datasets with balanced and unbalanced training samples, where it outperformed seven state-of-the-art (SOTA) approaches in terms of overall accuracy (OA), demonstrating the superiority of MMTCFN for the HSI-LiDAR classification task. |
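The abstract mentions DropKey regularization inside multi-head self-attention: random attention *logits* (query-key scores) are masked to -inf before the softmax, rather than zeroing attention weights after the softmax as standard attention dropout does. Below is a minimal single-head NumPy sketch of that idea; the function names, shapes, and drop ratio are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; exp(-inf) maps masked logits to exactly 0.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_dropkey(q, k, v, drop_ratio=0.1, training=True, rng=None):
    """Scaled dot-product attention with DropKey (illustrative sketch).

    Unlike standard attention dropout (zero weights AFTER softmax),
    DropKey sets a random subset of logits to -inf BEFORE softmax, so the
    remaining keys' weights are renormalized among themselves.
    """
    rng = rng or np.random.default_rng(0)
    d = q.shape[-1]
    logits = q @ k.swapaxes(-1, -2) / np.sqrt(d)   # (n_q, n_k)
    if training and drop_ratio > 0:
        mask = rng.random(logits.shape) < drop_ratio
        logits = np.where(mask, -np.inf, logits)
    weights = softmax(logits, axis=-1)
    return weights @ v, weights
```

Because masked logits become exact zeros after the softmax, each query's attention mass is redistributed over the surviving keys, which is the renormalizing effect DropKey relies on. A production implementation would also guard against the rare degenerate case where every key for a query is masked.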
first_indexed | 2024-03-10T23:14:12Z |
format | Article |
id | doaj.art-42ebc197989243b89b6bd5fd7421ce09 |
institution | Directory Open Access Journal |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-10T23:14:12Z |
publishDate | 2023-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-42ebc197989243b89b6bd5fd7421ce09 | 2023-11-19T08:45:02Z | eng | MDPI AG | Remote Sensing | 2072-4292 | 2023-08-01 | Vol. 15, Iss. 17, Art. 4142 | doi:10.3390/rs15174142 | Classification of Hyperspectral and LiDAR Data Using Multi-Modal Transformer Cascaded Fusion Net | Shuo Wang, Chengchao Hou, Yiming Chen, Zhengjun Liu, Geng Zhang (Chinese Academy of Surveying & Mapping, Beijing 100036, China); Zhenbei Zhang (State Key Laboratory of Tibetan Plateau Earth System, Resources and Environment (TPESRE), Institute of Tibetan Plateau Research, Chinese Academy of Sciences, Beijing 100101, China) | https://www.mdpi.com/2072-4292/15/17/4142 | deep learning; multi-head self-attention (MHSA); multi-modal transformer cascaded fusion net (MMTCFN); HSI-LiDAR classification |
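The probabilistic decision fusion step named in the abstract can also be sketched compactly: each classifier branch emits a per-pixel class-probability vector, and the fused label is the argmax of the (optionally weighted) average of those vectors. This is a generic soft-voting sketch assuming already-normalized branch probabilities; the paper's exact fusion rule and the helper name `decision_fusion` are assumptions for illustration:

```python
import numpy as np

def decision_fusion(prob_maps, weights=None):
    """Fuse per-branch class-probability estimates (soft voting sketch).

    prob_maps: list of arrays, each (n_pixels, n_classes), rows summing to 1.
    weights:   optional per-branch reliabilities; normalized internally.
    Returns (labels, fused_probs): per-pixel argmax class and fused probabilities.
    """
    stack = np.stack(prob_maps, axis=0)            # (n_branches, n_pixels, n_classes)
    if weights is None:
        fused = stack.mean(axis=0)                 # uniform average of branch estimates
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                            # normalize branch weights
        fused = np.tensordot(w, stack, axes=1)     # weighted average
    return fused.argmax(axis=-1), fused
```

For example, if one branch favors class 0 for a pixel with probability 0.7 while another favors class 1 with probability 0.8, the uniform average (0.45 vs. 0.55) assigns class 1.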
spellingShingle | Shuo Wang; Chengchao Hou; Yiming Chen; Zhengjun Liu; Zhenbei Zhang; Geng Zhang | Classification of Hyperspectral and LiDAR Data Using Multi-Modal Transformer Cascaded Fusion Net | Remote Sensing | deep learning; multi-head self-attention (MHSA); multi-modal transformer cascaded fusion net (MMTCFN); HSI-LiDAR classification |
title | Classification of Hyperspectral and LiDAR Data Using Multi-Modal Transformer Cascaded Fusion Net |
title_full | Classification of Hyperspectral and LiDAR Data Using Multi-Modal Transformer Cascaded Fusion Net |
title_fullStr | Classification of Hyperspectral and LiDAR Data Using Multi-Modal Transformer Cascaded Fusion Net |
title_full_unstemmed | Classification of Hyperspectral and LiDAR Data Using Multi-Modal Transformer Cascaded Fusion Net |
title_short | Classification of Hyperspectral and LiDAR Data Using Multi-Modal Transformer Cascaded Fusion Net |
title_sort | classification of hyperspectral and lidar data using multi modal transformer cascaded fusion net |
topic | deep learning; multi-head self-attention (MHSA); multi-modal transformer cascaded fusion net (MMTCFN); HSI-LiDAR classification |
url | https://www.mdpi.com/2072-4292/15/17/4142 |
work_keys_str_mv | AT shuowang classificationofhyperspectralandlidardatausingmultimodaltransformercascadedfusionnet AT chengchaohou classificationofhyperspectralandlidardatausingmultimodaltransformercascadedfusionnet AT yimingchen classificationofhyperspectralandlidardatausingmultimodaltransformercascadedfusionnet AT zhengjunliu classificationofhyperspectralandlidardatausingmultimodaltransformercascadedfusionnet AT zhenbeizhang classificationofhyperspectralandlidardatausingmultimodaltransformercascadedfusionnet AT gengzhang classificationofhyperspectralandlidardatausingmultimodaltransformercascadedfusionnet |