End-to-End Convolutional Network and Spectral-Spatial Transformer Architecture for Hyperspectral Image Classification

Although convolutional neural networks (CNNs) have proven successful for hyperspectral image classification (HSIC), it is difficult to characterize the global dependencies between HSI pixels at long-distance ranges and spectral bands due to their limited receptive domain. The transformer can compens...

Bibliographic Details
Main Authors:	Shiping Li, Lianhui Liang, Shaoquan Zhang, Ying Zhang, Antonio Plaza, Xuehua Wang
Format:	Article
Language:	English
Published:	MDPI AG 2024-01-01
Series:	Remote Sensing
Subjects:	convolutional neural networks (CNNs); hyperspectral image classification (HSIC); spectral–spatial transformer; multi-head self-attention (MHSA)
Online Access:	https://www.mdpi.com/2072-4292/16/2/325

author	Shiping Li, Lianhui Liang, Shaoquan Zhang, Ying Zhang, Antonio Plaza, Xuehua Wang
collection	DOAJ
description	Although convolutional neural networks (CNNs) have proven successful for hyperspectral image classification (HSIC), it is difficult to characterize the global dependencies between HSI pixels at long-distance ranges and spectral bands due to their limited receptive domain. The transformer can compensate well for this shortcoming, but it suffers from a lack of image-specific inductive biases (i.e., localization and translation equivariance) and contextual position information compared with CNNs. To overcome the aforementioned challenges, we introduce a simply structured, end-to-end convolutional network and spectral–spatial transformer (CNSST) architecture for HSIC. Our CNSST architecture consists of two essential components: a simple 3D-CNN-based hierarchical feature fusion network and a spectral–spatial transformer that introduces inductive bias information. The former employs a 3D-CNN-based hierarchical feature fusion structure to establish the correlation between spectral and spatial (SAS) information while capturing richer inductive bias and more discriminative local spectral-spatial hierarchical feature information, while the latter aims to establish the global dependency among HSI pixels while enhancing the acquisition of local information by introducing inductive bias information. Specifically, the spectral and inductive bias information is incorporated into the transformer’s multi-head self-attention mechanism (MHSA), thus making the attention spectrally aware and location-aware. Furthermore, a Lion optimizer is exploited to boost the classification performance of our newly developed CNSST. Substantial experiments conducted on three publicly accessible hyperspectral datasets unequivocally showcase that our proposed CNSST outperforms other state-of-the-art approaches.
first_indexed	2024-03-08T10:35:18Z
format	Article
id	doaj.art-46cff689b10844d4ba866e12f6321236
institution	Directory of Open Access Journals
issn	2072-4292
language	English
last_indexed	2024-03-08T10:35:18Z
publishDate	2024-01-01
publisher	MDPI AG
record_format	Article
series	Remote Sensing
spelling	doaj.art-46cff689b10844d4ba866e12f6321236; 2024-01-26T18:18:16Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2024-01-01; 16(2), 325; 10.3390/rs16020325; End-to-End Convolutional Network and Spectral-Spatial Transformer Architecture for Hyperspectral Image Classification; Shiping Li [0], Lianhui Liang [1], Shaoquan Zhang [2], Ying Zhang [3], Antonio Plaza [4], Xuehua Wang [5]; [0] School of Materials Science Engineering, Wuhan Institute of Technology, Wuhan 430079, China; [1] College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; [2] School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China; [3] College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; [4] Hyperspectral Computing Laboratory, Department of Technology of Computers and Communications, Escuela Politécnica, University of Extremadura, E-10071 Cáceres, Spain; [5] School of Materials Science Engineering, Wuhan Institute of Technology, Wuhan 430079, China; https://www.mdpi.com/2072-4292/16/2/325; convolutional neural networks (CNNs); hyperspectral image classification (HSIC); spectral–spatial transformer; multi-head self-attention (MHSA)
title	End-to-End Convolutional Network and Spectral-Spatial Transformer Architecture for Hyperspectral Image Classification
topic	convolutional neural networks (CNNs); hyperspectral image classification (HSIC); spectral–spatial transformer; multi-head self-attention (MHSA)
url	https://www.mdpi.com/2072-4292/16/2/325

Similar Items