A Spectral–Spatial Transformer Fusion Method for Hyperspectral Video Tracking


Bibliographic Details
Main Authors: Ye Wang, Yuheng Liu, Mingyang Ma, Shaohui Mei
Format: Article
Language: English
Published: MDPI AG, 2023-03-01
Series: Remote Sensing
Subjects: transformer fusion; spectral–spatial joint; hyperspectral object tracking
Online Access: https://www.mdpi.com/2072-4292/15/7/1735
Abstract: Hyperspectral videos (HSVs) record richer detail cues than conventional videos because of their abundant spectral information. Traditional trackers based on correlation filters (CFs) exploit this spectral information locally and achieve promising results, but their performance is limited because they ignore global information. This paper proposes a joint spectral–spatial method for hyperspectral video tracking, named the spectral–spatial transformer-based feature fusion tracker (SSTFT), which exploits spectral–spatial features while modeling global interactions. Specifically, the feature extraction module uses two parallel branches to extract multi-level coarse-grained and fine-grained spectral–spatial features, which are fused with adaptive weights. The extracted features are further combined in a context fusion module built on a transformer with hyperspectral self-attention (HSA) and hyperspectral cross-attention (HCA), designed to capture self-context and cross-context feature interactions, respectively. In addition, an adaptive dynamic template updating strategy updates the template bounding box based on the prediction score. Extensive experiments on benchmark hyperspectral video tracking datasets demonstrate that SSTFT outperforms state-of-the-art methods in both precision and speed.

ISSN: 2072-4292
DOI: 10.3390/rs15071735
Citation: Remote Sensing, vol. 15, no. 7, article 1735, 2023.
Author affiliations: Ye Wang, Yuheng Liu, Mingyang Ma, and Shaohui Mei are with the School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China.
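The context fusion the abstract describes pairs self-attention over each feature set with cross-attention between template and search-region features, followed by a score-gated template update. As a rough illustration only (this is not the authors' implementation; the dimensions, token counts, and threshold below are invented, and plain scaled dot-product attention stands in for HSA/HCA), the core arithmetic can be sketched in NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: weight queries against keys,
    # then mix the values by those weights.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
d = 32                                      # feature dimension (illustrative)
template = rng.standard_normal((16, d))     # template-region tokens
search = rng.standard_normal((48, d))       # search-region tokens

# Self-context interaction: each token set attends to itself (HSA stand-in).
template_sc = attention(template, template, template)
search_sc = attention(search, search, search)

# Cross-context interaction: search tokens query the template (HCA stand-in),
# so every search token can aggregate template context globally.
fused = attention(search_sc, template_sc, template_sc)
print(fused.shape)  # (48, 32)

# Score-gated template update (illustrative): replace the template only
# when the tracker's prediction score is high enough.
score = 0.92                 # stand-in prediction score
UPDATE_THRESHOLD = 0.8       # invented threshold
if score > UPDATE_THRESHOLD:
    template = search[:16]   # stand-in for features of the newly predicted box
```

Unlike a correlation filter, which compares template and search features only at local shifts, the cross-attention step lets every search token weigh every template token, which is the global interaction the method is built around.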