Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification

Bibliographic Details
Main Authors: Siyuan Hao, Nan Li, Yuanxin Ye
Format: Article
Language: English
Published: IEEE 2023-01-01
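Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Loss function; remote sensing image; scene classification; self-supervised learning (SSL); swin transformer
Online Access: https://ieeexplore.ieee.org/document/10186881/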
author Siyuan Hao
Nan Li
Yuanxin Ye
collection DOAJ
description Convolutional neural networks (CNNs) have been widely used in remote sensing scene classification. However, CNNs cannot take the long-range dependencies of local features into account. By contrast, a vision transformer (ViT) is good at capturing long-range dependencies because its self-attention mechanism considers the global relationship of local features. Although the ViT can obtain good results when trained on large-scale datasets (e.g., ImageNet), it is hard to adapt to small-scale datasets (e.g., remote sensing image datasets). This is attributed to the fact that the ViT lacks the typical inductive bias capability. Therefore, we propose the inductive biased swin transformer with cyclic regressor, used with a random dense sampler (IBSwin-CR), to improve the training effect of the swin transformer on remote sensing image datasets. The model builds upon three modules: an inductive biased shifted window multihead self-attention (IBSW-MSA) module, a random dense sampler, and a regressor with a cyclic regression loss. The IBSW-MSA module provides the inductive bias information and the long-range dependencies of the attention map. The final feature map then goes through the random dense sampler, in which additional spatial information is learned. Finally, the network is normalized by a cross-entropy loss function and a cyclic regression loss function. The proposed IBSwin-CR model is evaluated on public datasets such as the NWPU-RESISC45 dataset and the Aerial Image Dataset, and the experimental results show that the proposed network achieves better performance than other classification models, especially when only a small number of samples is available.
first_indexed 2024-03-12T22:47:49Z
format Article
id doaj.art-0e379ccaba11458e84eb93334adbef0b
institution Directory Open Access Journal
issn 2151-1535
language English
last_indexed 2024-03-12T22:47:49Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj.art-0e379ccaba11458e84eb93334adbef0b
2023-07-20T23:00:18Z
eng
IEEE
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
2151-1535
2023-01-01
Vol. 16, pp. 6265-6278
DOI: 10.1109/JSTARS.2023.3290676
Document number: 10186881
Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification
Siyuan Hao (0), https://orcid.org/0000-0001-8247-4207
Nan Li (1), https://orcid.org/0009-0005-1952-2209
Yuanxin Ye (2), https://orcid.org/0000-0001-6843-6722
(0) College of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
(1) College of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
(2) Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, China
https://ieeexplore.ieee.org/document/10186881/
Loss function
remote sensing image
scene classification
self-supervised learning (SSL)
swin transformer
title Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification
topic Loss function
remote sensing image
scene classification
self-supervised learning (SSL)
swin transformer
url https://ieeexplore.ieee.org/document/10186881/