Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification
Convolutional neural networks (CNNs) have been widely used in remote sensing scene classification. However, the long-range dependencies of local features cannot be taken into account by CNNs. By contrast, a visual transformer (ViT) is good at capturing the long-range dependencies as it considers the...
Main Authors: | Siyuan Hao, Nan Li, Yuanxin Ye |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
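Subjects: | Loss function, remote sensing image, scene classification, self-supervised learning (SSL), swin transformer |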
Online Access: | https://ieeexplore.ieee.org/document/10186881/ |
_version_ | 1797776287492734976 |
---|---|
author | Siyuan Hao Nan Li Yuanxin Ye |
author_facet | Siyuan Hao Nan Li Yuanxin Ye |
author_sort | Siyuan Hao |
collection | DOAJ |
description | Convolutional neural networks (CNNs) have been widely used in remote sensing scene classification. However, CNNs cannot take the long-range dependencies of local features into account. By contrast, a visual transformer (ViT) is good at capturing long-range dependencies, as it considers the global relationship of local features by introducing a self-attention mechanism. Although the ViT can obtain good results when trained on large-scale datasets, e.g., ImageNet, it is hard to adapt to small-scale datasets (e.g., remote sensing image datasets). This is attributed to the fact that the ViT lacks the typical inductive bias capability. Therefore, we propose the inductive biased swin transformer with cyclic regressor used with random dense sampler (IBSwin-CR) to improve the training effect of the swin transformer on remote sensing image datasets. It builds on three modules: an inductive biased shifted window multihead self-attention (IBSW-MSA) module, a random dense sampler, and a regressor with cyclic regression loss. The IBSW-MSA module provides the inductive bias information and the long-range dependencies of the attention map. Moreover, the final feature map goes through a random dense sampler, in which additional spatial information is learned. Finally, the network is normalized by a cross-entropy loss function and a cyclic regression loss function. The proposed IBSwin-CR model is evaluated on public datasets such as the NWPU-RESISC45 dataset and the Aerial Image Dataset, and the experimental results show that the proposed network can achieve better performance than other classification models, especially in cases with a small number of samples. |
first_indexed | 2024-03-12T22:47:49Z |
format | Article |
id | doaj.art-0e379ccaba11458e84eb93334adbef0b |
institution | Directory Open Access Journal |
issn | 2151-1535 |
language | English |
last_indexed | 2024-03-12T22:47:49Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
spelling | doaj.art-0e379ccaba11458e84eb93334adbef0b 2023-07-20T23:00:18Z eng IEEE IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2151-1535 2023-01-01 16 6265 6278 10.1109/JSTARS.2023.3290676 10186881 Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification Siyuan Hao 0 https://orcid.org/0000-0001-8247-4207 Nan Li 1 https://orcid.org/0009-0005-1952-2209 Yuanxin Ye 2 https://orcid.org/0000-0001-6843-6722 College of Information and Control Engineering, Qingdao University of Technology, Qingdao, China College of Information and Control Engineering, Qingdao University of Technology, Qingdao, China Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, China Convolutional neural networks (CNNs) have been widely used in remote sensing scene classification. However, the long-range dependencies of local features cannot be taken into account by CNNs. By contrast, a visual transformer (ViT) is good at capturing the long-range dependencies as it considers the global relationship of local features by introducing a self-attention mechanism. Although the ViT can obtain a good result when training on large-scale datasets, e.g., ImageNet, it is hard to be adapted to small-scale datasets (e.g., remote sensing image datasets). This is attributed to the fact that the ViT lacks the typical inductive bias capability. Therefore, we propose the inductive biased swin transformer with cyclic regressor used with random dense sampler (IBSwin-CR) to improve the training effect of the swin transformer on remote sensing image datasets, which builds upon three modules, i.e., inductive biased shifted window multihead self-attention (IBSW-MSA) module, random dense sampler, and a regressor with cyclic regression loss. We obtain the inductive bias information and the long-range dependencies of the attention map by the IBSW-MSA module. Moreover, the final feature map goes through a random dense sampler, in which the additional spatial information is learned. Finally, the network is normalized by a cross-entropy loss function and a cyclic regression loss function. The proposed IBSwin-CR model is evaluated on public datasets such as NWPU-RESISC45 dataset and Aerial Image Dataset, and the experimental results show that the proposed network can achieve better performance than other classification models, especially for the case with a small number of samples. https://ieeexplore.ieee.org/document/10186881/ Loss function remote sensing image scene classification self-supervised learning (SSL) swin transformer |
spellingShingle | Siyuan Hao Nan Li Yuanxin Ye Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Loss function remote sensing image scene classification self-supervised learning (SSL) swin transformer |
title | Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification |
title_full | Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification |
title_fullStr | Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification |
title_full_unstemmed | Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification |
title_short | Inductive Biased Swin-Transformer With Cyclic Regressor for Remote Sensing Scene Classification |
title_sort | inductive biased swin transformer with cyclic regressor for remote sensing scene classification |
topic | Loss function remote sensing image scene classification self-supervised learning (SSL) swin transformer |
url | https://ieeexplore.ieee.org/document/10186881/ |
work_keys_str_mv | AT siyuanhao inductivebiasedswintransformerwithcyclicregressorforremotesensingsceneclassification AT nanli inductivebiasedswintransformerwithcyclicregressorforremotesensingsceneclassification AT yuanxinye inductivebiasedswintransformerwithcyclicregressorforremotesensingsceneclassification |