Remote Sensing Object Detection Based on Convolution and Swin Transformer

Remote sensing object detection is an essential task for surveying the earth. Object detection algorithms designed for natural scenes struggle to obtain satisfactory results on remote sensing images. In this paper, the RAST-YOLO (You Only Look Once with Region Attention and Swin Trans...


Bibliographic Details
Main Authors:	Xuzhao Jiang, Yonghong Wu
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Remote sensing images; object detection; attention mechanism; Swin Transformer; multi-scale features
Online Access:	https://ieeexplore.ieee.org/document/10103543/

_version_	1797840116115308544
author	Xuzhao Jiang, Yonghong Wu
collection	DOAJ
description	Remote sensing object detection is an essential task for surveying the earth. Object detection algorithms designed for natural scenes struggle to obtain satisfactory results on remote sensing images. In this paper, the RAST-YOLO (You Only Look Once with Region Attention and Swin Transformer) algorithm is proposed to address the problems of remote sensing object detection, such as large differences in target scale, complex backgrounds, and densely arranged small targets. To enlarge the range of information interaction in the feature map, make full use of the background information around an object, and improve detection accuracy for objects against complex backgrounds, a Region Attention (RA) mechanism combined with a Swin Transformer backbone is proposed for feature extraction. To improve the detection accuracy of small objects, a C3D module is used to fuse deep and shallow semantic information and to address the multi-scale nature of remote sensing targets. To evaluate the performance of RAST-YOLO, extensive experiments are performed on the DIOR and TGRS-HRRSD datasets. The experimental results show that RAST-YOLO achieves state-of-the-art detection accuracy with high efficiency and robustness. Specifically, compared with the baseline network, the mean average precision (mAP) is improved by 5% on DIOR and by 2.3% on TGRS-HRRSD, which demonstrates that RAST-YOLO is effective and superior. Moreover, the lightweight structure of RAST-YOLO allows real-time detection speed while still achieving excellent detection results.
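The description names two mechanisms without giving their internals: feature extraction with a Swin Transformer backbone, and fusion of deep and shallow features to help with small objects. As a rough illustration only, the Python sketch below shows (a) standard Swin-style window partitioning with self-attention restricted to each local window and (b) a generic top-down fusion that upsamples a deep, coarse map and concatenates it with a shallow, fine one. It is not the authors' Region Attention mechanism or C3D module, whose details this record does not give. Assumptions: NumPy arrays in (H, W, C) layout, a single head with identity query/key/value projections, no shifted windows or relative position bias, and nearest-neighbour 2x upsampling; all function names are hypothetical.

```python
import numpy as np


def window_partition(x: np.ndarray, ws: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping ws x ws windows -> (N, ws*ws, C)."""
    h, w, c = x.shape
    if h % ws or w % ws:
        raise ValueError("H and W must be multiples of the window size")
    x = x.reshape(h // ws, ws, w // ws, ws, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (window rows, window cols, ws, ws, C)
    return x.reshape(-1, ws * ws, c)


def window_reverse(windows: np.ndarray, ws: int, h: int, w: int) -> np.ndarray:
    """Inverse of window_partition: (N, ws*ws, C) -> (H, W, C)."""
    c = windows.shape[-1]
    x = windows.reshape(h // ws, w // ws, ws, ws, c)
    x = x.transpose(0, 2, 1, 3, 4)  # back to (window rows, ws, window cols, ws, C)
    return x.reshape(h, w, c)


def window_self_attention(x: np.ndarray, ws: int) -> np.ndarray:
    """Single-head scaled dot-product attention computed inside each window only.

    Q, K and V are the input tokens themselves (identity projections, an
    assumption for brevity); a real Swin block learns these projections."""
    h, w, c = x.shape
    tokens = window_partition(x, ws)                            # (N, T, C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(c)    # (N, T, T)
    scores -= scores.max(axis=-1, keepdims=True)                # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)              # softmax over keys
    return window_reverse(weights @ tokens, ws, h, w)


def fuse_deep_shallow(shallow: np.ndarray, deep: np.ndarray) -> np.ndarray:
    """Generic top-down fusion (hypothetical, not the paper's C3D module).

    The deep map has half the spatial size of the shallow map; it is upsampled
    2x by nearest-neighbour repetition and concatenated along channels."""
    up = np.repeat(np.repeat(deep, 2, axis=0), 2, axis=1)
    if up.shape[:2] != shallow.shape[:2]:
        raise ValueError("deep map must be exactly half the shallow map's size")
    return np.concatenate([shallow, up], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shallow = rng.standard_normal((8, 8, 16))  # fine, high-resolution features
    deep = rng.standard_normal((4, 4, 32))     # coarse, semantically richer features
    attended = window_self_attention(shallow, ws=4)
    fused = fuse_deep_shallow(attended, deep)
    print(attended.shape, fused.shape)  # (8, 8, 16) (8, 8, 48)
```

Restricting attention to fixed-size windows keeps the cost linear in image area rather than quadratic; Swin compensates for the lack of cross-window interaction by alternating with shifted windows and an attention mask, both omitted from this sketch.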
first_indexed	2024-04-09T16:09:21Z
format	Article
id	doaj.art-ffee313b737b4df9ae64553ce83d5d11
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-04-09T16:09:21Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-ffee313b737b4df9ae64553ce83d5d11; 2023-04-24T23:00:35Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 38643–38656; DOI 10.1109/ACCESS.2023.3267435; 10103543; Remote Sensing Object Detection Based on Convolution and Swin Transformer; Xuzhao Jiang (https://orcid.org/0000-0002-9264-6989), Department of Statistics, Wuhan University of Technology, Wuhan, China; Yonghong Wu, Department of Statistics, Wuhan University of Technology, Wuhan, China; https://ieeexplore.ieee.org/document/10103543/; Remote sensing images; object detection; attention mechanism; swin transformer; multi-scale features
title	Remote Sensing Object Detection Based on Convolution and Swin Transformer
topic	Remote sensing images; object detection; attention mechanism; swin transformer; multi-scale features
url	https://ieeexplore.ieee.org/document/10103543/


Similar Items