YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery

The deep learning method for natural-image object detection tasks has made tremendous progress in recent decades. However, due to multiscale targets, complex backgrounds, and high-scale small targets, methods from the field of natural images frequently fail to produce satisfactory results when appli...

Bibliographic Details
Main Authors:	Yiheng Wu, Jianjun Li
Format:	Article
Language:	English
Published:	MDPI AG 2023-02-01
Series:	Sensors
Subjects:	aerial imagery; ultra-high spatial resolution orbital imagery; object detection; YOLOv4; vision transformer; deep learning
Online Access:	https://www.mdpi.com/1424-8220/23/5/2522

_version_	1827752105034645504
author	Yiheng Wu, Jianjun Li
author_facet	Yiheng Wu, Jianjun Li
author_sort	Yiheng Wu
collection	DOAJ
description	The deep learning method for natural-image object detection tasks has made tremendous progress in recent decades. However, due to multiscale targets, complex backgrounds, and high-scale small targets, methods from the field of natural images frequently fail to produce satisfactory results when applied to aerial images. To address these problems, we proposed the DET-YOLO enhancement based on YOLOv4. Initially, we employed a vision transformer to acquire highly effective global information extraction capabilities. In the transformer, we proposed deformable embedding instead of linear embedding and a full convolution feedforward network (FCFN) instead of a feedforward network in order to reduce the feature loss caused by cutting in the embedding process and improve the spatial feature extraction capability. Second, for improved multiscale feature fusion in the neck, we employed a depth direction separable deformable pyramid module (DSDP) rather than a feature pyramid network. Experiments on the DOTA, RSOD, and UCAS-AOD datasets demonstrated that our method’s mean average precision (mAP) values reached 0.728, 0.952, and 0.945, respectively, which were comparable to the existing state-of-the-art methods.
first_indexed	2024-03-11T07:10:05Z
format	Article
id	doaj.art-060b29a61eba458780deb97b2c7413c3
institution	Directory of Open Access Journals
issn	1424-8220
language	English
last_indexed	2024-03-11T07:10:05Z
publishDate	2023-02-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-060b29a61eba458780deb97b2c7413c3; 2023-11-17T08:36:00Z; eng; MDPI AG; Sensors; 1424-8220; 2023-02-01; 23; 5; 2522; 10.3390/s23052522; YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery; Yiheng Wu (0); Jianjun Li (1); (0) College of Computer and Information Engineering, Central South University of Forestry and Technology University, Changsha 410004, China; (1) College of Computer and Information Engineering, Central South University of Forestry and Technology University, Changsha 410004, China; https://www.mdpi.com/1424-8220/23/5/2522; aerial imagery; ultra-high spatial resolution orbital imagery; object detection; YOLOv4; vision transformer; deep learning
title	YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery
topic	aerial imagery; ultra-high spatial resolution orbital imagery; object detection; YOLOv4; vision transformer; deep learning
url	https://www.mdpi.com/1424-8220/23/5/2522

Similar Items