YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery

The deep learning method for natural-image object detection tasks has made tremendous progress in recent decades. However, due to multiscale targets, complex backgrounds, and high-scale small targets, methods from the field of natural images frequently fail to produce satisfactory results when appli...

Bibliographic Details
Main Authors: Yiheng Wu, Jianjun Li
Format: Article
Language: English
Published: MDPI AG 2023-02-01
Series: Sensors
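Subjects: aerial imagery; ultra-high spatial resolution orbital imagery; object detection; YOLOv4; vision transformer; deep learning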
Online Access: https://www.mdpi.com/1424-8220/23/5/2522
author Yiheng Wu
Jianjun Li
collection DOAJ
description Deep learning methods for object detection in natural images have made tremendous progress in recent decades. However, because aerial images contain multiscale targets, complex backgrounds, and high-scale small targets, methods developed for natural images frequently fail to produce satisfactory results when applied to them. To address these problems, we propose DET-YOLO, an enhancement of YOLOv4. First, we employ a vision transformer as the feature extractor to capture global information effectively. Within the transformer, we replace linear embedding with deformable embedding and the feedforward network with a full convolution feedforward network (FCFN), which reduces the feature loss caused by cutting in the embedding process and improves spatial feature extraction. Second, to improve multiscale feature fusion in the neck, we employ a depth direction separable deformable pyramid module (DSDP) rather than a feature pyramid network. Experiments on the DOTA, RSOD, and UCAS-AOD datasets show that our method reaches mean average precision (mAP) values of 0.728, 0.952, and 0.945, respectively, which are comparable to existing state-of-the-art methods.
first_indexed 2024-03-11T07:10:05Z
format Article
id doaj.art-060b29a61eba458780deb97b2c7413c3
institution Directory of Open Access Journals
issn 1424-8220
language English
last_indexed 2024-03-11T07:10:05Z
publishDate 2023-02-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-060b29a61eba458780deb97b2c7413c3; 2023-11-17T08:36:00Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2023-02-01; vol. 23, no. 5, article 2522; DOI 10.3390/s23052522
Yiheng Wu, Jianjun Li: College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China
title YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery
topic aerial imagery
ultra-high spatial resolution orbital imagery
object detection
YOLOv4
vision transformer
deep learning
url https://www.mdpi.com/1424-8220/23/5/2522