YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery
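Main Authors: | Yiheng Wu, Jianjun Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-02-01 |
Series: | Sensors |
Subjects: | aerial imagery; ultra-high spatial resolution orbital imagery; object detection; YOLOv4; vision transformer; deep learning |
Online Access: | https://www.mdpi.com/1424-8220/23/5/2522 |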
author | Yiheng Wu, Jianjun Li |
---|---|
collection | DOAJ |
description | Deep learning methods for natural-image object detection have made tremendous progress in recent years. However, because of multiscale targets, complex backgrounds, and high-scale small targets, methods developed for natural images frequently fail to produce satisfactory results when applied to aerial images. To address these problems, we proposed DET-YOLO, an enhancement based on YOLOv4. First, we employed a vision transformer to acquire highly effective global information extraction capabilities. In the transformer, we proposed deformable embedding instead of linear embedding and a full convolution feedforward network (FCFN) instead of a feedforward network, in order to reduce the feature loss caused by cutting the image into patches during embedding and to improve the spatial feature extraction capability. Second, for improved multiscale feature fusion in the neck, we employed a depth direction separable deformable pyramid module (DSDP) rather than a feature pyramid network. Experiments on the DOTA, RSOD, and UCAS-AOD datasets demonstrated that our method’s mean average precision (mAP) values reached 0.728, 0.952, and 0.945, respectively, which were comparable to those of existing state-of-the-art methods. |
first_indexed | 2024-03-11T07:10:05Z |
format | Article |
id | doaj.art-060b29a61eba458780deb97b2c7413c3 |
institution | Directory of Open Access Journals |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-11T07:10:05Z |
publishDate | 2023-02-01 |
publisher | MDPI AG |
series | Sensors |
doi | 10.3390/s23052522 |
citation | Sensors 2023, 23(5), 2522 |
affiliation | College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China (both authors) |
title | YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery |
topic | aerial imagery; ultra-high spatial resolution orbital imagery; object detection; YOLOv4; vision transformer; deep learning |
url | https://www.mdpi.com/1424-8220/23/5/2522 |
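The abstract reports its results as mAP (mean average precision) on DOTA, RSOD, and UCAS-AOD. As a generic illustration only, not the authors' evaluation code, the sketch below computes per-class average precision with all-point interpolation and then averages it over classes. It assumes detections have already been matched to ground truth at some IoU threshold; the threshold and interpolation scheme actually used in the paper are not stated in this record, and the function names and input format are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: one (confidence, is_true_positive) pair per detection,
# assumed to be matched to ground truth beforehand (IoU matching is not shown).
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """All-point interpolated AP for a single class."""
    if num_gt <= 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each point takes the best precision at the same or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the stepwise precision-recall curve; false positives add no recall.
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Unweighted mean of per-class AP values."""
    if not per_class:
        return 0.0
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy data (made up, not from the paper): 2 planes and 2 ships in the ground truth.
    example = {
        "plane": ([(0.9, True), (0.8, False), (0.7, True)], 2),
        "ship": ([(0.9, True), (0.8, True)], 2),
    }
    print(round(mean_average_precision(example), 4))  # prints 0.9167
```

In the toy data, the "plane" class gets AP = 5/6 because one false positive is ranked between its two true positives, while "ship" gets AP = 1.0; their unweighted mean is 11/12. Benchmarks differ in how they match detections and interpolate the curve, so values from this sketch are not directly comparable to the figures reported above.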