ESF-YOLO: an accurate and universal object detector based on neural networks
As an excellent single-stage object detector based on neural networks, YOLOv5 has found extensive applications in the industrial domain; however, it still exhibits certain design limitations. To address these issues, this paper proposes Efficient Scale Fusion YOLO (ESF-YOLO). Firstly, the Multi-Samp...
Main Authors: | Wenguang Tao, Xiaotian Wang, Tian Yan, Zhengzhuo Liu, Shizheng Wan |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2024-04-01 |
Series: | Frontiers in Neuroscience |
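Subjects: | neural network, object detection, cross-scale feature fusion, attention mechanism, lightweight decoupled head, dynamic loss function |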
Online Access: | https://www.frontiersin.org/articles/10.3389/fnins.2024.1371418/full |
_version_ | 1797219347806027776 |
---|---|
author | Wenguang Tao, Xiaotian Wang, Tian Yan, Zhengzhuo Liu, Shizheng Wan |
author_facet | Wenguang Tao, Xiaotian Wang, Tian Yan, Zhengzhuo Liu, Shizheng Wan |
author_sort | Wenguang Tao |
collection | DOAJ |
description | As an excellent single-stage object detector based on neural networks, YOLOv5 has found extensive applications in the industrial domain; however, it still exhibits certain design limitations. To address these issues, this paper proposes Efficient Scale Fusion YOLO (ESF-YOLO). Firstly, the Multi-Sampling Conv Module (MSCM) is designed, which enhances the backbone network’s learning capability for low-level features through multi-scale receptive fields and cross-scale feature fusion. Secondly, to tackle occlusion issues, a new Block-wise Channel Attention Module (BCAM) is designed, assigning greater weights to channels corresponding to critical information. Next, a lightweight Decoupled Head (LD-Head) is devised. Additionally, the loss function is redesigned to address asynchrony between labels and confidences, alleviating the imbalance between positive and negative samples during neural network training. Finally, an adaptive scale factor for Intersection over Union (IoU) calculation is proposed, adjusting bounding box sizes adaptively to accommodate targets of different sizes in the dataset. Experimental results on the SODA10M and CBIA8K datasets demonstrate that ESF-YOLO increases Average Precision at 0.50 IoU (AP50) by 3.93% and 2.24%, Average Precision at 0.75 IoU (AP75) by 4.77% and 4.85%, and mean Average Precision (mAP) by 4% and 5.39%, respectively, validating the model’s broad applicability. |
first_indexed | 2024-04-24T12:32:12Z |
format | Article |
id | doaj.art-bba4d4bfd4d8401094d6d5c949aad24f |
institution | Directory of Open Access Journals |
issn | 1662-453X |
language | English |
last_indexed | 2024-04-24T12:32:12Z |
publishDate | 2024-04-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Neuroscience |
spelling | doaj.art-bba4d4bfd4d8401094d6d5c949aad24f; 2024-04-08T04:28:43Z; eng; Frontiers Media S.A.; Frontiers in Neuroscience; 1662-453X; 2024-04-01; 18; 10.3389/fnins.2024.1371418; 1371418; ESF-YOLO: an accurate and universal object detector based on neural networks; Wenguang Tao (0), Xiaotian Wang (1), Tian Yan (2), Zhengzhuo Liu (3), Shizheng Wan (4); (0) Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China; (1) Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China; (2) Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China; (3) Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China; (4) Shanghai Electro-Mechanical Engineering Institute, Shanghai, China; As an excellent single-stage object detector based on neural networks, YOLOv5 has found extensive applications in the industrial domain; however, it still exhibits certain design limitations. To address these issues, this paper proposes Efficient Scale Fusion YOLO (ESF-YOLO). Firstly, the Multi-Sampling Conv Module (MSCM) is designed, which enhances the backbone network’s learning capability for low-level features through multi-scale receptive fields and cross-scale feature fusion. Secondly, to tackle occlusion issues, a new Block-wise Channel Attention Module (BCAM) is designed, assigning greater weights to channels corresponding to critical information. Next, a lightweight Decoupled Head (LD-Head) is devised. Additionally, the loss function is redesigned to address asynchrony between labels and confidences, alleviating the imbalance between positive and negative samples during the neural network training. Finally, an adaptive scale factor for Intersection over Union (IoU) calculation is innovatively proposed, adjusting bounding box sizes adaptively to accommodate targets of different sizes in the dataset. Experimental results on the SODA10M and CBIA8K datasets demonstrate that ESF-YOLO increases Average Precision at 0.50 IoU (AP50) by 3.93 and 2.24%, Average Precision at 0.75 IoU (AP75) by 4.77 and 4.85%, and mean Average Precision (mAP) by 4 and 5.39%, respectively, validating the model’s broad applicability.; https://www.frontiersin.org/articles/10.3389/fnins.2024.1371418/full; neural network, object detection, cross-scale feature fusion, attention mechanism, lightweight decoupled head, dynamic loss function |
spellingShingle | Wenguang Tao, Xiaotian Wang, Tian Yan, Zhengzhuo Liu, Shizheng Wan; ESF-YOLO: an accurate and universal object detector based on neural networks; Frontiers in Neuroscience; neural network, object detection, cross-scale feature fusion, attention mechanism, lightweight decoupled head, dynamic loss function |
title | ESF-YOLO: an accurate and universal object detector based on neural networks |
title_full | ESF-YOLO: an accurate and universal object detector based on neural networks |
title_fullStr | ESF-YOLO: an accurate and universal object detector based on neural networks |
title_full_unstemmed | ESF-YOLO: an accurate and universal object detector based on neural networks |
title_short | ESF-YOLO: an accurate and universal object detector based on neural networks |
title_sort | esf yolo an accurate and universal object detector based on neural networks |
topic | neural network, object detection, cross-scale feature fusion, attention mechanism, lightweight decoupled head, dynamic loss function |
url | https://www.frontiersin.org/articles/10.3389/fnins.2024.1371418/full |
work_keys_str_mv | AT wenguangtao esfyoloanaccurateanduniversalobjectdetectorbasedonneuralnetworks AT xiaotianwang esfyoloanaccurateanduniversalobjectdetectorbasedonneuralnetworks AT tianyan esfyoloanaccurateanduniversalobjectdetectorbasedonneuralnetworks AT zhengzhuoliu esfyoloanaccurateanduniversalobjectdetectorbasedonneuralnetworks AT shizhengwan esfyoloanaccurateanduniversalobjectdetectorbasedonneuralnetworks |
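
The description above reports results at IoU thresholds of 0.50 and 0.75 and mentions a scale factor applied during IoU calculation. This record does not give the paper's formula, so the following is only a minimal sketch of the general idea: plain IoU for axis-aligned boxes, plus a variant that resizes both boxes about their centres before comparing them. The names `iou`, `scale_box`, `scaled_iou`, and the fixed `ratio` argument are hypothetical, and resizing about the centre is an assumption made for illustration, not the authors' adaptive method.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scale_box(box: Box, ratio: float) -> Box:
    """Resize a box about its centre (assumed scheme, for illustration only)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * ratio / 2
    half_h = (box[3] - box[1]) * ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def scaled_iou(a: Box, b: Box, ratio: float) -> float:
    """IoU computed after resizing both boxes by the same ratio."""
    return iou(scale_box(a, ratio), scale_box(b, ratio))


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    pred = (2.0, 0.0, 12.0, 10.0)
    plain = iou(gt, pred)
    # This prediction would count as a match at the 0.50 threshold
    # used for AP50, but not at the 0.75 threshold used for AP75.
    print(plain >= 0.50, plain >= 0.75)
    print(scaled_iou(gt, pred, 1.5), scaled_iou(gt, pred, 0.5))
```

In this sketch a ratio above 1 makes the overlap score more tolerant of a fixed offset, and a ratio below 1 makes it stricter, which is one plausible reason to vary the factor with target size. How ESF-YOLO actually chooses its factor is described in the full article linked in the record.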