Multi-branch stacking remote sensing image target detection based on YOLOv5

Bibliographic Details
Main Authors: Luxuan Bian, Bo Li, Jue Wang, Zijun Gao
Format: Article
Language: English
Published: Elsevier 2023-12-01
Series: Egyptian Journal of Remote Sensing and Space Sciences
Online Access: http://www.sciencedirect.com/science/article/pii/S1110982323000959
Description
Summary: Optical remote sensing is crucial in land management, maritime safety, and rescue operations. Currently, high-resolution target detection faces problems including feature loss, false detections, and limited network robustness. To address these issues, this study introduces a novel model, MBS-NET, built on the YOLOv5 architecture. The proposed model uses a multi-branch stacking module to capture deep target feature information precisely. A dual-channel attention mechanism module (EGCA) added at the model's neck ensures that discriminative features are acquired for densely packed objects and large spatial scenes. Prior to network training, an improved data augmentation strategy is applied so that the model can accommodate the multi-scale and directional variations of remote sensing targets. Experimental results on the large-scale public DIOR dataset demonstrate that MBS-NET displays exceptional performance and remarkable interpretability in large-scale remote sensing scenarios. MBS-NET outperforms the YOLOv5 and YOLOv7 models, increasing accuracy by 5% and 2%, respectively. In addition, the recall rate and F1 score of MBS-NET are superior to those of the other methods, resulting in a significant improvement in detection accuracy and robustness in large scenes.
ISSN: 1110-9823
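
Note on metrics: the summary reports recall and F1 score without defining them. As a minimal sketch using the standard definitions (this is not code from the paper; the function name and the counts below are hypothetical), both can be computed from true-positive, false-positive, and false-negative detection counts:

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Standard precision, recall, and F1 from detection counts.

    tp: detections matched to a ground-truth object
    fp: detections with no matching ground-truth object
    fn: ground-truth objects that were not detected
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only (not results from the paper).
metrics = detection_metrics(tp=80, fp=20, fn=20)
print(metrics)
```

In object detection, whether a prediction counts as a true positive usually depends on an intersection-over-union threshold against the ground-truth box; the summary does not state which threshold the authors used.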