Summary: | Because of the limitations of small targets in remote sensing images, such as background noise and limited feature information, the results of commonly used detection algorithms on small target detection are not satisfactory. To improve detection accuracy, we develop an improved algorithm based on YOLOv8, called LAR-YOLOv8. First, in the feature extraction network, the local module is enhanced with a dual-branch architecture attention mechanism, while a vision transformer block is used to maximize the representation of the feature map. Second, an attention-guided bidirectional feature pyramid network is designed to generate more discriminative information by efficiently extracting features from the shallow network through a dynamic sparse attention mechanism and by adding top-down paths to guide the subsequent network modules in feature fusion. Finally, the RIOU loss function is proposed to avoid failure of the loss function and to improve the shape consistency between the predicted and ground-truth boxes. Experimental results on the NWPU VHR-10, RSOD, and CARPK datasets verify that LAR-YOLOv8 achieves satisfactory results in terms of mAP (small), mAP, model parameters, and FPS, demonstrating that our modifications to the original YOLOv8 model are effective.
|