LRTransDet: A Real-Time SAR Ship-Detection Network with Lightweight ViT and Multi-Scale Feature Fusion
Main Authors: | Kunyu Feng, Li Lun, Xiaofeng Wang, Xiaoxin Cui |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-11-01 |
Series: | Remote Sensing |
Subjects: | |
Online Access: | https://www.mdpi.com/2072-4292/15/22/5309 |
_version_ | 1797457935992881152 |
---|---|
author | Kunyu Feng, Li Lun, Xiaofeng Wang, Xiaoxin Cui |
author_sort | Kunyu Feng |
collection | DOAJ |
description | In recent years, deep learning techniques have driven significant progress in synthetic aperture radar (SAR) ship detection and have substantially improved detection accuracy. Nonetheless, SAR images present distinct challenges, including complex backgrounds, small ship targets, and noise interference, which make detection particularly demanding. In this paper, we introduce LRTransDet, a real-time SAR ship detector that leverages a lightweight vision transformer (ViT) and a multi-scale feature fusion neck to address these challenges. First, our model implements a lightweight backbone that combines convolutional neural networks (CNNs) and transformers, enabling it to capture both local and global features from input SAR images simultaneously. Second, we improve the model's efficiency by incorporating the faster weighted feature fusion (Faster-WF2) module and the coordinate attention (CA) mechanism into the feature fusion neck; these components use computational resources more efficiently while maintaining the model's performance. Third, to address the difficulty of detecting small ship targets in SAR images, we refine the original loss function by combining the normalized Wasserstein distance (NWD) metric with the intersection over union (IoU) scheme, which improves the detector's ability to detect small targets. To evaluate the proposed model, we conducted experiments on four challenging datasets (the SSDD, the SAR-Ship Dataset, the HRSID, and the LS-SSDD-v1.0). The results demonstrate that our model surpasses both general object detectors and state-of-the-art SAR ship detectors in detection accuracy (97.8% on the SSDD and 93.9% on the HRSID) and speed (74.6 FPS on the SSDD and 75.8 FPS on the HRSID), while requiring only 3.07 M parameters. Additionally, we conducted a series of ablation experiments to illustrate the impact of the EfficientViT, the Faster-WF2 module, the CA mechanism, and the NWD metric on multi-scale feature fusion and detection performance. |
first_indexed | 2024-03-09T16:29:55Z |
format | Article |
id | doaj.art-ef8215a8d85a4116bc5e27e784aedebe |
institution | Directory of Open Access Journals |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-09T16:29:55Z |
publishDate | 2023-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-ef8215a8d85a4116bc5e27e784aedebe; 2023-11-24T15:04:19Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2023-11-01; vol. 15, no. 22, art. 5309; DOI 10.3390/rs15225309; LRTransDet: A Real-Time SAR Ship-Detection Network with Lightweight ViT and Multi-Scale Feature Fusion; Kunyu Feng (School of Software and Microelectronics, Peking University, Beijing 102600, China), Li Lun (School of Integrated Circuits, Peking University, Beijing 100871, China), Xiaofeng Wang (Beijing Aerospace Automatic Control Institute, Beijing 100039, China), Xiaoxin Cui (School of Integrated Circuits, Peking University, Beijing 100871, China); https://www.mdpi.com/2072-4292/15/22/5309; keywords: synthetic aperture radar (SAR), ship detection, vision transformer (ViT), faster weighted feature fusion (Faster-WF2), coordinate attention (CA), real-time |
title | LRTransDet: A Real-Time SAR Ship-Detection Network with Lightweight ViT and Multi-Scale Feature Fusion |
topic | synthetic aperture radar (SAR); ship detection; vision transformer (ViT); faster weighted feature fusion (Faster-WF2); coordinate attention (CA); real-time |
url | https://www.mdpi.com/2072-4292/15/22/5309 |
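The abstract says only that LRTransDet refines its loss by combining the normalized Wasserstein distance (NWD) metric with IoU; it does not give the formula. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses the common NWD formulation in which each center-format box (cx, cy, w, h) is modeled as a 2-D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), so the squared 2-Wasserstein distance between two boxes has a closed form. The names `Box`, `iou`, `nwd`, and `box_loss`, the normalizing constant `c = 12.8`, and the linear blend weight `ratio` are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in center format (cx, cy, w, h), in pixels (hypothetical helper)."""
    cx: float
    cy: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = a.cx - a.w / 2, a.cy - a.h / 2, a.cx + a.w / 2, a.cy + a.h / 2
    bx1, by1, bx2, by2 = b.cx - b.w / 2, b.cy - b.h / 2, b.cx + b.w / 2, b.cy + b.h / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def nwd(a: Box, b: Box, c: float = 12.8) -> float:
    """Normalized Wasserstein distance between two boxes.

    Each box is treated as N([cx, cy], diag(w^2/4, h^2/4)). Because the
    covariances are diagonal, the squared 2-Wasserstein distance is the
    squared Euclidean distance between the vectors (cx, cy, w/2, h/2).
    exp(-W2 / c) then maps it into (0, 1]; c is an assumed,
    dataset-dependent normalizing constant.
    """
    w2_sq = ((a.cx - b.cx) ** 2 + (a.cy - b.cy) ** 2
             + ((a.w - b.w) / 2) ** 2 + ((a.h - b.h) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def box_loss(pred: Box, target: Box, ratio: float = 0.5, c: float = 12.8) -> float:
    """Hypothetical linear blend: `ratio` weights the NWD term, (1 - ratio) the IoU term."""
    return ratio * (1.0 - nwd(pred, target, c)) + (1.0 - ratio) * (1.0 - iou(pred, target))


if __name__ == "__main__":
    a, b = Box(10, 10, 4, 4), Box(14, 10, 4, 4)  # two tiny boxes that only touch
    print(iou(a, b), nwd(a, b), box_loss(a, b))
```

For the two 4 × 4 boxes in the usage example, whose centers are 4 pixels apart horizontally, the boxes share only an edge, so `iou` returns 0.0 while `nwd` returns exp(-4/12.8) ≈ 0.73. This smooth, nonzero similarity for tiny, non-overlapping boxes is the usual motivation for mixing NWD into an IoU-based loss; how LRTransDet actually weights the two terms is not stated in this record.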