DP-ViT: A Dual-Path Vision Transformer for Real-Time Sonar Target Detection
Sonar imagery is the main way for underwater vehicles to obtain environmental information. Target detection in sonar images distinguishes multi-class targets in real time and locates them accurately, providing perception information for the decision-making system of underwater vehicles. H...
Main Authors: | Yushan Sun, Haotian Zheng, Guocheng Zhang, Jingfei Ren, Hao Xu, Chao Xu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | Remote Sensing |
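Subjects: | sonar target detection, vision transformer, transformer, convolutional neural network, AUV environment awareness |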
Online Access: | https://www.mdpi.com/2072-4292/14/22/5807 |
_version_ | 1797464045863829504 |
---|---|
author | Yushan Sun Haotian Zheng Guocheng Zhang Jingfei Ren Hao Xu Chao Xu |
author_facet | Yushan Sun Haotian Zheng Guocheng Zhang Jingfei Ren Hao Xu Chao Xu |
author_sort | Yushan Sun |
collection | DOAJ |
description | Sonar imagery is the main way for underwater vehicles to obtain environmental information. Target detection in sonar images distinguishes multi-class targets in real time and locates them accurately, providing perception information for the decision-making system of underwater vehicles. However, sonar image target detection faces many challenges, such as the variety of sonar types, complex and severe noise interference in images, and the scarcity of datasets. This paper proposes a sonar image target detection method based on the Dual Path Vision Transformer Network (DP-ViT) to accurately detect targets in forward-look and side-scan sonar images. DP-ViT enlarges the receptive field by making the patch embedding multi-scale, enhances the feature-extraction learning ability of the model with a Dual Path Transformer Block, introduces Conv-Attention to reduce the number of model training parameters, and uses Generalized Focal Loss to address the imbalance between positive and negative samples. The experimental results show that this method outperforms other mainstream methods on both the forward-look sonar dataset and the side-scan sonar dataset, and that it maintains good performance when noise is added. |
first_indexed | 2024-03-09T18:02:21Z |
format | Article |
id | doaj.art-c3ec73609f7f4a72ba340496f13aa5c0 |
institution | Directory Open Access Journal |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-09T18:02:21Z |
publishDate | 2022-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-c3ec73609f7f4a72ba340496f13aa5c0; 2023-11-24T09:50:49Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2022-11-01; 14; 22; 5807; 10.3390/rs14225807; DP-ViT: A Dual-Path Vision Transformer for Real-Time Sonar Target Detection; Yushan Sun (0), Haotian Zheng (1), Guocheng Zhang (2), Jingfei Ren (3), Hao Xu (4), Chao Xu (5); (0) Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China; (1) Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China; (2) Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China; (3) College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (4) Marine Design and Research Institute of China, Shanghai 200011, China; (5) College of Underwater Acoustic Engineering, Harbin Engineering University, Harbin 150001, China; Sonar imagery is the main way for underwater vehicles to obtain environmental information. Target detection in sonar images distinguishes multi-class targets in real time and locates them accurately, providing perception information for the decision-making system of underwater vehicles. However, sonar image target detection faces many challenges, such as the variety of sonar types, complex and severe noise interference in images, and the scarcity of datasets. This paper proposes a sonar image target detection method based on the Dual Path Vision Transformer Network (DP-ViT) to accurately detect targets in forward-look and side-scan sonar images. DP-ViT enlarges the receptive field by making the patch embedding multi-scale, enhances the feature-extraction learning ability of the model with a Dual Path Transformer Block, introduces Conv-Attention to reduce the number of model training parameters, and uses Generalized Focal Loss to address the imbalance between positive and negative samples. The experimental results show that this method outperforms other mainstream methods on both the forward-look sonar dataset and the side-scan sonar dataset, and that it maintains good performance when noise is added.; https://www.mdpi.com/2072-4292/14/22/5807; sonar target detection; vision transformer; transformer; convolutional neural network; AUV environment awareness |
spellingShingle | Yushan Sun Haotian Zheng Guocheng Zhang Jingfei Ren Hao Xu Chao Xu DP-ViT: A Dual-Path Vision Transformer for Real-Time Sonar Target Detection Remote Sensing sonar target detection vision transformer transformer convolutional neural network AUV environment awareness |
title | DP-ViT: A Dual-Path Vision Transformer for Real-Time Sonar Target Detection |
title_full | DP-ViT: A Dual-Path Vision Transformer for Real-Time Sonar Target Detection |
title_fullStr | DP-ViT: A Dual-Path Vision Transformer for Real-Time Sonar Target Detection |
title_full_unstemmed | DP-ViT: A Dual-Path Vision Transformer for Real-Time Sonar Target Detection |
title_short | DP-ViT: A Dual-Path Vision Transformer for Real-Time Sonar Target Detection |
title_sort | dp vit a dual path vision transformer for real time sonar target detection |
topic | sonar target detection vision transformer transformer convolutional neural network AUV environment awareness |
url | https://www.mdpi.com/2072-4292/14/22/5807 |
work_keys_str_mv | AT yushansun dpvitadualpathvisiontransformerforrealtimesonartargetdetection AT haotianzheng dpvitadualpathvisiontransformerforrealtimesonartargetdetection AT guochengzhang dpvitadualpathvisiontransformerforrealtimesonartargetdetection AT jingfeiren dpvitadualpathvisiontransformerforrealtimesonartargetdetection AT haoxu dpvitadualpathvisiontransformerforrealtimesonartargetdetection AT chaoxu dpvitadualpathvisiontransformerforrealtimesonartargetdetection |
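
The description above says the method uses Generalized Focal Loss to address the imbalance between positive and negative samples. As a rough illustration only, the sketch below implements the Quality Focal Loss term, one of the two components of Generalized Focal Loss (the other is Distribution Focal Loss), for a single prediction. This record does not say which components or hyperparameters the paper uses, so the function name `quality_focal_loss`, the scalar interface, and the value `beta=2.0` are assumptions made for this example, not the authors' implementation.

```python
import math


def quality_focal_loss(pred_prob: float, target: float, beta: float = 2.0) -> float:
    """Quality Focal Loss for one prediction (illustrative sketch).

    pred_prob: predicted probability sigma, expected in (0, 1)
    target:    soft label y in [0, 1]; 0 for a negative sample, and a
               localization-quality score (e.g. IoU) for a positive one
    beta:      focusing exponent (assumed value, not taken from this record)
    """
    eps = 1e-12
    # Clamp the probability so that log() never receives 0.
    p = min(max(pred_prob, eps), 1.0 - eps)
    # Binary cross-entropy generalized to a soft target y.
    cross_entropy = -((1.0 - target) * math.log(1.0 - p) + target * math.log(p))
    # Modulating factor |y - sigma|^beta: small when the prediction is close
    # to the target, so well-classified samples contribute little loss.
    modulating = abs(target - p) ** beta
    return modulating * cross_entropy


# An easy negative (low score on background) versus a hard negative.
easy_negative = quality_focal_loss(0.1, 0.0)
hard_negative = quality_focal_loss(0.9, 0.0)
```

Because the modulating factor shrinks the loss of the many easy background samples much more than that of the hard ones, the abundant negatives do not dominate the training signal, which is the imbalance problem the description refers to.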