Recognition and location of marine animal sounds using two-stream ConvNet with attention

The ocean holds abundant resources and many endangered marine animals. Using sound to identify and locate these animals and to estimate their distribution areas plays an important role in studying the diversity of marine animals (Hanny et al., 2013). We design a Two-Stream Conv...

Bibliographic Details
Main Authors: Shaoxiang Hu, Rong Hou, Zhiwu Liao, Peng Chen
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-06-01
Series: Frontiers in Marine Science
Subjects: voice recognition, location, two-stream ConvNet, YOLO, attention, CMFCC
Online Access: https://www.frontiersin.org/articles/10.3389/fmars.2023.1059622/full
author Shaoxiang Hu
Rong Hou
Zhiwu Liao
Peng Chen
collection DOAJ
description The ocean holds abundant resources and many endangered marine animals. Using sound to identify and locate these animals and to estimate their distribution areas plays an important role in studying the diversity of marine animals (Hanny et al., 2013). We design a Two-Stream ConvNet with Attention (TSCA) model, a two-stream model combined with attention in which one branch processes the time-domain signal and the other processes the frequency-domain signal. The model exploits the high time resolution of the time-domain signal and the high recognition rate of frequency-domain sound features to achieve rapid localization and recognition of the sounds of marine species. The basic network architecture of the model is YOLO (You Only Look Once) (Joseph et al., 2016). A new focal loss function is constructed to strengthen the influence of tail-class samples, address data imbalance, and avoid overfitting. In addition, an attention module is constructed to focus on finer-grained sound features, improving the noise resistance of the model and enabling high-precision identification and localization of marine species. On the Watkins Marine Mammal Sound Database, the algorithm reached a recognition rate of 92.04% and a positioning accuracy of 78.4%. The experimental results show that the algorithm is robust and achieves high recognition and positioning accuracy.
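The description says a focal loss is used to give tail-class samples more influence. As a rough illustration only, the sketch below uses the commonly cited focal loss form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); the record does not give the authors' exact formulation, so the function names, the default `gamma=2.0`, and the optional per-class `alpha` weights are assumptions for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample (standard form, assumed here).

    logits: raw class scores; target: index of the true class.
    gamma:  focusing parameter; gamma = 0 reduces to cross-entropy.
    alpha:  optional per-class weights (hypothetical values), e.g. larger
            weights for rare (tail) classes.
    """
    log_pt = log_softmax(logits)[target]
    pt = math.exp(log_pt)
    # Well-classified samples (pt close to 1) are down-weighted.
    weight = (1.0 - pt) ** gamma
    if alpha is not None:
        weight *= alpha[target]
    return -weight * log_pt


def mean_focal_loss(batch_logits, targets, gamma=2.0, alpha=None):
    """Average focal loss over a batch of samples."""
    losses = [focal_loss(l, t, gamma, alpha)
              for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    easy = [4.0, 0.0, 0.0]   # confidently correct for class 0
    hard = [0.0, 1.0, 0.0]   # misclassified sample of class 0
    for g in (0.0, 2.0):
        print(g, focal_loss(easy, 0, gamma=g), focal_loss(hard, 0, gamma=g))
```

With `gamma` greater than zero, the confidently classified sample contributes far less to the total than the misclassified one, which is the mechanism by which hard and rare examples gain relative influence during training.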
first_indexed 2024-03-13T07:55:47Z
format Article
id doaj.art-15958b0c8ea34edcbddc7fc4d784b702
institution Directory of Open Access Journals
issn 2296-7745
language English
last_indexed 2024-03-13T07:55:47Z
publishDate 2023-06-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Marine Science
spelling doaj.art-15958b0c8ea34edcbddc7fc4d784b702
2023-06-02T05:28:32Z
eng
Frontiers Media S.A.
Frontiers in Marine Science
2296-7745
2023-06-01
10
10.3389/fmars.2023.1059622
1059622
Recognition and location of marine animal sounds using two-stream ConvNet with attention
Shaoxiang Hu (School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China)
Rong Hou (Chengdu Research Base of Giant Panda Breeding, Sichuan Key Laboratory of Conservation Biology for Endangered Wildlife, Chengdu, China)
Zhiwu Liao (Academy of Global Governance and Area Studies, Sichuan Normal University, Chengdu, China)
Peng Chen (Chengdu Research Base of Giant Panda Breeding, Sichuan Key Laboratory of Conservation Biology for Endangered Wildlife, Chengdu, China)
https://www.frontiersin.org/articles/10.3389/fmars.2023.1059622/full
voice recognition
location
two-stream ConvNet
YOLO
attention
CMFCC
title Recognition and location of marine animal sounds using two-stream ConvNet with attention
topic voice recognition
location
two-stream ConvNet
YOLO
attention
CMFCC
url https://www.frontiersin.org/articles/10.3389/fmars.2023.1059622/full