Combined LOFAR and DEMON Spectrums for Simultaneous Underwater Acoustic Object Counting and <i>F</i><sub>0</sub> Estimation

Bibliographic Details
Main Authors: Liming Li, Sanming Song, Xisheng Feng
Format: Article
Language: English
Published: MDPI AG 2022-10-01
Series: Journal of Marine Science and Engineering
Online Access: https://www.mdpi.com/2077-1312/10/10/1565
Description
Summary: In a typical underwater acoustic target detection mission, we have to estimate the number of targets (<i>N</i>), perform source separation when <i>N</i> > 1, and then predict motion parameters such as the fundamental frequency (<i>F</i><sub>0</sub>) from the separated noise of each target. Although deep learning methods have been adopted for each of these tasks, their success depends strongly on the input features. In this paper, we evaluate several time-frequency features and propose a universal feature extraction strategy that serves object counting and <i>F</i><sub>0</sub> estimation simultaneously, with a convolutional recurrent neural network (CRNN) as the backbone. First, because LOFAR and DEMON are suited to low-speed and high-speed analysis, respectively, the two are combined (LOFAR + DEMON) to cover the full range of conditions. Second, a comb filter (COMB) is designed and applied to the combined spectrum to enhance harmonicity before the result is fed into the CRNN for prediction. Experiments show that (1) in the <i>F</i><sub>0</sub> estimation task, feeding the filtered combined feature (LOFAR + DEMON + COMB) into the CRNN achieves an accuracy of 98% on the lake trial dataset, which is superior to LOFAR + COMB (83%) or DEMON + COMB (94%) alone, demonstrating that combining the features is beneficial; (2) in the counting task, the prediction accuracy of the combined feature (LOFAR + DEMON, with or without COMB) is comparable to the state of the art on the simulation dataset and outperforms the alternatives on the lake trial dataset, indicating that LOFAR + DEMON can serve as a common feature for both tasks; and (3) the inclusion of COMB speeds up convergence in the <i>F</i><sub>0</sub> estimation task but lowers counting accuracy by 13% on average, partly because of the merging effects introduced by the broadband filtering of COMB.
ISSN: 2077-1312
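
The summary names three signal-processing steps (LOFAR, DEMON, and a comb filter for harmonicity) without giving their definitions. The sketch below is only an illustration of what such steps commonly look like, not the authors' implementation. It assumes that LOFAR is a time-averaged log-power spectrum, that DEMON is the spectrum of a square-law envelope, and that the comb is a plain harmonic summation over candidate <i>F</i><sub>0</sub> values. The function names, parameter defaults, and the synthetic test signal are all hypothetical, and the CRNN stage is not covered.

```python
import numpy as np


def lofar(x, fs, n_fft=4096, hop=2048):
    """LOFAR-style line spectrum: log power averaged over Hann-windowed frames."""
    win = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[s:s + n_fft] * win for s in starts])
    power = np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, 10.0 * np.log10(power + 1e-12)


def demon(x, fs, f_max=100.0):
    """DEMON-style spectrum: magnitude spectrum of the square-law envelope.

    Assumption: squaring stands in for the envelope detector; a practical
    chain would usually band-pass first, then low-pass and decimate.
    """
    env = x ** 2
    env = env - env.mean()          # remove the DC term left by squaring
    spec = np.abs(np.fft.rfft(env)) / len(env)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    keep = freqs <= f_max           # modulation lines live at low frequency
    return freqs[keep], spec[keep]


def comb_f0(freqs, spec, f0_min=2.0, f0_max=20.0, n_harm=5):
    """Harmonic-summation comb: pick the candidate F0 whose first n_harm
    harmonics collect the largest summed magnitude (hypothetical design)."""
    df = freqs[1] - freqs[0]
    candidates = np.arange(f0_min, f0_max + df / 2, df)
    scores = []
    for f0 in candidates:
        idx = np.rint(np.arange(1, n_harm + 1) * f0 / df).astype(int)
        idx = idx[idx < len(spec)]  # ignore harmonics beyond the spectrum
        scores.append(spec[idx].sum())
    return candidates[int(np.argmax(scores))]


def synth_propeller_noise(f0=7.0, fs=4000, seconds=8.0, seed=0):
    """Hypothetical test signal: white noise amplitude-modulated at f0 and
    its harmonics, plus one narrowband tonal at 250 Hz."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * seconds)) / fs
    mod = 1.0 + 0.5 * sum(np.cos(2 * np.pi * k * f0 * t) for k in (1, 2, 3))
    tone = 0.5 * np.sin(2 * np.pi * 250.0 * t)
    return mod * rng.standard_normal(t.size) + tone


if __name__ == "__main__":
    fs = 4000
    x = synth_propeller_noise(f0=7.0, fs=fs)
    f_d, s_d = demon(x, fs)
    f_l, s_l = lofar(x, fs)
    print("comb F0 estimate from DEMON (Hz):", comb_f0(f_d, s_d))
    print("strongest LOFAR line (Hz):", f_l[np.argmax(s_l)])
```

In this toy setting the modulation lines appear only in the DEMON spectrum, while the tonal appears only in the LOFAR spectrum, which gives some intuition for why the summary treats the two as complementary. How the paper actually merges the two spectra and shapes its comb filter is not stated in this record, so neither is reproduced here.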