Extended performance analysis of deep-learning algorithms for mice vocalization segmentation

Abstract: Analysis of ultrasonic vocalizations (USVs) is a fundamental tool for studying animal communication. It can be used for behavioral investigation of mice in ethological studies and in the fields of neuroscience and neuropharmacology. USVs are usually recorded with a microphone sensitive to ultrasound frequencies and then processed by dedicated software, which helps the operator identify and characterize different families of calls. Recently, many automated systems have been proposed to perform both the detection and the classification of USVs. USV segmentation is the crucial step in this framework, since the quality of the call processing depends strictly on how accurately the call itself has been detected. In this paper, we investigate the performance of three supervised deep-learning methods for automated USV segmentation: an Auto-Encoder neural network (AE), a U-NET neural network (UNET) and a Recurrent Neural Network (RNN). The proposed models receive as input the spectrogram associated with the recorded audio track and return as output the regions in which USV calls have been detected. To evaluate the models, we built a dataset by recording several audio tracks and manually segmenting the corresponding USV spectrograms generated with the Avisoft software, thereby producing the ground truth (GT) used for training. All three architectures achieved precision and recall above 90%, with UNET and AE exceeding 95%, surpassing the other state-of-the-art methods considered for comparison in this study. The evaluation was also extended to an external dataset, where UNET again exhibited the highest performance. We suggest that our experimental results may represent a valuable benchmark for future work.
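As a rough illustration of the pipeline the abstract describes (a model outputs per-frame detections over the spectrogram's time axis, which are turned into call regions and scored frame-wise with precision and recall), here is a minimal pure-Python sketch. This is not the authors' code; the function names and the toy masks are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's implementation): given per-frame
# binary masks over a spectrogram's time axis, extract detected call
# regions and score a prediction frame-wise against the ground truth.

def mask_to_regions(mask):
    """Convert a binary frame mask to (start, end) index pairs, end exclusive.
    E.g. [0, 1, 1, 0, 1] -> [(1, 3), (4, 5)]."""
    regions, start = [], None
    for i, v in enumerate(mask):
        if v and start is None:
            start = i                      # a call region opens here
        elif not v and start is not None:
            regions.append((start, i))     # region closed at frame i
            start = None
    if start is not None:                  # region still open at the end
        regions.append((start, len(mask)))
    return regions

def frame_precision_recall(gt_mask, pred_mask):
    """Frame-wise precision and recall of a predicted mask vs. ground truth."""
    tp = sum(1 for g, p in zip(gt_mask, pred_mask) if g and p)
    fp = sum(1 for g, p in zip(gt_mask, pred_mask) if not g and p)
    fn = sum(1 for g, p in zip(gt_mask, pred_mask) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: ground truth contains two calls; the prediction clips
# one frame off the first call.
gt   = [0, 1, 1, 1, 0, 0, 1, 1, 0]
pred = [0, 1, 1, 0, 0, 0, 1, 1, 0]
print(mask_to_regions(pred))             # [(1, 3), (6, 8)]
print(frame_precision_recall(gt, pred))  # (1.0, 0.8)
```

A real evaluation would additionally need the mapping from frame indices to milliseconds (given by the spectrogram hop size) and possibly a tolerance on region boundaries, but the frame-level counting above is the core of a precision/recall score such as the ones reported in the abstract.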

Bibliographic Details
Main Authors: Daniele Baggi, Marika Premoli, Alessandro Gnutti, Sara Anna Bonini, Riccardo Leonardi, Maurizio Memo, Pierangelo Migliorati
Format: Article
Language: English
Published: Nature Portfolio 2023-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-38186-7
ISSN: 2045-2322
Author affiliations: Department of Information Engineering, University of Brescia (Baggi, Gnutti, Leonardi, Migliorati); Department of Molecular and Translational Medicine, University of Brescia (Premoli, Bonini, Memo).