Extended performance analysis of deep-learning algorithms for mice vocalization segmentation
Abstract: Analysis of ultrasonic vocalizations (USVs) is a fundamental tool for studying animal communication. It can be used for behavioral investigation of mice in ethological studies and in the fields of neuroscience and neuropharmacology. USVs are usually recorded with a microphone...
Main Authors: | Daniele Baggi, Marika Premoli, Alessandro Gnutti, Sara Anna Bonini, Riccardo Leonardi, Maurizio Memo, Pierangelo Migliorati |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-07-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-023-38186-7 |
author | Daniele Baggi; Marika Premoli; Alessandro Gnutti; Sara Anna Bonini; Riccardo Leonardi; Maurizio Memo; Pierangelo Migliorati
author_sort | Daniele Baggi |
collection | DOAJ |
description | Abstract: Analysis of ultrasonic vocalizations (USVs) is a fundamental tool for studying animal communication. It can be used for behavioral investigation of mice in ethological studies and in the fields of neuroscience and neuropharmacology. USVs are usually recorded with a microphone sensitive to ultrasonic frequencies and then processed by dedicated software, which helps the operator identify and characterize different families of calls. Recently, many automated systems have been proposed to perform both the detection and the classification of USVs. USV segmentation is the crucial step of the overall framework, since the quality of the subsequent call processing depends strictly on how accurately each call has been detected. In this paper, we investigate the performance of three supervised deep-learning methods for automated USV segmentation: an Auto-Encoder Neural Network (AE), a U-NET Neural Network (UNET) and a Recurrent Neural Network (RNN). The proposed models receive as input the spectrogram of the recorded audio track and return as output the regions in which USV calls have been detected. To evaluate the models, we built a dataset by recording several audio tracks and manually segmenting the corresponding USV spectrograms generated with the Avisoft software, thereby producing the ground truth (GT) used for training. All three architectures achieved precision and recall scores above 90%, with UNET and AE exceeding 95%, surpassing the other state-of-the-art methods considered for comparison in this study. The evaluation was also extended to an external dataset, on which UNET again exhibited the highest performance. We suggest that our experimental results may represent a valuable benchmark for future work. |
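The abstract describes the evaluation setup: each network ingests a spectrogram and emits the regions where USV calls are detected, and that output is scored against a manually produced ground-truth segmentation via precision and recall. A minimal sketch of that frame-level scoring is given below (NumPy assumed; the masks, frame counts, and call positions are illustrative toy values, not the authors' actual data or code):

```python
import numpy as np

def precision_recall(pred: np.ndarray, gt: np.ndarray):
    """Frame-level precision/recall for binary segmentation masks.

    pred, gt: boolean arrays of the same shape, True where a USV call
    is marked present (predicted vs. ground truth, respectively).
    """
    tp = np.logical_and(pred, gt).sum()    # frames correctly marked as call
    fp = np.logical_and(pred, ~gt).sum()   # false alarms
    fn = np.logical_and(~pred, gt).sum()   # missed call frames
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: 10 spectrogram time frames, one ground-truth call in frames 2-5,
# and a detection shifted by one frame (frames 3-6).
gt = np.zeros(10, dtype=bool)
gt[2:6] = True
pred = np.zeros(10, dtype=bool)
pred[3:7] = True
p, r = precision_recall(pred, gt)
print(round(p, 2), round(r, 2))  # 0.75 0.75
```

Note that frame-level scoring is only one possible convention; event-level variants (counting each detected call once) are also common in the USV literature, and the paper's exact matching criterion should be taken from the article itself.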
first_indexed | 2024-03-12T23:24:43Z |
format | Article |
id | doaj.art-eeec51018a1d4586ba6a21825a3449b2 |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-03-12T23:24:43Z |
publishDate | 2023-07-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-eeec51018a1d4586ba6a21825a3449b2; Nature Portfolio, Scientific Reports, ISSN 2045-2322, 2023-07-01, vol. 13, iss. 1, pp. 1–14, doi:10.1038/s41598-023-38186-7. "Extended performance analysis of deep-learning algorithms for mice vocalization segmentation". Daniele Baggi (Department of Information Engineering, University of Brescia); Marika Premoli (Department of Molecular and Translational Medicine, University of Brescia); Alessandro Gnutti (Department of Information Engineering, University of Brescia); Sara Anna Bonini (Department of Molecular and Translational Medicine, University of Brescia); Riccardo Leonardi (Department of Information Engineering, University of Brescia); Maurizio Memo (Department of Molecular and Translational Medicine, University of Brescia); Pierangelo Migliorati (Department of Information Engineering, University of Brescia). https://doi.org/10.1038/s41598-023-38186-7 |
title | Extended performance analysis of deep-learning algorithms for mice vocalization segmentation |
url | https://doi.org/10.1038/s41598-023-38186-7 |