Sound Event Detection for Human Safety and Security in Noisy Environments
Main Authors: | Michael Neri, Federica Battisti, Alessandro Neri, Marco Carli |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Access |
Subjects: | Audio processing; deep learning; human safety; sound event detection; spatial filters |
Online Access: | https://ieeexplore.ieee.org/document/9997485/ |
field | value |
---|---|
author | Michael Neri, Federica Battisti, Alessandro Neri, Marco Carli |
collection | DOAJ |
description | The objective of a sound event detector is to recognize anomalous events in an audio clip and return their onset and offset times. Detecting sound events in noisy environments is challenging: a real audio signal contains several co-existing sound sources, polyphonic recordings have different characteristics from isolated recordings, and noise (e.g., thermal and environmental) must also be taken into account. In this contribution, we present a sound anomaly detection system based on a fully convolutional network that exploits image spatial filtering and an Atrous Spatial Pyramid Pooling module. To cope with the lack of datasets designed specifically for sound event detection, we designed a dataset for the application of noisy bus environments. It was obtained by mixing background audio files recorded in a real environment with anomalous events extracted from monophonic collections of labelled audio clips. The performance of the proposed system has been evaluated with segment-based metrics such as error rate, recall, and F1-score, and its robustness and precision have been assessed through four different tests. The results show that the proposed sound event detector outperforms both state-of-the-art methods and general-purpose deep learning solutions. |
first_indexed | 2024-04-11T04:20:07Z |
format | Article |
id | doaj.art-faf721939a0c4e9e839eb46ec39f2b5f |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-11T04:20:07Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
doi | 10.1109/ACCESS.2022.3231681 |
citation | IEEE Access, vol. 10, pp. 134230-134240, 2022 (IEEE document 9997485) |
author_affiliations | Michael Neri (https://orcid.org/0000-0002-6212-9139), Department of Industrial, Electronic and Mechanical Engineering, Roma Tre University, Rome, Italy; Federica Battisti (https://orcid.org/0000-0002-0846-5879), Department of Information Engineering, University of Padova, Padua, Italy; Alessandro Neri (https://orcid.org/0000-0002-5911-9490), Department of Industrial, Electronic and Mechanical Engineering, Roma Tre University, Rome, Italy; Marco Carli (https://orcid.org/0000-0002-7489-3767), Department of Industrial, Electronic and Mechanical Engineering, Roma Tre University, Rome, Italy |
title | Sound Event Detection for Human Safety and Security in Noisy Environments |
topic | Audio processing; deep learning; human safety; sound event detection; spatial filters |
url | https://ieeexplore.ieee.org/document/9997485/ |
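The abstract states that the bus-environment dataset was built by mixing background recordings from a real environment with anomalous events taken from monophonic, labelled collections. The sketch below shows one common way to perform such mixing; it is not the authors' procedure. It assumes that each event is rescaled to a chosen event-to-background ratio (in dB) relative to the background samples it overlaps. The record does not say which mixing rule or ratios were used, and `rms`, `mix_event`, and `ebr_db` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def rms(x: Sequence[float]) -> float:
    """Root-mean-square amplitude of a non-empty signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_event(
    background: Sequence[float],
    event: Sequence[float],
    onset: int,
    ebr_db: float,
) -> Tuple[List[float], Tuple[int, int]]:
    """Add `event` to `background` starting at sample index `onset`.

    Hypothetical helper: the event is scaled so that its RMS is `ebr_db`
    decibels above (negative: below) the RMS of the background samples it
    overlaps. Returns the mixture and the (onset, offset) label in samples.
    """
    offset = onset + len(event)
    if onset < 0 or offset > len(background):
        raise ValueError("event does not fit inside the background clip")
    event_rms = rms(event)
    if event_rms == 0:
        raise ValueError("event has zero energy")
    segment = background[onset:offset]
    # Gain chosen so that 20 * log10(rms(gain * event) / rms(segment)) == ebr_db.
    gain = rms(segment) / event_rms * 10 ** (ebr_db / 20)
    mixture = list(background)  # copy; the input background is left untouched
    for i, v in enumerate(event):
        mixture[onset + i] += gain * v
    return mixture, (onset, offset)


if __name__ == "__main__":
    # Toy signals for illustration only (not audio from the paper's dataset).
    background = [0.1, -0.1] * 8
    mixture, label = mix_event(background, [1.0, -1.0, 1.0, -1.0], onset=4, ebr_db=0.0)
    print(label, mixture[4:8])
```

Returning the (onset, offset) pair together with the mixture means the strong labels needed for training and for segment-based evaluation come for free when the dataset is synthesized.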
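The abstract reports segment-based error rate, recall, and F1-score. The sketch below computes these metrics using the micro-averaged segment-based definitions common in sound event detection, such as those implemented in the sed_eval toolbox. Each clip is split into fixed-length segments, and a class counts as active in a segment if any of its events overlaps it. Within each segment, a missed class paired with a false alarm counts as one substitution, and the remaining misses and false alarms count as deletions and insertions. The 1 s segment length, the function names, and the class labels in the example are assumptions; this record does not give the paper's evaluation settings.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

# (onset in seconds, offset in seconds, class label)
Event = Tuple[float, float, str]


def segment_activity(events: Iterable[Event], n_segments: int, seg_len: float) -> List[Set[str]]:
    """For each segment k covering [k * seg_len, (k + 1) * seg_len), the set of active labels."""
    active: List[Set[str]] = [set() for _ in range(n_segments)]
    for onset, offset, label in events:
        first = max(0, math.floor(onset / seg_len))
        last = min(n_segments - 1, math.ceil(offset / seg_len) - 1)
        for k in range(first, last + 1):
            active[k].add(label)
    return active


def segment_based_metrics(
    reference: Iterable[Event],
    estimated: Iterable[Event],
    duration: float,
    seg_len: float = 1.0,  # assumed segment length
) -> Dict[str, float]:
    """Micro-averaged segment-based precision, recall, F1-score and error rate."""
    n_segments = math.ceil(duration / seg_len)
    ref = segment_activity(reference, n_segments, seg_len)
    est = segment_activity(estimated, n_segments, seg_len)

    tp = fp = fn = n_ref = 0
    subs = dels = ins = 0
    for r, e in zip(ref, est):
        seg_tp, seg_fp, seg_fn = len(r & e), len(e - r), len(r - e)
        tp, fp, fn = tp + seg_tp, fp + seg_fp, fn + seg_fn
        n_ref += len(r)
        # A miss and a false alarm in the same segment form one substitution.
        subs += min(seg_fn, seg_fp)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)

    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        # Undefined with no reference activity; this sketch reports 0.0 in that case.
        "error_rate": (subs + dels + ins) / n_ref if n_ref else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels and timings, for illustration only.
    reference = [(0.0, 2.0, "scream"), (3.0, 4.0, "glass_break")]
    estimated = [(0.5, 1.5, "scream"), (3.2, 3.8, "gunshot")]
    # Precision, recall and F1 are 2/3; the error rate is 1/3 (one substitution, 3 reference activities).
    print(segment_based_metrics(reference, estimated, duration=4.0))
```

Because a confused class counts once as a substitution rather than as both a deletion and an insertion, the error rate penalizes a wrong label less than a miss plus a separate false alarm would.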