Sound Event Detection for Human Safety and Security in Noisy Environments

The objective of a sound event detector is to recognize anomalies in an audio clip and return their onset and offset. However, detecting sound events in noisy environments is a challenging task. This is due to the fact that in a real audio signal several sound sources co-exist. Moreover, the charact...


Bibliographic Details
Main Authors: Michael Neri, Federica Battisti, Alessandro Neri, Marco Carli
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects: Audio processing, deep learning, human safety, sound event detection, spatial filters
Online Access: https://ieeexplore.ieee.org/document/9997485/
author Michael Neri
Federica Battisti
Alessandro Neri
Marco Carli
collection DOAJ
description The objective of a sound event detector is to recognize anomalies in an audio clip and return their onset and offset. However, detecting sound events in noisy environments is a challenging task. This is due to the fact that in a real audio signal several sound sources co-exist. Moreover, the characteristics of polyphonic audio are different from those of isolated recordings. It is also necessary to consider the presence of noise (e.g., thermal and environmental). In this contribution, we present a sound anomaly detection system based on a fully convolutional network which exploits image spatial filtering and an Atrous Spatial Pyramid Pooling module. To cope with the lack of datasets specifically designed for sound event detection, a dataset for the specific application of noisy bus environments has been designed. The dataset has been obtained by mixing background audio files, recorded in a real environment, with anomalous events extracted from monophonic collections of labelled audio clips. The performance of the proposed system has been evaluated through segment-based metrics such as error rate, recall, and F1-score. Moreover, robustness and precision have been evaluated through four different tests. The analysis of the results shows that the proposed sound event detector outperforms both state-of-the-art methods and general-purpose deep learning solutions.
first_indexed 2024-04-11T04:20:07Z
id doaj.art-faf721939a0c4e9e839eb46ec39f2b5f
institution Directory Open Access Journal
issn 2169-3536
last_indexed 2024-04-11T04:20:07Z
spelling doaj.art-faf721939a0c4e9e839eb46ec39f2b5f, 2022-12-31T00:00:25Z, eng
Citation: IEEE Access (ISSN 2169-3536), IEEE, 2022-01-01, vol. 10, pp. 134230-134240
DOI: 10.1109/ACCESS.2022.3231681
IEEE Xplore document: 9997485
Title: Sound Event Detection for Human Safety and Security in Noisy Environments
Authors and affiliations:
Michael Neri (https://orcid.org/0000-0002-6212-9139), Department of Industrial, Electronic and Mechanical Engineering, Roma Tre University, Rome, Italy
Federica Battisti (https://orcid.org/0000-0002-0846-5879), Department of Information Engineering, University of Padova, Padua, Italy
Alessandro Neri (https://orcid.org/0000-0002-5911-9490), Department of Industrial, Electronic and Mechanical Engineering, Roma Tre University, Rome, Italy
Marco Carli (https://orcid.org/0000-0002-7489-3767), Department of Industrial, Electronic and Mechanical Engineering, Roma Tre University, Rome, Italy
Keywords: Audio processing; deep learning; human safety; sound event detection; spatial filters
Link: https://ieeexplore.ieee.org/document/9997485/
title Sound Event Detection for Human Safety and Security in Noisy Environments
topic Audio processing
deep learning
human safety
sound event detection
spatial filters
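Note on the metrics named in the description: the abstract reports segment-based error rate, recall, and F1-score but does not spell out how they are computed. The sketch below illustrates the segment-based definitions commonly used in sound event detection (substitutions, deletions, and insertions counted per fixed-length segment). It is an illustrative assumption, not the authors' evaluation code; the function names, the 1-second segment length, and the event-tuple layout are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical event layout: (label, onset in seconds, offset in seconds).
Event = Tuple[str, float, float]


def activity_matrix(events: Sequence[Event], labels: Sequence[str],
                    clip_len: float, seg_len: float = 1.0) -> List[List[bool]]:
    """Mark, for each fixed-length segment, which classes are active."""
    n_seg = math.ceil(clip_len / seg_len)
    mat = [[False] * len(labels) for _ in range(n_seg)]
    for label, onset, offset in events:
        col = labels.index(label)
        first = int(onset // seg_len)
        last = min(n_seg, math.ceil(offset / seg_len))
        for seg in range(first, last):
            mat[seg][col] = True
    return mat


def segment_metrics(ref: List[List[bool]],
                    est: List[List[bool]]) -> Dict[str, float]:
    """Segment-based error rate, precision, recall, and F1-score."""
    tp = fp = fn = subs = dels = ins = n_ref = 0
    for ref_row, est_row in zip(ref, est):
        seg_tp = sum(r and e for r, e in zip(ref_row, est_row))
        seg_fp = sum(e and not r for r, e in zip(ref_row, est_row))
        seg_fn = sum(r and not e for r, e in zip(ref_row, est_row))
        tp, fp, fn = tp + seg_tp, fp + seg_fp, fn + seg_fn
        # A miss paired with a false alarm in the same segment is a substitution.
        subs += min(seg_fn, seg_fp)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += sum(ref_row)
    return {
        "error_rate": (subs + dels + ins) / n_ref if n_ref else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }
```

A typical use would build one activity matrix from the annotated events and one from the detector output for the same clip, then pass both to segment_metrics. Note that the error rate can exceed 1 when insertions are numerous, which is why it is usually reported alongside the F1-score.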