Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations

Audio-based event detection poses a number of different challenges that are not encountered in other fields, such as image detection. Challenges such as ambient noise, low Signal-to-Noise Ratio (SNR) and microphone distance are not yet fully understood. If the multimodal approaches are to become bet...

Bibliographic Details
Main Authors:	Ioannis Papadimitriou, Anastasios Vafeiadis, Antonios Lalas, Konstantinos Votis, Dimitrios Tzovaras
Format:	Article
Language:	English
Published:	MDPI AG 2020-09-01
Series:	Electronics
Subjects:	audio surveillance, spectrograms, CNN, SNR, multichannel
Online Access:	https://www.mdpi.com/2079-9292/9/10/1593

_version_	1797552261015011328
author	Ioannis Papadimitriou Anastasios Vafeiadis Antonios Lalas Konstantinos Votis Dimitrios Tzovaras
author_facet	Ioannis Papadimitriou Anastasios Vafeiadis Antonios Lalas Konstantinos Votis Dimitrios Tzovaras
author_sort	Ioannis Papadimitriou
collection	DOAJ
description	Audio-based event detection poses a number of different challenges that are not encountered in other fields, such as image detection. Challenges such as ambient noise, low Signal-to-Noise Ratio (SNR) and microphone distance are not yet fully understood. If the multimodal approaches are to become better in a range of fields of interest, audio analysis will have to play an integral part. Event recognition in autonomous vehicles (AVs) is such a field at a nascent stage that can especially leverage solely on audio or can be part of the multimodal approach. In this manuscript, an extensive analysis focused on the comparison of different magnitude representations of the raw audio is presented. The data on which the analysis is carried out is part of the publicly available MIVIA Audio Events dataset. Single channel Short-Time Fourier Transform (STFT), mel-scale and Mel-Frequency Cepstral Coefficients (MFCCs) spectrogram representations are used. Furthermore, aggregation methods of the aforementioned spectrogram representations are examined; the feature concatenation compared to the stacking of features as separate channels. The effect of the SNR on recognition accuracy and the generalization of the proposed methods on datasets that were both seen and not seen during training are studied and reported.
first_indexed	2024-03-10T15:58:28Z
format	Article
id	doaj.art-0bc4705665c64e85a3cb4b7ac2a83ca3
institution	Directory Open Access Journal
issn	2079-9292
language	English
last_indexed	2024-03-10T15:58:28Z
publishDate	2020-09-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-0bc4705665c64e85a3cb4b7ac2a83ca3 | 2023-11-20T15:28:29Z | eng | MDPI AG | Electronics | 2079-9292 | 2020-09-01 | 9 | 10 | 1593 | 10.3390/electronics9101593 | Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations | Ioannis Papadimitriou, Anastasios Vafeiadis, Antonios Lalas, Konstantinos Votis, Dimitrios Tzovaras (all: Center for Research and Technology Hellas-Information Technologies Institute, Thessaloniki 57001, Greece) | Audio-based event detection poses a number of different challenges that are not encountered in other fields, such as image detection. Challenges such as ambient noise, low Signal-to-Noise Ratio (SNR) and microphone distance are not yet fully understood. If the multimodal approaches are to become better in a range of fields of interest, audio analysis will have to play an integral part. Event recognition in autonomous vehicles (AVs) is such a field at a nascent stage that can especially leverage solely on audio or can be part of the multimodal approach. In this manuscript, an extensive analysis focused on the comparison of different magnitude representations of the raw audio is presented. The data on which the analysis is carried out is part of the publicly available MIVIA Audio Events dataset. Single channel Short-Time Fourier Transform (STFT), mel-scale and Mel-Frequency Cepstral Coefficients (MFCCs) spectrogram representations are used. Furthermore, aggregation methods of the aforementioned spectrogram representations are examined; the feature concatenation compared to the stacking of features as separate channels. The effect of the SNR on recognition accuracy and the generalization of the proposed methods on datasets that were both seen and not seen during training are studied and reported. | https://www.mdpi.com/2079-9292/9/10/1593 | audio surveillance, spectrograms, CNN, SNR, multichannel
spellingShingle	Ioannis Papadimitriou Anastasios Vafeiadis Antonios Lalas Konstantinos Votis Dimitrios Tzovaras Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations Electronics audio surveillance spectrograms CNN SNR multichannel
title	Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations
title_full	Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations
title_fullStr	Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations
title_full_unstemmed	Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations
title_short	Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations
title_sort	audio based event detection at different snr settings using two dimensional spectrogram magnitude representations
topic	audio surveillance spectrograms CNN SNR multichannel
url	https://www.mdpi.com/2079-9292/9/10/1593
work_keys_str_mv	AT ioannispapadimitriou audiobasedeventdetectionatdifferentsnrsettingsusingtwodimensionalspectrogrammagnituderepresentations AT anastasiosvafeiadis audiobasedeventdetectionatdifferentsnrsettingsusingtwodimensionalspectrogrammagnituderepresentations AT antonioslalas audiobasedeventdetectionatdifferentsnrsettingsusingtwodimensionalspectrogrammagnituderepresentations AT konstantinosvotis audiobasedeventdetectionatdifferentsnrsettingsusingtwodimensionalspectrogrammagnituderepresentations AT dimitriostzovaras audiobasedeventdetectionatdifferentsnrsettingsusingtwodimensionalspectrogrammagnituderepresentations

Similar Items