Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations

Audio-based event detection poses a number of different challenges that are not encountered in other fields, such as image detection. Challenges such as ambient noise, low Signal-to-Noise Ratio (SNR) and microphone distance are not yet fully understood. If the multimodal approaches are to become bet...

Bibliographic Details
Main Authors: Ioannis Papadimitriou, Anastasios Vafeiadis, Antonios Lalas, Konstantinos Votis, Dimitrios Tzovaras
Format: Article
Language: English
Published: MDPI AG 2020-09-01
Series: Electronics
Subjects: audio surveillance, spectrograms, CNN, SNR, multichannel
Online Access: https://www.mdpi.com/2079-9292/9/10/1593
_version_ 1797552261015011328
author Ioannis Papadimitriou
Anastasios Vafeiadis
Antonios Lalas
Konstantinos Votis
Dimitrios Tzovaras
collection DOAJ
description Audio-based event detection poses challenges that are not encountered in other fields, such as image detection. The effects of ambient noise, low Signal-to-Noise Ratio (SNR) and microphone distance are not yet fully understood. If multimodal approaches are to improve across a range of fields of interest, audio analysis will have to play an integral part. Event recognition in autonomous vehicles (AVs) is one such field, still at a nascent stage, that can rely on audio alone or use audio as part of a multimodal approach. This manuscript presents an extensive analysis comparing different magnitude representations of the raw audio. The analysis is carried out on data from the publicly available MIVIA Audio Events dataset. Single-channel Short-Time Fourier Transform (STFT), mel-scale and Mel-Frequency Cepstral Coefficient (MFCC) spectrogram representations are used. Furthermore, two methods of aggregating these representations are examined: feature concatenation and the stacking of features as separate channels. The effect of the SNR on recognition accuracy, and the generalization of the proposed methods to datasets both seen and not seen during training, are studied and reported.
first_indexed 2024-03-10T15:58:28Z
format Article
id doaj.art-0bc4705665c64e85a3cb4b7ac2a83ca3
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T15:58:28Z
publishDate 2020-09-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-0bc4705665c64e85a3cb4b7ac2a83ca3
2023-11-20T15:28:29Z
eng
MDPI AG
Electronics
2079-9292
2020-09-01
Volume 9, Issue 10, Article 1593
10.3390/electronics9101593
Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations
Ioannis Papadimitriou, Anastasios Vafeiadis, Antonios Lalas, Konstantinos Votis, Dimitrios Tzovaras
Center for Research and Technology Hellas-Information Technologies Institute, Thessaloniki 57001, Greece (all authors)
https://www.mdpi.com/2079-9292/9/10/1593
audio surveillance; spectrograms; CNN; SNR; multichannel
title Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations
topic audio surveillance
spectrograms
CNN
SNR
multichannel
url https://www.mdpi.com/2079-9292/9/10/1593
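
The description above mentions two technical ideas that a small example may make more concrete: evaluating at a chosen SNR, and aggregating several two-dimensional representations either by concatenation or by stacking them as separate channels. The following is only a minimal sketch under stated assumptions, not the authors' pipeline. It uses the standard definition SNR_dB = 10 * log10(P_event / P_noise). All function names are hypothetical, the sine "event" and Gaussian noise are synthetic stand-ins for real recordings, and the three feature maps are zero-filled placeholders assumed to share the same (frequency, time) shape; how the article resizes or aligns STFT, mel and MFCC maps is not stated in this record.

```python
import math
import random


def power(samples):
    """Mean power of a sequence of audio samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(event, noise, snr_db):
    """Scale `noise` so that 10*log10(P_event / P_noise) equals `snr_db`,
    then add it to `event`. Returns (mixture, scaled_noise)."""
    gain = math.sqrt(power(event) / (power(noise) * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixture = [e + n for e, n in zip(event, scaled)]
    return mixture, scaled


def measured_snr_db(event, noise):
    """SNR in dB between an event signal and a noise signal."""
    return 10 * math.log10(power(event) / power(noise))


def shape(nested):
    """Shape of a rectangular nested list, e.g. (channels, freq, time)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def concatenate_features(feature_maps):
    """Join 2-D maps (freq x time) along the frequency axis: one tall map."""
    return [row for fmap in feature_maps for row in fmap]


def stack_features(feature_maps):
    """Keep each 2-D map as its own channel: (channels, freq, time)."""
    return list(feature_maps)


# Synthetic stand-ins (assumptions): a 440 Hz tone as the "event",
# Gaussian noise as the background, 0.1 s at 16 kHz.
rng = random.Random(0)
event = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
mixture, scaled_noise = mix_at_snr(event, noise, snr_db=5.0)

# Placeholder feature maps with 4 frequency bins and 6 time frames each.
stft_map = [[0.0] * 6 for _ in range(4)]
mel_map = [[0.0] * 6 for _ in range(4)]
mfcc_map = [[0.0] * 6 for _ in range(4)]
maps = [stft_map, mel_map, mfcc_map]

concatenated = concatenate_features(maps)  # single-channel, taller input
stacked = stack_features(maps)             # three-channel input
```

With these placeholders, concatenation yields one 12 x 6 map while stacking yields a 3 x 4 x 6 array, which is the difference a CNN input layer would see between the two aggregation schemes.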