Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations
Main Authors: | Ioannis Papadimitriou, Anastasios Vafeiadis, Antonios Lalas, Konstantinos Votis, Dimitrios Tzovaras |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-09-01 |
Series: | Electronics |
Subjects: | audio surveillance, spectrograms, CNN, SNR, multichannel |
Online Access: | https://www.mdpi.com/2079-9292/9/10/1593 |
author | Ioannis Papadimitriou Anastasios Vafeiadis Antonios Lalas Konstantinos Votis Dimitrios Tzovaras |
collection | DOAJ |
description | Audio-based event detection poses a number of challenges that are not encountered in other fields, such as image detection. Challenges such as ambient noise, low Signal-to-Noise Ratio (SNR) and microphone distance are not yet fully understood. If multimodal approaches are to improve across a range of fields of interest, audio analysis will have to play an integral part. Event recognition in autonomous vehicles (AVs) is one such field: still at a nascent stage, it can rely solely on audio or use audio as part of a multimodal approach. In this manuscript, an extensive analysis comparing different magnitude representations of the raw audio is presented. The analysis is carried out on data from the publicly available MIVIA Audio Events dataset. Single-channel Short-Time Fourier Transform (STFT), mel-scale and Mel-Frequency Cepstral Coefficient (MFCC) spectrogram representations are used. Furthermore, methods for aggregating these spectrogram representations are examined: feature concatenation is compared with stacking the features as separate channels. The effect of the SNR on recognition accuracy, and the generalization of the proposed methods to datasets both seen and unseen during training, are studied and reported. |
first_indexed | 2024-03-10T15:58:28Z |
format | Article |
id | doaj.art-0bc4705665c64e85a3cb4b7ac2a83ca3 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T15:58:28Z |
publishDate | 2020-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-0bc4705665c64e85a3cb4b7ac2a83ca3 2023-11-20T15:28:29Z eng MDPI AG Electronics 2079-9292 2020-09-01 9 10 1593 10.3390/electronics9101593 Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations Ioannis Papadimitriou, Anastasios Vafeiadis, Antonios Lalas, Konstantinos Votis, Dimitrios Tzovaras (all: Center for Research and Technology Hellas-Information Technologies Institute, Thessaloniki 57001, Greece) https://www.mdpi.com/2079-9292/9/10/1593 audio surveillance spectrograms CNN SNR multichannel |
title | Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations |
topic | audio surveillance spectrograms CNN SNR multichannel |
url | https://www.mdpi.com/2079-9292/9/10/1593 |
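
The description contrasts two ways of aggregating spectrogram representations (concatenation versus stacking as separate channels) and evaluates recognition at different SNRs. The following Python/NumPy sketch illustrates both ideas under stated assumptions. It is not the authors' code. The function names `mix_at_snr`, `concatenate_features` and `stack_features` are hypothetical. SNR is assumed to be the ratio of mean signal power to mean noise power in decibels. Feature matrices are assumed to be precomputed as (frequency bins × time frames) with a common number of frames. Stacking additionally assumes all representations have already been brought to the same shape (for example by resizing), a step the record does not specify.

```python
import numpy as np
from typing import Sequence


def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `signal` so the mixture has the requested SNR in dB.

    Assumption: SNR = 10 * log10(P_signal / P_noise), with P the mean power.
    """
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as signal")
    noise = noise[: len(signal)]  # trim noise to the signal length
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve 10*log10(p_signal / (g**2 * p_noise)) = snr_db for the noise gain g.
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + gain * noise


def concatenate_features(feats: Sequence[np.ndarray]) -> np.ndarray:
    """Join (freq_bins, frames) matrices along the frequency axis into one channel.

    Inputs may differ in frequency bins but must share the number of frames.
    Output shape: (1, total freq_bins, frames).
    """
    return np.concatenate(feats, axis=0)[np.newaxis, ...]


def stack_features(feats: Sequence[np.ndarray]) -> np.ndarray:
    """Place equally shaped (freq_bins, frames) matrices in separate channels.

    Output shape: (n_representations, freq_bins, frames).
    np.stack raises ValueError if the input shapes differ.
    """
    return np.stack(feats, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)  # random stand-in for an audio clip
    noise = rng.standard_normal(16000)
    noisy = mix_at_snr(clean, noise, snr_db=5.0)
    achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(round(float(achieved), 6))  # 5.0

    # Hypothetical sizes for STFT, mel and MFCC matrices, all with 100 frames.
    stft, mel, mfcc = np.zeros((513, 100)), np.zeros((128, 100)), np.zeros((40, 100))
    print(concatenate_features([stft, mel, mfcc]).shape)  # (1, 681, 100)

    # Assumes the three representations were already resized to 128 x 100.
    resized = [np.zeros((128, 100)) for _ in range(3)]
    print(stack_features(resized).shape)  # (3, 128, 100)
```

In this sketch, concatenation keeps each representation at its native frequency resolution but produces a single, taller input. Stacking gives a network spatially aligned channels, much like the color planes of an image, at the cost of requiring a common shape.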