Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks

Audio signal classification finds various applications in detecting and monitoring health conditions in healthcare. Convolutional neural networks (CNN) have produced state-of-the-art results in image classification and are being increasingly used in other tasks, including signal classification. Howe...

Bibliographic Details
Main Authors: Roneel V. Sharan, Hao Xiong, Shlomo Berkovsky
Format: Article
Language: English
Published: MDPI AG 2021-05-01
Series: Sensors
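Subjects: convolutional neural networks; fusion; interpolation; machine learning; spectrogram; time-frequency image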
Online Access: https://www.mdpi.com/1424-8220/21/10/3434
_version_ 1797534065646108672
author Roneel V. Sharan
Hao Xiong
Shlomo Berkovsky
author_facet Roneel V. Sharan
Hao Xiong
Shlomo Berkovsky
author_sort Roneel V. Sharan
collection DOAJ
description Audio signal classification finds various applications in detecting and monitoring health conditions in healthcare. Convolutional neural networks (CNN) have produced state-of-the-art results in image classification and are being increasingly used in other tasks, including signal classification. However, audio signal classification using CNN presents various challenges. In image classification tasks, raw images of equal dimensions can be used as a direct input to CNN. Raw time-domain signals, on the other hand, can be of varying dimensions. In addition, the temporal signal often has to be transformed to frequency-domain to reveal unique spectral characteristics, therefore requiring signal transformation. In this work, we overview and benchmark various audio signal representation techniques for classification using CNN, including approaches that deal with signals of different lengths and combine multiple representations to improve the classification accuracy. Hence, this work surfaces important empirical evidence that may guide future works deploying CNN for audio signal classification purposes.
first_indexed 2024-03-10T11:24:21Z
format Article
id doaj.art-dc607b9d8c98434c904608fdc8007047
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-03-10T11:24:21Z
publishDate 2021-05-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-dc607b9d8c98434c904608fdc8007047 | 2023-11-21T19:48:27Z | eng | MDPI AG | Sensors | 1424-8220 | 2021-05-01 | 21 | 10 | 3434 | 10.3390/s21103434 | Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks | Roneel V. Sharan [0]; Hao Xiong [1]; Shlomo Berkovsky [2] | [0] Australian Institute of Health Innovation, Macquarie University, Sydney, NSW 2109, Australia; [1] Australian Institute of Health Innovation, Macquarie University, Sydney, NSW 2109, Australia; [2] Australian Institute of Health Innovation, Macquarie University, Sydney, NSW 2109, Australia | Audio signal classification finds various applications in detecting and monitoring health conditions in healthcare. Convolutional neural networks (CNN) have produced state-of-the-art results in image classification and are being increasingly used in other tasks, including signal classification. However, audio signal classification using CNN presents various challenges. In image classification tasks, raw images of equal dimensions can be used as a direct input to CNN. Raw time-domain signals, on the other hand, can be of varying dimensions. In addition, the temporal signal often has to be transformed to frequency-domain to reveal unique spectral characteristics, therefore requiring signal transformation. In this work, we overview and benchmark various audio signal representation techniques for classification using CNN, including approaches that deal with signals of different lengths and combine multiple representations to improve the classification accuracy. Hence, this work surfaces important empirical evidence that may guide future works deploying CNN for audio signal classification purposes. | https://www.mdpi.com/1424-8220/21/10/3434 | convolutional neural networks; fusion; interpolation; machine learning; spectrogram; time-frequency image
spellingShingle Roneel V. Sharan
Hao Xiong
Shlomo Berkovsky
Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks
Sensors
convolutional neural networks
fusion
interpolation
machine learning
spectrogram
time-frequency image
title Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks
title_full Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks
title_fullStr Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks
title_full_unstemmed Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks
title_short Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks
title_sort benchmarking audio signal representation techniques for classification with convolutional neural networks
topic convolutional neural networks
fusion
interpolation
machine learning
spectrogram
time-frequency image
url https://www.mdpi.com/1424-8220/21/10/3434
work_keys_str_mv AT roneelvsharan benchmarkingaudiosignalrepresentationtechniquesforclassificationwithconvolutionalneuralnetworks
AT haoxiong benchmarkingaudiosignalrepresentationtechniquesforclassificationwithconvolutionalneuralnetworks
AT shlomoberkovsky benchmarkingaudiosignalrepresentationtechniquesforclassificationwithconvolutionalneuralnetworks