Environmental Sound Classification With Low-Complexity Convolutional Neural Network Empowered by Sparse Salient Region Pooling

Environmental Sound Classification (ESC) is an important field in a broad range of applications, such as smart cities, audio surveillance, and health care. Recently, Convolutional Neural Networks (CNNs) have taken the lead from traditional approaches and have produced promising results. However, the...

Bibliographic Details
Main Authors: Hamed Riazati Seresht, Karim Mohammadi
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
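Subjects: Convolutional neural networks; environmental sound classification; global feature pooling; low complexity; regional saliency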
Online Access: https://ieeexplore.ieee.org/document/10002350/
_version_ 1797959591616577536
author Hamed Riazati Seresht
Karim Mohammadi
collection DOAJ
description Environmental Sound Classification (ESC) is an important field in a broad range of applications, such as smart cities, audio surveillance, and health care. Recently, Convolutional Neural Networks (CNNs) have taken the lead from traditional approaches and have produced promising results. However, the achieved improvements are often accompanied by increasing depth, complexity, and size of the network, which prevents their use in many practical applications. In this work, our goal is to empower a small, low-complexity CNN model to achieve superior performance. To this end, we concentrate on the importance of the global pooling technique, which has been less investigated in ESC. In most previous works, models use a global average pooling layer, which does not consider regional saliency and thus weakens the contribution of the salient time-frequency regions to the classification and to the training of the convolutional kernels. We propose a novel global pooling method, called Sparse Salient Region Pooling (SSRP), which computes the channel descriptors from a sparse subset of features and guides the model to learn effectively from the more salient time-frequency regions. Experimental results demonstrate that the proposed model, with only 700K parameters, yields accuracies of 86.7% on ESC-50 and 94.8% on ESC-10, which are comparable to those of state-of-the-art methods. Compared to the baseline model, our model achieves an absolute improvement of 21.8% in accuracy on ESC-50 with a 98% smaller model size. Our visual analyses show that SSRP intensifies the responses of low-energy regions such that they contribute even more than high-energy regions to the classification of specific sound classes.
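note The description contrasts global average pooling with a channel descriptor computed from a sparse subset of features. The sketch below illustrates that contrast only. It assumes a plain top-k selection, which the record does not confirm as the SSRP rule; the names global_average_pool, sparse_topk_pool, and k are hypothetical, and the toy feature map is made up for illustration.

```python
from typing import List

# One channel of a feature map: rows = frequency bins, columns = time frames.
FeatureMap = List[List[float]]


def global_average_pool(channel: FeatureMap) -> float:
    """Average every time-frequency activation of one channel."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def sparse_topk_pool(channel: FeatureMap, k: int) -> float:
    """Average only the k largest activations of one channel.

    Assumption: top-k selection is a generic stand-in for "a sparse
    subset of features"; it is not taken from the paper.
    """
    values = sorted((v for row in channel for v in row), reverse=True)
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of activations")
    return sum(values[:k]) / k


# Toy 3x4 channel: one short salient event on a silent background.
channel = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

gap = global_average_pool(channel)       # 6.0 / 12 = 0.5
sparse = sparse_topk_pool(channel, k=2)  # (4.0 + 2.0) / 2 = 3.0
print(gap, sparse)
```

In this toy case the average over all twelve positions dilutes the two active positions to 0.5, while the sparse descriptor keeps them at 3.0. Setting k to the total number of positions reduces the sparse variant to global average pooling.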
first_indexed 2024-04-11T00:34:52Z
format Article
id doaj.art-62a387fb9f24421cb81faccc963d183e
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T00:34:52Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-62a387fb9f24421cb81faccc963d183e | 2023-01-07T00:00:49Z | eng | IEEE | IEEE Access | 2169-3536 | 2023-01-01 | 11 | 849 | 862 | 10.1109/ACCESS.2022.3232807 | 10002350 | Environmental Sound Classification With Low-Complexity Convolutional Neural Network Empowered by Sparse Salient Region Pooling | Hamed Riazati Seresht (0), https://orcid.org/0000-0002-3849-3061 | Karim Mohammadi (1) | (0) School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran | (1) School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran | https://ieeexplore.ieee.org/document/10002350/ | Convolutional neural networks | environmental sound classification | global feature pooling | low complexity | regional saliency
title Environmental Sound Classification With Low-Complexity Convolutional Neural Network Empowered by Sparse Salient Region Pooling
topic Convolutional neural networks
environmental sound classification
global feature pooling
low complexity
regional saliency
url https://ieeexplore.ieee.org/document/10002350/