Robust Feature Selection-Based Speech Emotion Classification Using Deep Transfer Learning

Speech Emotion Classification (SEC) relies heavily on the quality of feature extraction and selection from the speech signal. Improving these steps to enhance emotion classification has attracted significant attention from researchers. Many primitives and algorithmic solutions for efficient SEC...

Bibliographic Details
Main Authors: Samson Akinpelu, Serestina Viriri
Format: Article
Language: English
Published: MDPI AG 2022-08-01
Series: Applied Sciences
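Subjects: feature selection; speech emotion; classification; deep convolutional neural network; transfer learning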
Online Access: https://www.mdpi.com/2076-3417/12/16/8265
_version_ 1827618206440751104
author Samson Akinpelu
Serestina Viriri
author_facet Samson Akinpelu
Serestina Viriri
author_sort Samson Akinpelu
collection DOAJ
description Speech Emotion Classification (SEC) relies heavily on the quality of feature extraction and selection from the speech signal. Improving these steps to enhance emotion classification has attracted significant attention from researchers. Many primitives and algorithmic solutions for efficient SEC with minimum cost have been proposed; however, the accuracy and performance of these methods have not yet reached a satisfactory level. In this work, we propose a novel deep transfer learning approach with distinctive, emotionally rich feature selection techniques for speech emotion classification. We adopt the mel-spectrogram extracted from the speech signal as the input to our deep convolutional neural network for efficient feature extraction. We froze 19 layers of our pretrained convolutional neural network, excluding them from re-training, to increase efficiency and minimize computational cost. One flatten layer and two dense layers were used. A ReLU activation function was used at the last layer of our feature extraction segment. To prevent misclassification and reduce feature dimensionality, we employed the Neighborhood Component Analysis (NCA) feature selection algorithm to pick out the most relevant features before the actual classification of emotion. Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) classifiers were utilized at the topmost layer of our model. Our experiments used two popular datasets for speech emotion classification, the Berlin Emotional Speech Database (EMO-DB) and the Toronto Emotional Speech Set (TESS), as well as a combination of EMO-DB with TESS. We obtained a state-of-the-art result, with an accuracy of 94.3% and a specificity of 100% on EMO-DB, and an accuracy of 97.2% and a specificity of 99.80% on TESS. Our proposed method outperformed some recent work in SEC when assessed on the three datasets.
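The description reports accuracy and specificity for a multi-class emotion classifier. As a minimal sketch of how such figures can be derived from predictions, the Python below builds a confusion matrix and computes overall accuracy and one-vs-rest specificity per class. The function names, the integer label encoding, and the one-vs-rest definition are assumptions made for illustration; the record does not state how the authors aggregated specificity across emotion classes.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Count predictions; rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_specificity(cm: List[List[int]]) -> List[float]:
    """One-vs-rest specificity for each class k: TN / (TN + FP)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k samples predicted as other classes
        fp = sum(cm[r][k] for r in range(n)) - tp  # other classes predicted as class k
        tn = total - tp - fn - fp
        # If every sample belongs to class k there are no negatives to score.
        scores.append(tn / (tn + fp) if (tn + fp) > 0 else float("nan"))
    return scores


if __name__ == "__main__":
    # Hypothetical labels for three emotion classes (0, 1, 2); not data from the paper.
    y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 2, 0]
    cm = confusion_matrix(y_true, y_pred, n_classes=3)
    print("accuracy:", accuracy(cm))
    print("specificity per class:", per_class_specificity(cm))
```

A single reported specificity would typically be the mean of the per-class values (macro average), though that choice is also an assumption here.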
first_indexed 2024-03-09T10:01:27Z
format Article
id doaj.art-576eee7ba32348c9aaae3da7ca75542a
institution Directory of Open Access Journals
issn 2076-3417
language English
last_indexed 2024-03-09T10:01:27Z
publishDate 2022-08-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-576eee7ba32348c9aaae3da7ca75542a; 2023-12-01T23:22:13Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2022-08-01; vol. 12, no. 16, art. 8265; DOI 10.3390/app12168265; Samson Akinpelu and Serestina Viriri (both: School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban 4041, South Africa)
title Robust Feature Selection-Based Speech Emotion Classification Using Deep Transfer Learning
topic feature selection
speech emotion
classification
deep convolutional neural network
transfer learning
url https://www.mdpi.com/2076-3417/12/16/8265