Lightweight Deep Learning Framework for Speech Emotion Recognition

A Speech Emotion Recognition (SER) system, which analyzes human utterances to determine a speaker’s emotion, has a growing impact on how people and machines interact. Recent growth in human-computer interaction and computational intelligence has drawn the attention of many researchers in Ar...


Bibliographic Details
Main Authors:	Samson Akinpelu, Serestina Viriri, Adekanmi Adegun
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Deep learning; convolutional neural network; speech emotion; lightweight; human–computer interaction
Online Access:	https://ieeexplore.ieee.org/document/10188865/

_version_	1827889271142350848
author	Samson Akinpelu; Serestina Viriri; Adekanmi Adegun
author_facet	Samson Akinpelu; Serestina Viriri; Adekanmi Adegun
author_sort	Samson Akinpelu
collection	DOAJ
description	A Speech Emotion Recognition (SER) system, which analyzes human utterances to determine a speaker’s emotion, has a growing impact on how people and machines interact. Recent growth in human-computer interaction and computational intelligence has drawn the attention of many researchers in Artificial Intelligence (AI) to deep learning because of its wide applicability to several fields, including computer vision, natural language processing, and affective computing. Deep learning models do not need manually created features because they can automatically extract relevant features from the input data. However, they call for substantial resources, high processing power, and hyper-parameter tuning, which makes them unsuitable for lightweight devices. In this study, we focused on developing an efficient lightweight model for speech emotion recognition with optimized parameters without compromising performance. Our proposed model integrates Random Forest and Multi-layer Perceptron (MLP) classifiers into the VGGNet framework. The proposed model was evaluated against other deep learning based methods (InceptionV3, ResNet, MobileNetV2, DenseNet) and yielded low computational complexity with optimum performance. The experiment was carried out on three datasets, TESS, EMODB, and RAVDESS. Mel Frequency Cepstral Coefficient (MFCC) features were extracted for 6–8 emotion classes, namely Sad, Angry, Happy, Surprise, Neutral, Disgust, Fear, and Calm. Our model achieved 100%, 96%, and 86.25% accuracy on the TESS, EMODB, and RAVDESS datasets respectively. This revealed that the proposed lightweight model achieved higher recognition accuracy than the recent state-of-the-art models found in the literature.
first_indexed	2024-03-12T20:53:41Z
format	Article
id	doaj.art-f57a8544a93944d781bb34858fb7c180
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-03-12T20:53:41Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-f57a8544a93944d781bb34858fb7c180; 2023-07-31T23:01:20Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; vol. 11, pp. 77086–77098; 10.1109/ACCESS.2023.3297269; 10188865; Lightweight Deep Learning Framework for Speech Emotion Recognition; Samson Akinpelu (https://orcid.org/0000-0002-4636-215X); Serestina Viriri (https://orcid.org/0000-0002-2850-8645); Adekanmi Adegun; all authors: School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, South Africa; https://ieeexplore.ieee.org/document/10188865/; Deep learning; convolutional neural network; speech emotion; lightweight; human–computer interaction
spellingShingle	Samson Akinpelu Serestina Viriri Adekanmi Adegun Lightweight Deep Learning Framework for Speech Emotion Recognition IEEE Access Deep learning convolutional neural network speech emotion lightweight human–computer interaction
title	Lightweight Deep Learning Framework for Speech Emotion Recognition
title_full	Lightweight Deep Learning Framework for Speech Emotion Recognition
title_fullStr	Lightweight Deep Learning Framework for Speech Emotion Recognition
title_full_unstemmed	Lightweight Deep Learning Framework for Speech Emotion Recognition
title_short	Lightweight Deep Learning Framework for Speech Emotion Recognition
title_sort	lightweight deep learning framework for speech emotion recognition
topic	Deep learning; convolutional neural network; speech emotion; lightweight; human–computer interaction
url	https://ieeexplore.ieee.org/document/10188865/
work_keys_str_mv	AT samsonakinpelu lightweightdeeplearningframeworkforspeechemotionrecognition AT serestinaviriri lightweightdeeplearningframeworkforspeechemotionrecognition AT adekanmiadegun lightweightdeeplearningframeworkforspeechemotionrecognition


Similar Items