Speech emotion classification using attention based network and regularized feature selection


Bibliographic Details
Main Authors:	Samson Akinpelu, Serestina Viriri
Format:	Article
Language:	English
Published:	Nature Portfolio 2023-07-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-023-38868-2

collection	DOAJ
description	Abstract Speech emotion classification (SEC) has gained considerable prominence within the research community in recent times, owing to its vital role in Human–Computer Interaction (HCI) and affective computing. Many primitive algorithmic solutions and deep neural network (DNN) models have been proposed for efficient recognition of emotion from speech; however, the suitability of these methods for accurately classifying emotion from speech with a multi-lingual background, and under other factors that impede efficient classification, still demands critical consideration. This study proposes an attention-based network with a pre-trained convolutional neural network and regularized neighbourhood component analysis (RNCA) feature selection for improved classification of speech emotion. Attention models have proven successful in many sequence-based and time-series tasks. Extensive experiments were carried out with three major classifiers (SVM, MLP and Random Forest) on the publicly available TESS (Toronto Emotional Speech Set) dataset. The proposed model (Attention-based DCNN+RNCA+RF) achieved 97.8% classification accuracy, a 3.27% improvement that outperforms state-of-the-art SEC approaches. Our model evaluation revealed the consistency of the attention mechanism and feature selection with human behavioural patterns in classifying emotion from auditory speech.
first_indexed	2024-03-12T21:09:56Z
id	doaj.art-4b29b0028f994e7dbb52c651c22ffbf3
institution	Directory of Open Access Journals
issn	2045-2322
last_indexed	2024-03-12T21:09:56Z
spelling	doaj.art-4b29b0028f994e7dbb52c651c22ffbf3 | 2023-07-30T11:13:56Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2023-07-01 | 13 | 1 | 1 | 14 | 10.1038/s41598-023-38868-2 | Speech emotion classification using attention based network and regularized feature selection | Samson Akinpelu, School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal | Serestina Viriri, School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal | https://doi.org/10.1038/s41598-023-38868-2
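The abstract names regularized neighbourhood component analysis (RNCA) as the feature-selection step but gives no formulation. As a hedged sketch, the NumPy code below assumes the common feature-weighting form of NCA with an L2 penalty: each feature r gets a weight w_r, every sample picks a neighbour stochastically according to a weighted L1 distance, and gradient ascent maximises the mean leave-one-out accuracy minus λ Σ w_r². The function name `nca_feature_weights` and the defaults for `lam`, `sigma`, `lr` and `n_iter` are illustrative assumptions, not values from the paper, and the synthetic data merely stands in for the attention-based DCNN features.

```python
import numpy as np


def nca_feature_weights(X, y, lam=0.01, sigma=1.0, lr=0.1, n_iter=200):
    """Hypothetical helper: learn one weight per feature by gradient ascent on
    F(w) = (1/n) * sum_i p_i - lam * sum_r w_r**2 (assumed regularized NCA form)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    w = np.ones(d)  # start with every feature equally weighted

    # |x_ir - x_jr| for every pair (i, j) and feature r: shape (n, n, d)
    diff = np.abs(X[:, None, :] - X[None, :, :])
    # y_ij = 1 when i and j share a label (i == j excluded)
    same = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)

    for _ in range(n_iter):
        dist = diff @ (w ** 2)              # weighted L1 distance, shape (n, n)
        logits = -dist / sigma
        np.fill_diagonal(logits, -np.inf)   # a point never picks itself as neighbour
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)   # p_ij: prob. that i picks j as neighbour
        p_i = (p * same).sum(axis=1)        # prob. that i is classified correctly

        # dF/dw_r = 2 w_r [ (1/(n sigma)) sum_i (p_i sum_j p_ij d_ijr - sum_j y_ij p_ij d_ijr) - lam ]
        all_term = np.einsum("ij,ijr->ir", p, diff)
        same_term = np.einsum("ij,ijr->ir", p * same, diff)
        data_grad = (p_i[:, None] * all_term - same_term).sum(axis=0) / (n * sigma)
        w += lr * 2.0 * w * (data_grad - lam)  # gradient ascent step

    return w


# Toy usage: feature 0 tracks the label, features 1-3 are pure noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 30)
feats = rng.standard_normal((60, 4))
feats[:, 0] = labels + 0.1 * rng.standard_normal(60)

weights = nca_feature_weights(feats, labels)
ranking = np.argsort(weights ** 2)[::-1]  # most relevant feature first
print("feature weights:", np.round(weights, 3))
print("ranking:", ranking)
```

Features would then be ranked by w_r², and the top-ranked subset passed to a classifier such as the Random Forest named in the abstract. The penalty λ pushes weights of uninformative features toward zero, which is what turns the metric-learning objective into a feature selector. The pairwise-difference tensor costs O(n²d) memory, so on a full dataset it would need to be computed in mini-batches.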


Similar Items