Time-Distributed Attention-Layered Convolution Neural Network with Ensemble Learning using Random Forest Classifier for Speech Emotion Recognition

Speech Emotion Recognition (SER) is the task of identifying human emotions from speech utterances, which combine linguistic and non-linguistic information. Non-linguistic SER provides a generalized solution in human–computer interaction applications as it overcomes...

Bibliographic Details
Main Authors:	Yalamanchili Bhanusree, Samayamantula Srinivas Kumar, Anne Koteswara Rao
Format:	Article
Language:	English
Published:	UUM Press 2023-01-01
Series:	Journal of ICT
Subjects:	Ensemble classifiers; Random Forest; Speech Emotion Recognition; Human Computer Interaction; time-distributed layers; spatiotemporal features
Online Access:	https://e-journal.uum.edu.my/index.php/jict/article/view/14982

author	Yalamanchili Bhanusree; Samayamantula Srinivas Kumar; Anne Koteswara Rao
collection	DOAJ
description	Speech Emotion Recognition (SER) is the task of identifying human emotions from speech utterances, which combine linguistic and non-linguistic information. Non-linguistic SER provides a generalized solution for human–computer interaction applications because it overcomes the language barrier. Earlier machine learning and deep learning approaches classified emotions using handcrafted features. To achieve effective and generalized SER, feature extraction can be performed with deep neural networks and classification with ensemble learning. The proposed model employs a time-distributed attention-layered convolution neural network (TDACNN) to extract spatiotemporal features in the first stage and a random forest (RF) classifier, an ensemble classifier, for efficient and generalized classification of emotions in the second stage. The model was evaluated on the RAVDESS and IEMOCAP data corpora and compared with the CNN-SVM and CNN-RF models for SER. The TDACNN-RF model achieved test classification accuracies of 92.19 percent and 90.27 percent on the RAVDESS and IEMOCAP data corpora, respectively. The experimental results indicate that the proposed model extracts spatiotemporal features from time-series speech signals efficiently and classifies emotions with good accuracy. Class confusion among the emotions was reduced on both data corpora, suggesting that the model generalizes.
first_indexed	2024-04-10T21:41:32Z
format	Article
id	doaj.art-7fd2819fb6764ee28aa451adb4b53a13
institution	Directory Open Access Journal
issn	1675-414X 2180-3862
language	English
last_indexed	2024-04-10T21:41:32Z
publishDate	2023-01-01
publisher	UUM Press
record_format	Article
series	Journal of ICT
spelling	doaj.art-7fd2819fb6764ee28aa451adb4b53a13; 2023-01-19T01:50:45Z; eng; UUM Press; Journal of ICT; ISSN 1675-414X, 2180-3862; 2023-01-01; vol. 22, no. 1; doi:10.32890/jict2023.22.1.3; Time-Distributed Attention-Layered Convolution Neural Network with Ensemble Learning using Random Forest Classifier for Speech Emotion Recognition; Yalamanchili Bhanusree (Department of Computer Science Engineering, Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology, India); Samayamantula Srinivas Kumar (Department of Electronics and Communications Engineering, Jawaharlal Nehru Technological University Kakinada, India); Anne Koteswara Rao (Department of Computer Science Engineering, Kalasalingam Academy of Research and Education, India); https://e-journal.uum.edu.my/index.php/jict/article/view/14982; Ensemble classifiers; Random Forest; Speech Emotion Recognition; Human Computer Interaction; time-distributed layers; spatiotemporal features
title	Time-Distributed Attention-Layered Convolution Neural Network with Ensemble Learning using Random Forest Classifier for Speech Emotion Recognition
topic	Ensemble classifiers; Random Forest; Speech Emotion Recognition; Human Computer Interaction; time-distributed layers; spatiotemporal features
url	https://e-journal.uum.edu.my/index.php/jict/article/view/14982
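
The description outlines a two-stage pipeline: a time-distributed, attention-based feature extractor followed by an ensemble classifier. The sketch below is only a toy illustration of that structure in standard-library Python and is not the authors' implementation. Everything in it is an assumption made for illustration: the per-segment extractor (`segment_features`) is a hand-written stand-in for the CNN, the attention scores are simply segment energies, the "forest" is a bag of depth-1 trees (stumps) rather than full random-forest trees, and the two-class synthetic signals stand in for real speech from corpora such as RAVDESS or IEMOCAP.

```python
import math
import random
from collections import Counter

# Toy stand-in for the two-stage idea; hypothetical names, not the paper's code.

def split_into_segments(signal, seg_len):
    """Cut a 1-D signal into equal-length, non-overlapping segments (time steps)."""
    return [signal[i:i + seg_len] for i in range(0, len(signal) - seg_len + 1, seg_len)]

def segment_features(segment):
    """Assumed replacement for the shared CNN: mean energy and zero-crossing rate."""
    energy = sum(x * x for x in segment) / len(segment)
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if a * b < 0)
    return [energy, crossings / (len(segment) - 1)]

def attention_pool(features, scores):
    """Softmax the scores and return the weighted average of the feature vectors."""
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w / total * f[d] for w, f in zip(weights, features))
            for d in range(len(features[0]))]

def utterance_embedding(signal, seg_len=50):
    """Stage 1: apply the same extractor to every segment, then attention-pool."""
    feats = [segment_features(s) for s in split_into_segments(signal, seg_len)]
    scores = [f[0] for f in feats]  # assumption: louder segments get more attention
    return attention_pool(feats, scores)

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(X, y, rng):
    """Depth-1 tree fit on a bootstrap sample, split on one randomly chosen feature."""
    idx = [rng.randrange(len(X)) for _ in X]
    feat = rng.randrange(len(X[0]))
    vals = sorted({X[i][feat] for i in idx})
    thresholds = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
    best = None
    for thr in thresholds:
        left = [y[i] for i in idx if X[i][feat] <= thr]
        right = [y[i] for i in idx if X[i][feat] > thr]
        l_lab = majority(left)
        r_lab = majority(right) if right else l_lab
        errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
        if best is None or errors < best[0]:
            best = (errors, feat, thr, l_lab, r_lab)
    return best[1:]

def forest_predict(stumps, x):
    """Stage 2: majority vote over all stumps in the ensemble."""
    return majority([lo if x[f] <= t else hi for f, t, lo, hi in stumps])

def synth_utterance(label, rng, n=200):
    """Synthetic 'speech': calm is quiet and slow, angry is loud and fast."""
    amp, freq = (0.2, 0.05) if label == "calm" else (1.0, 0.3)
    return [amp * math.sin(2 * math.pi * freq * t) + rng.gauss(0, 0.02) for t in range(n)]

def demo(seed=0):
    """Train on 40 synthetic utterances, return accuracy on 10 fresh ones."""
    rng = random.Random(seed)
    labels = ["calm", "angry"] * 20
    X = [utterance_embedding(synth_utterance(lab, rng)) for lab in labels]
    stumps = [train_stump(X, labels, rng) for _ in range(25)]
    test_labels = ["calm", "angry"] * 5
    preds = [forest_predict(stumps, utterance_embedding(synth_utterance(lab, rng)))
             for lab in test_labels]
    return sum(p == t for p, t in zip(preds, test_labels)) / len(test_labels)

print("toy test accuracy:", demo())
```

The point of the split is that stage 1 reduces a variable-length signal to a fixed-length vector, so stage 2 can be any tabular classifier. The synthetic classes here are trivially separable, so the printed accuracy says nothing about performance on real emotional speech.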

Similar Items