A Parallel-Model Speech Emotion Recognition Network Based on Feature Clustering
Speech Emotion Recognition (SER) is a common aspect of human-computer interaction and has significant applications in fields such as healthcare, education, and elder care. Although researchers have made progress in speech emotion feature extraction and model identification, they have struggled to create an SER system with satisfactory recognition accuracy. To address this issue, we proposed a novel algorithm called F-Emotion to select speech emotion features and established a parallel deep learning model to recognize different types of emotions.
Main Authors: | Li-Min Zhang, Giap Weng Ng, Yu-Beng Leau, Hao Yan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023 |
Subjects: | QA150-272.5 Algebra; TA630-695 Structural engineering (General) |
Online Access: | https://eprints.ums.edu.my/id/eprint/38207/1/ABSTRACT.pdf https://eprints.ums.edu.my/id/eprint/38207/2/FULL%20TEXT.pdf |
_version_ | 1825715495214514176 |
---|---|
author | Li-Min Zhang Giap Weng Ng Yu-Beng Leau Hao Yan |
author_facet | Li-Min Zhang Giap Weng Ng Yu-Beng Leau Hao Yan |
author_sort | Li-Min Zhang |
collection | UMS |
description | Speech Emotion Recognition (SER) is a common aspect of human-computer interaction and has significant applications in fields such as healthcare, education, and elder care. Although researchers have made progress in speech emotion feature extraction and model identification, they have struggled to create an SER system with satisfactory recognition accuracy. To address this issue, we proposed a novel algorithm called F-Emotion to select speech emotion features and established a parallel deep learning model to recognize different types of emotions. We first extracted the emotion features from speech and calculated the F-Emotion value for each feature. These values were then used to determine the combination of speech emotion features that was optimal for speech emotion recognition. Next, a parallel deep learning model was established with the speech emotion feature combination as input to train and test for each type of emotion. Finally, decision fusion was applied to the parallel output results to obtain an overall recognition result. These analyses were conducted on two datasets, RAVDESS and EMO-DB, with the accuracy of speech emotion recognition reaching 82.3% and 88.8%, respectively. The results demonstrate that the F-Emotion algorithm can effectively analyze the correspondence between speech emotion features and emotion types. The MFCC feature best describes emotions of neutrality, happiness, fear, and surprise, and Mel best describes emotions of anger and sadness. The parallel deep learning model mechanism can improve the accuracy of speech emotion recognition. |
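The decision-fusion step in the description above can be sketched as follows. This is a minimal illustration only: the per-emotion confidence values, the emotion list, and the simple arg-max fusion rule are assumptions for demonstration, not the paper's exact method or trained models.

```python
# Illustrative sketch of decision fusion over parallel per-emotion models,
# as described in the abstract. Each parallel model in the paper is a deep
# network trained for one emotion type; here each model's output is stubbed
# as a single confidence score (an assumption for illustration).

EMOTIONS = ["neutral", "happy", "fear", "surprise", "anger", "sadness"]

def fuse_decisions(scores):
    """Fuse parallel outputs by picking the emotion whose dedicated
    model reports the highest confidence.

    scores: dict mapping emotion name -> confidence in [0, 1],
            one entry per parallel model.
    Returns the fused (overall) emotion label. The arg-max rule is
    a hypothetical stand-in for the paper's decision-fusion scheme.
    """
    if set(scores) != set(EMOTIONS):
        raise ValueError("expected one confidence per parallel model")
    return max(scores, key=scores.get)

# Hypothetical per-model outputs for one utterance:
scores = {"neutral": 0.10, "happy": 0.72, "fear": 0.05,
          "surprise": 0.04, "anger": 0.06, "sadness": 0.03}
print(fuse_decisions(scores))  # -> happy
```

In this sketch each model votes only with its own confidence; the paper's mechanism additionally pairs each parallel model with the feature (e.g. MFCC or Mel) that the F-Emotion analysis found best for its emotion type.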
first_indexed | 2024-03-06T03:27:41Z |
format | Article |
id | ums.eprints-38207 |
institution | Universiti Malaysia Sabah |
language | English |
last_indexed | 2024-03-06T03:27:41Z |
publishDate | 2023 |
publisher | IEEE |
record_format | dspace |
spelling | ums.eprints-382072024-02-09T03:15:17Z https://eprints.ums.edu.my/id/eprint/38207/ A Parallel-Model Speech Emotion Recognition Network Based on Feature Clustering Li-Min Zhang Giap Weng Ng Yu-Beng Leau Hao Yan QA150-272.5 Algebra TA630-695 Structural engineering (General) Speech Emotion Recognition (SER) is a common aspect of human-computer interaction and has significant applications in fields such as healthcare, education, and elder care. Although researchers have made progress in speech emotion feature extraction and model identification, they have struggled to create an SER system with satisfactory recognition accuracy. To address this issue, we proposed a novel algorithm called F-Emotion to select speech emotion features and established a parallel deep learning model to recognize different types of emotions. We first extracted the emotion features from speech and calculated the F-Emotion value for each feature. These values were then used to determine the combination of speech emotion features that was optimal for speech emotion recognition. Next, a parallel deep learning model was established with the speech emotion feature combination as input to train and test for each type of emotion. Finally, decision fusion was applied to the parallel output results to obtain an overall recognition result. These analyses were conducted on two datasets, RAVDESS and EMO-DB, with the accuracy of speech emotion recognition reaching 82.3% and 88.8%, respectively. The results demonstrate that the F-Emotion algorithm can effectively analyze the correspondence between speech emotion features and emotion types. The MFCC feature best describes emotions of neutrality, happiness, fear, and surprise, and Mel best describes emotions of anger and sadness. The parallel deep learning model mechanism can improve the accuracy of speech emotion recognition. 
IEEE 2023 Article NonPeerReviewed text en https://eprints.ums.edu.my/id/eprint/38207/1/ABSTRACT.pdf text en https://eprints.ums.edu.my/id/eprint/38207/2/FULL%20TEXT.pdf Li-Min Zhang and Giap Weng Ng and Yu-Beng Leau and Hao Yan (2023) A Parallel-Model Speech Emotion Recognition Network Based on Feature Clustering. IEEE Access, 11. pp. 1-11. ISSN 2169-3536 https://doi.org/10.1109/ACCESS.2023.3294274 |
spellingShingle | QA150-272.5 Algebra TA630-695 Structural engineering (General) Li-Min Zhang Giap Weng Ng Yu-Beng Leau Hao Yan A Parallel-Model Speech Emotion Recognition Network Based on Feature Clustering |
title | A Parallel-Model Speech Emotion Recognition
Network Based on Feature Clustering |
title_full | A Parallel-Model Speech Emotion Recognition
Network Based on Feature Clustering |
title_fullStr | A Parallel-Model Speech Emotion Recognition
Network Based on Feature Clustering |
title_full_unstemmed | A Parallel-Model Speech Emotion Recognition
Network Based on Feature Clustering |
title_short | A Parallel-Model Speech Emotion Recognition
Network Based on Feature Clustering |
title_sort | parallel model speech emotion recognition network based on feature clustering |
topic | QA150-272.5 Algebra TA630-695 Structural engineering (General) |
url | https://eprints.ums.edu.my/id/eprint/38207/1/ABSTRACT.pdf https://eprints.ums.edu.my/id/eprint/38207/2/FULL%20TEXT.pdf |
work_keys_str_mv | AT liminzhang aparallelmodelspeechemotionrecognitionnetworkbasedonfeatureclustering AT giapwengng aparallelmodelspeechemotionrecognitionnetworkbasedonfeatureclustering AT yubengleau aparallelmodelspeechemotionrecognitionnetworkbasedonfeatureclustering AT haoyan aparallelmodelspeechemotionrecognitionnetworkbasedonfeatureclustering AT liminzhang parallelmodelspeechemotionrecognitionnetworkbasedonfeatureclustering AT giapwengng parallelmodelspeechemotionrecognitionnetworkbasedonfeatureclustering AT yubengleau parallelmodelspeechemotionrecognitionnetworkbasedonfeatureclustering AT haoyan parallelmodelspeechemotionrecognitionnetworkbasedonfeatureclustering |