A New Network Structure for Speech Emotion Recognition Research

Deep learning has driven breakthroughs in emotion recognition across many fields, especially speech emotion recognition (SER). Extracting the most relevant acoustic features is an important part of SER and has long attracted researchers' attention. Aiming at the prob...


Bibliographic Details
Main Authors:	Chunsheng Xu, Yunqing Liu, Wenjun Song, Zonglin Liang, Xing Chen
Format:	Article
Language:	English
Published:	MDPI AG 2024-02-01
Series:	Sensors
Subjects:	speech emotion recognition; spectrograms; multi-head attention; Bi-GRU
Online Access:	https://www.mdpi.com/1424-8220/24/5/1429

_version_	1797263856476618752
author	Chunsheng Xu, Yunqing Liu, Wenjun Song, Zonglin Liang, Xing Chen
author_facet	Chunsheng Xu, Yunqing Liu, Wenjun Song, Zonglin Liang, Xing Chen
author_sort	Chunsheng Xu
collection	DOAJ
description	Deep learning has driven breakthroughs in emotion recognition across many fields, especially speech emotion recognition (SER). Extracting the most relevant acoustic features is an important part of SER and has long attracted researchers' attention. To address the problem that the emotional information in speech signals is dispersed and that local and global information cannot be comprehensively integrated, this paper presents a network model based on a gated recurrent unit (GRU) and multi-head attention. We evaluate the proposed model on the IEMOCAP and Emo-DB corpora. The experimental results show that the network model based on a bidirectional GRU (Bi-GRU) and multi-head attention significantly outperforms traditional network models on multiple evaluation metrics. We also apply the model to a speech sentiment analysis task: on the CH-SIMS and MOSI datasets, it shows excellent generalization performance.
first_indexed	2024-04-25T00:19:39Z
format	Article
id	doaj.art-0f7c9004efd640ea8c13d1c95c244607
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-04-25T00:19:39Z
publishDate	2024-02-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-0f7c9004efd640ea8c13d1c95c244607; 2024-03-12T16:54:43Z; eng; MDPI AG; Sensors; 1424-8220; 2024-02-01; vol. 24, no. 5, art. 1429; doi:10.3390/s24051429; A New Network Structure for Speech Emotion Recognition Research; Chunsheng Xu, Yunqing Liu, Wenjun Song, Zonglin Liang, Xing Chen (all: School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China)
title	A New Network Structure for Speech Emotion Recognition Research
title_sort	new network structure for speech emotion recognition research
topic	speech emotion recognition; spectrograms; multi-head attention; Bi-GRU
url	https://www.mdpi.com/1424-8220/24/5/1429
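
The description in this record names two building blocks, a bidirectional GRU (Bi-GRU) and multi-head attention, without giving layer sizes or wiring. The sketch below is only an illustration of how those two pieces can be chained, not the authors' architecture. Everything in it is an assumption made for the example: the helper names (`GRUCell`, `bi_gru`, `multi_head_self_attention`), the random weights, the omission of bias terms, the omission of learned query/key/value projections in the attention step, and the toy input of 6 frames with 4 features each.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    # Small random weights; a trained model would learn these (assumption).
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class GRUCell:
    """Hypothetical minimal GRU cell (biases omitted for brevity)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrn"}

    def _pre(self, gate, x, h):
        return [a + b for a, b in zip(matvec(self.W[gate], x), matvec(self.U[gate], h))]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._pre("z", x, h)]
        r = [sigmoid(v) for v in self._pre("r", x, h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._pre("n", x, gated)]
        # Blend the previous state with the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]

def bi_gru(seq, fwd, bwd):
    """Run one GRU left-to-right, one right-to-left, and concatenate per frame."""
    h, f_states = [0.0] * fwd.hid_dim, []
    for x in seq:
        h = fwd.step(x, h)
        f_states.append(h)
    h, b_states = [0.0] * bwd.hid_dim, []
    for x in reversed(seq):
        h = bwd.step(x, h)
        b_states.append(h)
    b_states.reverse()  # realign backward states with frame order
    return [f + b for f, b in zip(f_states, b_states)]

def multi_head_self_attention(seq, num_heads):
    """Scaled dot-product self-attention per head on feature slices.

    Simplification: queries, keys and values are the raw slices themselves;
    a full layer would apply learned projections first.
    """
    head_dim = len(seq[0]) // num_heads
    out = [[] for _ in seq]
    for head in range(num_heads):
        lo, hi = head * head_dim, (head + 1) * head_dim
        slices = [frame[lo:hi] for frame in seq]
        for t, query in enumerate(slices):
            scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(head_dim)
                      for key in slices]
            weights = softmax(scores)
            context = [sum(w * s[d] for w, s in zip(weights, slices))
                       for d in range(head_dim)]
            out[t].extend(context)  # concatenate heads frame by frame
    return out

rng = random.Random(0)
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]  # toy "spectrogram"
forward, backward = GRUCell(4, 3, rng), GRUCell(4, 3, rng)
states = bi_gru(frames, forward, backward)                 # local, order-aware context
attended = multi_head_self_attention(states, num_heads=2)  # global, all-frames context
utterance = [sum(col) / len(attended) for col in zip(*attended)]  # mean-pool over time
print(len(attended), len(attended[0]), len(utterance))
```

In this sketch the recurrent pass gives each frame a summary of its neighbours in both directions, and the attention step then lets every frame weigh all other frames, which is one plausible reading of "integrating local and global information". A classifier over `utterance` would follow in a real system and is left out here.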


Similar Items