A New Network Structure for Speech Emotion Recognition Research

Bibliographic Details
Main Authors: Chunsheng Xu, Yunqing Liu, Wenjun Song, Zonglin Liang, Xing Chen
Format: Article
Language: English
Published: MDPI AG 2024-02-01
Series: Sensors
Subjects: speech emotion recognition; spectrograms; multi-head attention; Bi-GRU
Online Access: https://www.mdpi.com/1424-8220/24/5/1429
author Chunsheng Xu
Yunqing Liu
Wenjun Song
Zonglin Liang
Xing Chen
collection DOAJ
description Deep learning has driven breakthroughs in emotion recognition across many fields, especially speech emotion recognition (SER). Extracting the most relevant acoustic features is an important part of SER and has long attracted researchers' attention. Emotional information in speech signals is dispersed, which makes it difficult to integrate local and global information comprehensively. To address this problem, this paper presents a network model based on a gated recurrent unit (GRU) and multi-head attention. We evaluate the proposed model on the IEMOCAP and Emo-DB corpora. The experimental results show that the model based on Bi-GRU and multi-head attention performs significantly better than the traditional network model on multiple evaluation indicators. We also apply the model to a speech sentiment analysis task; on the CH-SIMS and MOSI datasets, the model shows excellent generalization performance.
first_indexed 2024-04-25T00:19:39Z
format Article
id doaj.art-0f7c9004efd640ea8c13d1c95c244607
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-04-25T00:19:39Z
publishDate 2024-02-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-0f7c9004efd640ea8c13d1c95c244607
2024-03-12T16:54:43Z
eng
MDPI AG
Sensors
1424-8220
2024-02-01
24
5
1429
10.3390/s24051429
A New Network Structure for Speech Emotion Recognition Research
Chunsheng Xu, Yunqing Liu, Wenjun Song, Zonglin Liang, Xing Chen (all: School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China)
https://www.mdpi.com/1424-8220/24/5/1429
speech emotion recognition
spectrograms
multi-head attention
Bi-GRU
title A New Network Structure for Speech Emotion Recognition Research
topic speech emotion recognition
spectrograms
multi-head attention
Bi-GRU
url https://www.mdpi.com/1424-8220/24/5/1429