A New Network Structure for Speech Emotion Recognition Research
Deep learning has driven breakthroughs in emotion recognition across many fields, especially speech emotion recognition (SER). Extracting the most relevant acoustic features is an important part of SER and has long attracted researchers' attention. Aiming at the prob...
Main Authors: | Chunsheng Xu, Yunqing Liu, Wenjun Song, Zonglin Liang, Xing Chen |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-02-01 |
Series: | Sensors |
Subjects: | speech emotion recognition, spectrograms, multi-head attention, Bi-GRU |
Online Access: | https://www.mdpi.com/1424-8220/24/5/1429 |
_version_ | 1797263856476618752 |
---|---|
author | Chunsheng Xu, Yunqing Liu, Wenjun Song, Zonglin Liang, Xing Chen |
author_facet | Chunsheng Xu, Yunqing Liu, Wenjun Song, Zonglin Liang, Xing Chen |
author_sort | Chunsheng Xu |
collection | DOAJ |
description | Deep learning has driven breakthroughs in emotion recognition across many fields, especially speech emotion recognition (SER). Extracting the most relevant acoustic features is an important part of SER and has long attracted researchers' attention. To address the problem that emotional information in speech signals is dispersed, which makes it hard to integrate local and global information comprehensively, this paper presents a network model based on a gated recurrent unit (GRU) and multi-head attention. We evaluate the proposed model on the IEMOCAP and Emo-DB corpora. The experimental results show that the network model based on Bi-GRU and multi-head attention performs significantly better than traditional network models on multiple evaluation indicators. We also apply the model to a speech sentiment analysis task: on the CH-SIMS and MOSI datasets, it shows excellent generalization performance. |
first_indexed | 2024-04-25T00:19:39Z |
format | Article |
id | doaj.art-0f7c9004efd640ea8c13d1c95c244607 |
institution | Directory of Open Access Journals |
issn | 1424-8220 |
language | English |
last_indexed | 2024-04-25T00:19:39Z |
publishDate | 2024-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-0f7c9004efd640ea8c13d1c95c244607; 2024-03-12T16:54:43Z; eng; MDPI AG; Sensors; 1424-8220; 2024-02-01; vol. 24, no. 5, art. 1429; 10.3390/s24051429; A New Network Structure for Speech Emotion Recognition Research; Chunsheng Xu, Yunqing Liu, Wenjun Song, Zonglin Liang, Xing Chen (all: School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China); https://www.mdpi.com/1424-8220/24/5/1429; speech emotion recognition, spectrograms, multi-head attention, Bi-GRU |
title | A New Network Structure for Speech Emotion Recognition Research |
title_sort | new network structure for speech emotion recognition research |
topic | speech emotion recognition, spectrograms, multi-head attention, Bi-GRU |
url | https://www.mdpi.com/1424-8220/24/5/1429 |