GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition

Emotion recognition plays an essential role in interpersonal communication. However, existing recognition systems use only features of a single modality for emotion recognition, ignoring the interaction of information across different modalities. Therefore, in our study, we propose a global-aware...

Full description

Bibliographic Details
Main Authors: Feng Li, Jiusong Luo, Lingling Wang, Wei Liu, Xiaoshuang Sang
Format: Article
Language:English
Published: Frontiers Media S.A. 2023-05-01
Series:Frontiers in Neuroscience
Subjects:
Online Access:https://www.frontiersin.org/articles/10.3389/fnins.2023.1183132/full
_version_ 1797833599616024576
author Feng Li
Feng Li
Jiusong Luo
Lingling Wang
Wei Liu
Xiaoshuang Sang
author_facet Feng Li
Feng Li
Jiusong Luo
Lingling Wang
Wei Liu
Xiaoshuang Sang
author_sort Feng Li
collection DOAJ
description Emotion recognition plays an essential role in interpersonal communication. However, existing recognition systems use only features of a single modality for emotion recognition, ignoring the interaction of information across different modalities. Therefore, in our study, we propose a global-aware Cross-modal feature Fusion Network (GCF2-Net) for recognizing emotion. We construct a residual cross-modal fusion attention module (ResCMFA) to fuse information from multiple modalities and design a global-aware module to capture global details. More specifically, we first use transfer learning to extract wav2vec 2.0 features and text features, which are fused by the ResCMFA module. Then, the cross-modal fusion features are fed into the global-aware module to capture the most essential emotional information globally. Finally, experimental results show that our proposed method offers significant advantages over state-of-the-art methods on the IEMOCAP and MELD datasets.
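The abstract's pipeline (audio and text features fused by a residual cross-modal attention module, then pooled globally) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, feature dimensions, and the use of plain dot-product cross-attention with mean pooling are all assumptions standing in for the ResCMFA and global-aware modules described in the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_fusion(audio, text):
    """Audio frames attend over text tokens; a residual connection
    preserves the original audio stream (ResCMFA-style, simplified)."""
    d = audio.shape[-1]
    attn = softmax(audio @ text.T / np.sqrt(d))  # (T_audio, T_text)
    fused = attn @ text                          # text info per audio frame
    return audio + fused                         # residual connection

def global_aware(fused):
    """Stand-in for the global-aware module: pool over the time axis
    to summarize the utterance-level emotional content."""
    return fused.mean(axis=0)

rng = np.random.default_rng(0)
audio = rng.standard_normal((50, 64))  # e.g. 50 wav2vec 2.0 frames, dim 64
text = rng.standard_normal((12, 64))   # e.g. 12 text-encoder tokens, dim 64

fused = cross_modal_fusion(audio, text)
utterance_emb = global_aware(fused)
print(fused.shape, utterance_emb.shape)  # (50, 64) (64,)
```

The utterance-level embedding would then feed a classifier over the emotion labels of IEMOCAP or MELD; the actual model presumably uses learned projections and a richer global module rather than this fixed pooling.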
first_indexed 2024-04-09T14:25:43Z
format Article
id doaj.art-120b7fd4acbe47e898dfd1e61b5289da
institution Directory Open Access Journal
issn 1662-453X
language English
last_indexed 2024-04-09T14:25:43Z
publishDate 2023-05-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Neuroscience
spelling doaj.art-120b7fd4acbe47e898dfd1e61b5289da2023-05-04T04:23:38ZengFrontiers Media S.A.Frontiers in Neuroscience1662-453X2023-05-011710.3389/fnins.2023.11831321183132GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognitionFeng Li0Feng Li1Jiusong Luo2Lingling Wang3Wei Liu4Xiaoshuang Sang5Department of Computer Science and Technology, Anhui University of Finance and Economics, Anhui, ChinaSchool of Information Science and Technology, University of Science and Technology of China, Anhui, ChinaDepartment of Computer Science and Technology, Anhui University of Finance and Economics, Anhui, ChinaDepartment of Computer Science and Technology, Anhui University of Finance and Economics, Anhui, ChinaDepartment of Computer Science and Technology, Anhui University of Finance and Economics, Anhui, ChinaDepartment of Computer Science and Technology, Anhui University of Finance and Economics, Anhui, ChinaEmotion recognition plays an essential role in interpersonal communication. However, existing recognition systems use only features of a single modality for emotion recognition, ignoring the interaction of information from the different modalities. Therefore, in our study, we propose a global-aware Cross-modal feature Fusion Network (GCF2-Net) for recognizing emotion. We construct a residual cross-modal fusion attention module (ResCMFA) to fuse information from multiple modalities and design a global-aware module to capture global details. More specifically, we first use transfer learning to extract wav2vec 2.0 features and text features fused by the ResCMFA module. Then, cross-modal fusion features are fed into the global-aware module to capture the most essential emotional information globally. 
Finally, experimental results show that our proposed method offers significant advantages over state-of-the-art methods on the IEMOCAP and MELD datasets.https://www.frontiersin.org/articles/10.3389/fnins.2023.1183132/fullspeech emotion recognitionglobal-awarefeature fusion networkwav2vec 2.0cross-modal
spellingShingle Feng Li
Feng Li
Jiusong Luo
Lingling Wang
Wei Liu
Xiaoshuang Sang
GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition
Frontiers in Neuroscience
speech emotion recognition
global-aware
feature fusion network
wav2vec 2.0
cross-modal
title GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition
title_full GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition
title_fullStr GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition
title_full_unstemmed GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition
title_short GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition
title_sort gcf2 net global aware cross modal feature fusion network for speech emotion recognition
topic speech emotion recognition
global-aware
feature fusion network
wav2vec 2.0
cross-modal
url https://www.frontiersin.org/articles/10.3389/fnins.2023.1183132/full
work_keys_str_mv AT fengli gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition
AT fengli gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition
AT jiusongluo gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition
AT linglingwang gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition
AT weiliu gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition
AT xiaoshuangsang gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition