GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition
Emotion recognition plays an essential role in interpersonal communication. However, existing recognition systems rely on features from only a single modality, ignoring the interaction of information across different modalities. Therefore, in our study, we propose a global-aware...
Main Authors: | Feng Li, Jiusong Luo, Lingling Wang, Wei Liu, Xiaoshuang Sang |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2023-05-01 |
Series: | Frontiers in Neuroscience |
Subjects: | speech emotion recognition; global-aware; feature fusion network; wav2vec 2.0; cross-modal |
Online Access: | https://www.frontiersin.org/articles/10.3389/fnins.2023.1183132/full |
_version_ | 1797833599616024576 |
---|---|
author | Feng Li; Feng Li; Jiusong Luo; Lingling Wang; Wei Liu; Xiaoshuang Sang |
author_facet | Feng Li; Feng Li; Jiusong Luo; Lingling Wang; Wei Liu; Xiaoshuang Sang |
author_sort | Feng Li |
collection | DOAJ |
description | Emotion recognition plays an essential role in interpersonal communication. However, existing recognition systems rely on features from only a single modality, ignoring the interaction of information across different modalities. Therefore, in our study, we propose a global-aware Cross-modal feature Fusion Network (GCF2-Net) for recognizing emotion. We construct a residual cross-modal fusion attention module (ResCMFA) to fuse information from multiple modalities and design a global-aware module to capture global details. More specifically, we first use transfer learning to extract wav2vec 2.0 features and text features, which are fused by the ResCMFA module. Then, the cross-modal fusion features are fed into the global-aware module to capture the most essential emotional information globally. Finally, experimental results show that our proposed method has significant advantages over state-of-the-art methods on the IEMOCAP and MELD datasets. |
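The fusion pipeline described in the abstract (audio features attending over text features via cross-modal attention, a residual connection, then a global summarization step) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names, feature dimensions, single-head attention, and the mean-pooling stand-in for the global-aware module are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query, key_value):
    """Scaled dot-product attention: audio frames (query) attend over
    text tokens (key_value). Shapes: (T_audio, d), (T_text, d) -> (T_audio, d)."""
    d_k = query.shape[-1]
    scores = query @ key_value.T / np.sqrt(d_k)      # (T_audio, T_text)
    return softmax(scores, axis=-1) @ key_value      # (T_audio, d)

def rescmfa_sketch(audio_feats, text_feats):
    """Residual cross-modal fusion: the attended text information is
    added back to the audio stream (the 'Res' in ResCMFA)."""
    return audio_feats + cross_modal_attention(audio_feats, text_feats)

def global_pool(fused):
    """Illustrative stand-in for the global-aware module: mean over time
    yields one utterance-level vector."""
    return fused.mean(axis=0)

rng = np.random.default_rng(0)
audio = rng.normal(size=(50, 64))   # e.g., 50 wav2vec 2.0-style frames, dim 64
text = rng.normal(size=(12, 64))    # e.g., 12 text token embeddings, dim 64
utt = global_pool(rescmfa_sketch(audio, text))
print(utt.shape)  # (64,)
```

An emotion classifier head would then map the utterance-level vector `utt` to class logits; the residual connection ensures the audio stream's own information survives even where the cross-modal attention contributes little.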
first_indexed | 2024-04-09T14:25:43Z |
format | Article |
id | doaj.art-120b7fd4acbe47e898dfd1e61b5289da |
institution | Directory Open Access Journal |
issn | 1662-453X |
language | English |
last_indexed | 2024-04-09T14:25:43Z |
publishDate | 2023-05-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Neuroscience |
spelling | doaj.art-120b7fd4acbe47e898dfd1e61b5289da (2023-05-04T04:23:38Z); English; Frontiers Media S.A.; Frontiers in Neuroscience; 1662-453X; 2023-05-01; Vol. 17; doi: 10.3389/fnins.2023.1183132; Article 1183132; GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition. Authors: Feng Li (Department of Computer Science and Technology, Anhui University of Finance and Economics, Anhui, China; School of Information Science and Technology, University of Science and Technology of China, Anhui, China); Jiusong Luo, Lingling Wang, Wei Liu, Xiaoshuang Sang (Department of Computer Science and Technology, Anhui University of Finance and Economics, Anhui, China). Abstract: Emotion recognition plays an essential role in interpersonal communication. However, existing recognition systems rely on features from only a single modality, ignoring the interaction of information across different modalities. Therefore, in our study, we propose a global-aware Cross-modal feature Fusion Network (GCF2-Net) for recognizing emotion. We construct a residual cross-modal fusion attention module (ResCMFA) to fuse information from multiple modalities and design a global-aware module to capture global details. More specifically, we first use transfer learning to extract wav2vec 2.0 features and text features, which are fused by the ResCMFA module. Then, the cross-modal fusion features are fed into the global-aware module to capture the most essential emotional information globally. Finally, experimental results show that our proposed method has significant advantages over state-of-the-art methods on the IEMOCAP and MELD datasets. Online access: https://www.frontiersin.org/articles/10.3389/fnins.2023.1183132/full. Keywords: speech emotion recognition; global-aware; feature fusion network; wav2vec 2.0; cross-modal |
spellingShingle | Feng Li Feng Li Jiusong Luo Lingling Wang Wei Liu Xiaoshuang Sang GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition Frontiers in Neuroscience speech emotion recognition global-aware feature fusion network wav2vec 2.0 cross-modal |
title | GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title_full | GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title_fullStr | GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title_full_unstemmed | GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title_short | GCF2-Net: global-aware cross-modal feature fusion network for speech emotion recognition |
title_sort | gcf2 net global aware cross modal feature fusion network for speech emotion recognition |
topic | speech emotion recognition global-aware feature fusion network wav2vec 2.0 cross-modal |
url | https://www.frontiersin.org/articles/10.3389/fnins.2023.1183132/full |
work_keys_str_mv | AT fengli gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition AT fengli gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition AT jiusongluo gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition AT linglingwang gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition AT weiliu gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition AT xiaoshuangsang gcf2netglobalawarecrossmodalfeaturefusionnetworkforspeechemotionrecognition |