Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation

To address the feature distribution discrepancy in cross-corpus speech emotion recognition, this paper proposes an emotion recognition model based on multi-task learning and subdomain adaptation that alleviates the impact of this discrepancy on recognition performance. Existing methods fall short in speech feature representation and in aligning feature distributions across corpora. The proposed model uses a deep denoising auto-encoder as the shared feature extraction network for multi-task learning, with a fully connected layer and a softmax layer added as task-specific layers for each recognition task. A subdomain adaptation algorithm over emotion and gender features is then applied to the shared network to obtain the emotion and gender features shared by the source and target domains. Multi-task learning strengthens the representation ability of the features, while the subdomain adaptation algorithm improves their transferability and effectively alleviates the impact of feature distribution differences on the emotion features. Averaged over six cross-corpus speech emotion recognition experiments, the weighted average recall of the proposed model is 1.89% to 10.07% higher than that of the compared models, which verifies its validity.

Bibliographic Details
Main Authors: Hongliang Fu, Zhihao Zhuang, Yang Wang, Chen Huang, Wenzhuo Duan
Format: Article
Language: English
Published: MDPI AG, 2023-01-01
Series: Entropy
Subjects: speech emotion recognition; multi-task learning; subdomain adaptation; feature distribution
Online Access: https://www.mdpi.com/1099-4300/25/1/124
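
As a rough illustration of the model described in the abstract, the PyTorch sketch below wires a shared denoising-auto-encoder-style encoder to two task-specific heads (emotion and gender), each a fully connected layer whose softmax is applied implicitly by the cross-entropy loss. The input dimension, hidden sizes, noise level, and class counts are illustrative assumptions, not the authors' configuration.

```python
# Minimal multi-task SER sketch in PyTorch. The input size (1582-dim utterance-level
# acoustic features), hidden sizes, noise level, and class counts are illustrative
# assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn


class SharedEncoder(nn.Module):
    """Shared feature extractor: the encoder half of a denoising auto-encoder."""

    def __init__(self, in_dim: int = 1582, hid_dim: int = 256, noise_std: float = 0.1):
        super().__init__()
        self.noise_std = noise_std
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, hid_dim), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Corrupt the input with Gaussian noise during training (denoising objective).
        if self.training:
            x = x + self.noise_std * torch.randn_like(x)
        return self.net(x)


class MultiTaskSER(nn.Module):
    """Shared encoder with task-specific fully connected heads for emotion and gender."""

    def __init__(self, in_dim: int = 1582, hid_dim: int = 256,
                 n_emotions: int = 4, n_genders: int = 2):
        super().__init__()
        self.encoder = SharedEncoder(in_dim, hid_dim)
        self.emotion_head = nn.Linear(hid_dim, n_emotions)  # softmax applied inside the loss
        self.gender_head = nn.Linear(hid_dim, n_genders)

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)  # shared features used by both tasks and for adaptation
        return z, self.emotion_head(z), self.gender_head(z)


if __name__ == "__main__":
    model = MultiTaskSER()
    feats = torch.randn(8, 1582)  # stand-in for a batch of acoustic feature vectors
    shared, emo_logits, gen_logits = model(feats)
    print(shared.shape, emo_logits.shape, gen_logits.shape)
```

Training would combine cross-entropy losses from both heads with a subdomain adaptation term such as the one sketched after the description field below.
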
author Hongliang Fu
Zhihao Zhuang
Yang Wang
Chen Huang
Wenzhuo Duan
collection DOAJ
description To address the feature distribution discrepancy in cross-corpus speech emotion recognition, this paper proposes an emotion recognition model based on multi-task learning and subdomain adaptation that alleviates the impact of this discrepancy on recognition performance. Existing methods fall short in speech feature representation and in aligning feature distributions across corpora. The proposed model uses a deep denoising auto-encoder as the shared feature extraction network for multi-task learning, with a fully connected layer and a softmax layer added as task-specific layers for each recognition task. A subdomain adaptation algorithm over emotion and gender features is then applied to the shared network to obtain the emotion and gender features shared by the source and target domains. Multi-task learning strengthens the representation ability of the features, while the subdomain adaptation algorithm improves their transferability and effectively alleviates the impact of feature distribution differences on the emotion features. Averaged over six cross-corpus speech emotion recognition experiments, the weighted average recall of the proposed model is 1.89% to 10.07% higher than that of the compared models, which verifies its validity.
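
To make the subdomain adaptation idea above concrete, the self-contained sketch below aligns per-class (subdomain) feature statistics between a labelled source batch and a pseudo-labelled target batch. It is a simplified class-conditional mean-matching loss standing in for the paper's subdomain adaptation criterion, not a reproduction of it; the function name, batch sizes, and feature dimension are assumptions.

```python
# Simplified subdomain alignment: match per-class mean features between source and
# target batches. An illustrative stand-in for the paper's subdomain adaptation
# criterion, not a reproduction of it.
import torch


def class_conditional_alignment(z_src: torch.Tensor, z_tgt: torch.Tensor,
                                y_src: torch.Tensor, y_tgt_pseudo: torch.Tensor,
                                n_classes: int) -> torch.Tensor:
    """Squared distance between per-class mean features of source and target.

    Target samples are grouped by pseudo-labels (e.g. the argmax of the target
    head's predictions), since target labels are unavailable during adaptation.
    """
    loss = z_src.new_zeros(())
    matched = 0
    for c in range(n_classes):
        s_mask = y_src == c
        t_mask = y_tgt_pseudo == c
        if s_mask.any() and t_mask.any():
            diff = z_src[s_mask].mean(dim=0) - z_tgt[t_mask].mean(dim=0)
            loss = loss + diff.pow(2).sum()
            matched += 1
    return loss / max(matched, 1)


if __name__ == "__main__":
    # Stand-ins for shared features produced by the encoder for a labelled source
    # batch and a pseudo-labelled target batch (dimensions are assumptions).
    z_src, z_tgt = torch.randn(32, 256), torch.randn(32, 256)
    y_src = torch.randint(0, 4, (32,))
    y_tgt_pseudo = torch.randint(0, 4, (32,))
    print(float(class_conditional_alignment(z_src, z_tgt, y_src, y_tgt_pseudo, n_classes=4)))
```

In the full model, one such term would be computed on the shared features for the emotion labels and another for the gender labels, and both would be added, with a weighting factor, to the two cross-entropy classification losses.
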
format Article
id doaj.art-929d6faf4466408da9179571362a7025
institution Directory Open Access Journal
issn 1099-4300
language English
publishDate 2023-01-01
publisher MDPI AG
record_format Article
series Entropy
spelling Entropy, vol. 25, no. 1, article 124, published 2023-01-01 by MDPI AG, ISSN 1099-4300, doi:10.3390/e25010124; all five authors affiliated with the College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China (title, author list, abstract, subject terms, and URL as above)
title Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation
topic speech emotion recognition
multi-task learning
subdomain adaptation
feature distribution
url https://www.mdpi.com/1099-4300/25/1/124