Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation

To address the feature distribution discrepancy in cross-corpus speech emotion recognition, this paper proposes an emotion recognition model based on multi-task learning and subdomain adaptation that alleviates the impact of this discrepancy on recognition performance. Existing methods fall short in speech feature representation and in aligning feature distributions across corpora. The proposed model uses a deep denoising auto-encoder as the shared feature extraction network for multi-task learning, with a fully connected layer and a softmax layer added as task-specific layers for each recognition task. A subdomain adaptation algorithm over emotion and gender features is then applied to the shared network to obtain the emotion and gender features shared by the source and target domains. Multi-task learning strengthens the representation ability of the features, while the subdomain adaptation algorithm improves their transferability and effectively alleviates the impact of feature distribution differences on the emotion features. Averaged over six cross-corpus speech emotion recognition experiments, the weighted average recall of the proposed model is 1.89% to 10.07% higher than that of the compared models, which verifies its validity.

Bibliographic Details
Main Authors: Hongliang Fu, Zhihao Zhuang, Yang Wang, Chen Huang, Wenzhuo Duan
Format: Article
Language: English
Published: MDPI AG, 2023-01-01
Series: Entropy
Subjects: speech emotion recognition; multi-task learning; subdomain adaptation; feature distribution
Online Access: https://www.mdpi.com/1099-4300/25/1/124
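
As a rough illustration of the model described in the abstract, the PyTorch sketch below wires a shared denoising-auto-encoder-style encoder to two task-specific heads (emotion and gender), each a fully connected layer whose softmax is applied implicitly by the cross-entropy loss. The input dimension, hidden sizes, noise level, and class counts are illustrative assumptions, not the authors' configuration.

```python
# Minimal multi-task SER sketch in PyTorch. The input size (1582-dim utterance-level
# acoustic features), hidden sizes, noise level, and class counts are illustrative
# assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn


class SharedEncoder(nn.Module):
    """Shared feature extractor: the encoder half of a denoising auto-encoder."""

    def __init__(self, in_dim: int = 1582, hid_dim: int = 256, noise_std: float = 0.1):
        super().__init__()
        self.noise_std = noise_std
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, hid_dim), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Corrupt the input with Gaussian noise during training (denoising objective).
        if self.training:
            x = x + self.noise_std * torch.randn_like(x)
        return self.net(x)


class MultiTaskSER(nn.Module):
    """Shared encoder with task-specific fully connected heads for emotion and gender."""

    def __init__(self, in_dim: int = 1582, hid_dim: int = 256,
                 n_emotions: int = 4, n_genders: int = 2):
        super().__init__()
        self.encoder = SharedEncoder(in_dim, hid_dim)
        self.emotion_head = nn.Linear(hid_dim, n_emotions)  # softmax applied inside the loss
        self.gender_head = nn.Linear(hid_dim, n_genders)

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)  # shared features used by both tasks and for adaptation
        return z, self.emotion_head(z), self.gender_head(z)


if __name__ == "__main__":
    model = MultiTaskSER()
    feats = torch.randn(8, 1582)  # stand-in for a batch of acoustic feature vectors
    shared, emo_logits, gen_logits = model(feats)
    print(shared.shape, emo_logits.shape, gen_logits.shape)
```

Training would combine cross-entropy losses from both heads with a subdomain adaptation term such as the one sketched after the description field below.
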
author Hongliang Fu
Zhihao Zhuang
Yang Wang
Chen Huang
Wenzhuo Duan
collection DOAJ
description To address the feature distribution discrepancy in cross-corpus speech emotion recognition, this paper proposes an emotion recognition model based on multi-task learning and subdomain adaptation that alleviates the impact of this discrepancy on recognition performance. Existing methods fall short in speech feature representation and in aligning feature distributions across corpora. The proposed model uses a deep denoising auto-encoder as the shared feature extraction network for multi-task learning, with a fully connected layer and a softmax layer added as task-specific layers for each recognition task. A subdomain adaptation algorithm over emotion and gender features is then applied to the shared network to obtain the emotion and gender features shared by the source and target domains. Multi-task learning strengthens the representation ability of the features, while the subdomain adaptation algorithm improves their transferability and effectively alleviates the impact of feature distribution differences on the emotion features. Averaged over six cross-corpus speech emotion recognition experiments, the weighted average recall of the proposed model is 1.89% to 10.07% higher than that of the compared models, which verifies its validity.
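
To make the subdomain adaptation idea above concrete, the self-contained sketch below aligns per-class (subdomain) feature statistics between a labelled source batch and a pseudo-labelled target batch. It is a simplified class-conditional mean-matching loss standing in for the paper's subdomain adaptation criterion, not a reproduction of it; the function name, batch sizes, and feature dimension are assumptions.

```python
# Simplified subdomain alignment: match per-class mean features between source and
# target batches. An illustrative stand-in for the paper's subdomain adaptation
# criterion, not a reproduction of it.
import torch


def class_conditional_alignment(z_src: torch.Tensor, z_tgt: torch.Tensor,
                                y_src: torch.Tensor, y_tgt_pseudo: torch.Tensor,
                                n_classes: int) -> torch.Tensor:
    """Squared distance between per-class mean features of source and target.

    Target samples are grouped by pseudo-labels (e.g. the argmax of the target
    head's predictions), since target labels are unavailable during adaptation.
    """
    loss = z_src.new_zeros(())
    matched = 0
    for c in range(n_classes):
        s_mask = y_src == c
        t_mask = y_tgt_pseudo == c
        if s_mask.any() and t_mask.any():
            diff = z_src[s_mask].mean(dim=0) - z_tgt[t_mask].mean(dim=0)
            loss = loss + diff.pow(2).sum()
            matched += 1
    return loss / max(matched, 1)


if __name__ == "__main__":
    # Stand-ins for shared features produced by the encoder for a labelled source
    # batch and a pseudo-labelled target batch (dimensions are assumptions).
    z_src, z_tgt = torch.randn(32, 256), torch.randn(32, 256)
    y_src = torch.randint(0, 4, (32,))
    y_tgt_pseudo = torch.randint(0, 4, (32,))
    print(float(class_conditional_alignment(z_src, z_tgt, y_src, y_tgt_pseudo, n_classes=4)))
```

In the full model, one such term would be computed on the shared features for the emotion labels and another for the gender labels, and both would be added, with a weighting factor, to the two cross-entropy classification losses.
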
format Article
id doaj.art-929d6faf4466408da9179571362a7025
institution Directory Open Access Journal
issn 1099-4300
language English
publishDate 2023-01-01
publisher MDPI AG
record_format Article
series Entropy
spelling Entropy, vol. 25, no. 1, article 124, published 2023-01-01 by MDPI AG, ISSN 1099-4300, doi:10.3390/e25010124; all five authors affiliated with the College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China (title, author list, abstract, subject terms, and URL as above)
title Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation
topic speech emotion recognition
multi-task learning
subdomain adaptation
feature distribution
url https://www.mdpi.com/1099-4300/25/1/124