Cross-corpus speech emotion recognition using subspace learning and domain adaption

Abstract Speech emotion recognition (SER) is an active research topic in speech signal processing. When the training data and the test data come from different corpora, their feature distributions differ, which degrades recognition performance. Therefore, in order to solve this probl...

Bibliographic Details
Main Authors: Xuan Cao, Maoshen Jia, Jiawei Ru, Tun-wen Pai
Format: Article
Language: English
Published: SpringerOpen 2022-12-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Subjects: Speech emotion recognition; Cross-corpus; Subspace learning; Domain adaption
Online Access: https://doi.org/10.1186/s13636-022-00264-5
author Xuan Cao
Maoshen Jia
Jiawei Ru
Tun-wen Pai
collection DOAJ
description Abstract Speech emotion recognition (SER) is an active research topic in speech signal processing. When the training and test data come from different corpora, their feature distributions differ, which degrades recognition performance. To address this problem, this paper proposes a cross-corpus speech emotion recognition method based on subspace learning and domain adaptation. Specifically, the training set and the test set form the source domain and the target domain, respectively. The Hessian matrix is then introduced to obtain the subspace for the features extracted in both domains. In addition, an information entropy-based domain adaptation method is introduced to construct a common space in which the difference between the source-domain and target-domain feature distributions is reduced as much as possible. To evaluate the proposed method, extensive experiments are conducted on cross-corpus speech emotion recognition. The experimental results show that the proposed method outperforms several existing subspace learning and domain adaptation methods.
first_indexed 2024-04-11T04:06:37Z
format Article
id doaj.art-b68453e7c64b4d159f1c2f38386f9d7d
institution Directory Open Access Journal
issn 1687-4722
language English
last_indexed 2024-04-11T04:06:37Z
publishDate 2022-12-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Audio, Speech, and Music Processing
spelling doaj.art-b68453e7c64b4d159f1c2f38386f9d7d 2023-01-01T12:24:07Z eng SpringerOpen EURASIP Journal on Audio, Speech, and Music Processing 1687-4722 2022-12-01 20221120 10.1186/s13636-022-00264-5
Cross-corpus speech emotion recognition using subspace learning and domain adaption
Xuan Cao (Faculty of Information Technology, Beijing University of Technology)
Maoshen Jia (Faculty of Information Technology, Beijing University of Technology)
Jiawei Ru (Faculty of Information Technology, Beijing University of Technology)
Tun-wen Pai (Department of Computer Science and Information Engineering, National Taipei University of Technology)
https://doi.org/10.1186/s13636-022-00264-5
Speech emotion recognition; Cross-corpus; Subspace learning; Domain adaption
title Cross-corpus speech emotion recognition using subspace learning and domain adaption
topic Speech emotion recognition
Cross-corpus
Subspace learning
Domain adaption
url https://doi.org/10.1186/s13636-022-00264-5