Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network

Automatic speech emotion recognition is a challenging task due to the gap between acoustic features and human emotions; performance relies strongly on the discriminative acoustic features extracted for a given recognition task. We propose a novel deep neural architecture to extract the informative feature r...

Bibliographic Details
Main Authors:	Wei Jiang, Zheng Wang, Jesse S. Jin, Xianfeng Han, Chunguang Li
Format:	Article
Language:	English
Published:	MDPI AG 2019-06-01
Series:	Sensors
Subjects:	human–computer interaction (HCI); speech emotion recognition; deep neural architecture; heterogeneous feature unification; fusion network
Online Access:	https://www.mdpi.com/1424-8220/19/12/2730

author	Wei Jiang, Zheng Wang, Jesse S. Jin, Xianfeng Han, Chunguang Li
collection	DOAJ
description	Automatic speech emotion recognition is a challenging task due to the gap between acoustic features and human emotions; performance relies strongly on the discriminative acoustic features extracted for a given recognition task. In this work, we propose a novel deep neural architecture to extract informative feature representations from heterogeneous acoustic feature groups, which may contain redundant and unrelated information that leads to low emotion recognition performance. After obtaining the informative features, a fusion network is trained to jointly learn the discriminative acoustic feature representation, and a Support Vector Machine (SVM) is used as the final classifier for the recognition task. Experimental results on the IEMOCAP dataset demonstrate that the proposed architecture improves recognition performance over existing state-of-the-art approaches, achieving an accuracy of 64%.
first_indexed	2024-04-12T19:27:54Z
format	Article
id	doaj.art-d11690a142a04eca80ee4f6f646b843a
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-04-12T19:27:54Z
publishDate	2019-06-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-d11690a142a04eca80ee4f6f646b843a; 2022-12-22T03:19:26Z; eng; MDPI AG; Sensors; 1424-8220; 2019-06-01; 19; 12; 2730; 10.3390/s19122730; s19122730; Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network; Wei Jiang (0), Zheng Wang (1), Jesse S. Jin (2), Xianfeng Han (3), Chunguang Li (4); (0) College of Intelligence and Computing, Tianjin University, Tianjin 300072, China; (1) College of Intelligence and Computing, Tianjin University, Tianjin 300072, China; (2) College of Intelligence and Computing, Tianjin University, Tianjin 300072, China; (3) College of Intelligence and Computing, Tianjin University, Tianjin 300072, China; (4) School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China; Automatic speech emotion recognition is a challenging task due to the gap between acoustic features and human emotions; performance relies strongly on the discriminative acoustic features extracted for a given recognition task. In this work, we propose a novel deep neural architecture to extract informative feature representations from heterogeneous acoustic feature groups, which may contain redundant and unrelated information that leads to low emotion recognition performance. After obtaining the informative features, a fusion network is trained to jointly learn the discriminative acoustic feature representation, and a Support Vector Machine (SVM) is used as the final classifier for the recognition task. Experimental results on the IEMOCAP dataset demonstrate that the proposed architecture improves recognition performance over existing state-of-the-art approaches, achieving an accuracy of 64%.; https://www.mdpi.com/1424-8220/19/12/2730; human–computer interaction (HCI); speech emotion recognition; deep neural architecture; heterogeneous feature unification; fusion network
title	Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network
topic	human–computer interaction (HCI); speech emotion recognition; deep neural architecture; heterogeneous feature unification; fusion network
url	https://www.mdpi.com/1424-8220/19/12/2730

Similar Items