Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network

Bibliographic Details
Main Authors: Wei Jiang, Zheng Wang, Jesse S. Jin, Xianfeng Han, Chunguang Li
Format: Article
Language: English
Published: MDPI AG 2019-06-01
Series: Sensors
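Subjects: human–computer interaction (HCI); speech emotion recognition; deep neural architecture; heterogeneous feature unification; fusion network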
Online Access: https://www.mdpi.com/1424-8220/19/12/2730
_version_ 1811262569581641728
author Wei Jiang
Zheng Wang
Jesse S. Jin
Xianfeng Han
Chunguang Li
collection DOAJ
description Automatic speech emotion recognition is a challenging task due to the gap between acoustic features and human emotions; its performance relies strongly on the discriminative acoustic features extracted for a given recognition task. In this work, we propose a novel deep neural architecture to extract informative feature representations from heterogeneous acoustic feature groups, which may contain redundant and unrelated information that leads to low emotion recognition performance. After obtaining the informative features, a fusion network is trained to jointly learn the discriminative acoustic feature representation, and a Support Vector Machine (SVM) is used as the final classifier for the recognition task. Experimental results on the IEMOCAP dataset demonstrate that the proposed architecture improves recognition performance compared to existing state-of-the-art approaches, achieving an accuracy of 64%.
first_indexed 2024-04-12T19:27:54Z
format Article
id doaj.art-d11690a142a04eca80ee4f6f646b843a
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-04-12T19:27:54Z
publishDate 2019-06-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-d11690a142a04eca80ee4f6f646b843a
2022-12-22T03:19:26Z
eng
MDPI AG
Sensors, ISSN 1424-8220
2019-06-01, vol. 19, no. 12, article 2730
DOI 10.3390/s19122730 (s19122730)
Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network
Wei Jiang (0), College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
Zheng Wang (1), College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
Jesse S. Jin (2), College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
Xianfeng Han (3), College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
Chunguang Li (4), School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
https://www.mdpi.com/1424-8220/19/12/2730
human–computer interaction (HCI); speech emotion recognition; deep neural architecture; heterogeneous feature unification; fusion network
title Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network
topic human–computer interaction (HCI)
speech emotion recognition
deep neural architecture
heterogeneous feature unification
fusion network
url https://www.mdpi.com/1424-8220/19/12/2730