Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network

Bibliographic Details
Main Authors: Nhat Truong Pham, Sy Dzung Nguyen, Vu Song Thuy Nguyen, Bich Ngoc Hong Pham, Duc Ngoc Minh Dang
Format: Article
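Language: English
Published: Taylor & Francis Group, 2023-07-01
Series: Journal of Information and Telecommunication
Subjects: Feature extraction; overlapping sliding window; pattern recognition network; SHAP analysis; speech emotion recognition
Online Access: https://www.tandfonline.com/doi/10.1080/24751839.2023.2187278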
collection DOAJ
description Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated many techniques to improve the accuracy of SER, feature extraction, classifier schemes, and computational cost remain challenging. To address these problems, this study proposes a new set of 1D features extracted using an overlapping sliding window (OSW) technique for SER. In addition, a deep neural network-based classifier scheme, called the deep Pattern Recognition Network (PRN), is designed to categorize emotional states from the new set of 1D features. We evaluate the proposed method on the Emo-DB and AESSD datasets, which contain several different emotional states. The experimental results show that the proposed method achieves an accuracy of 98.5% on Emo-DB and 87.1% on AESSD. Its accuracy is also comparable to, or better than, that of state-of-the-art and current SER approaches that use 1D features on the same datasets. Furthermore, SHAP (SHapley Additive exPlanations) analysis is employed to interpret the prediction model, helping system developers select the optimal features to integrate into the desired system.
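The abstract names two techniques without detailing them here: extracting 1D features over an overlapping sliding window, and explaining a model's predictions with Shapley values. The sketch below illustrates both under stated assumptions; it is not the authors' feature set, the deep PRN, or the `shap` package. The window length (400 samples, i.e. 25 ms at an assumed 16 kHz), the 50% hop, the four summary statistics (mean and standard deviation of per-window RMS energy and zero-crossing rate), the helper names, and the toy linear scorer are all hypothetical choices for illustration. Attributions are computed as exact Shapley values by enumerating feature subsets against a baseline, which is only tractable for a handful of features.

```python
import itertools
import math
from typing import Callable

import numpy as np


def overlapping_windows(signal: np.ndarray, win_len: int, hop: int) -> np.ndarray:
    """Split a 1D signal into frames; hop < win_len makes consecutive frames overlap.

    Trailing samples that do not fill a complete window are dropped.
    """
    if win_len <= 0 or hop <= 0:
        raise ValueError("win_len and hop must be positive")
    if len(signal) < win_len:
        return np.empty((0, win_len))
    n_frames = 1 + (len(signal) - win_len) // hop
    return np.stack([signal[i * hop : i * hop + win_len] for i in range(n_frames)])


def osw_features(signal: np.ndarray, win_len: int = 400, hop: int = 200) -> np.ndarray:
    """Hypothetical 1D feature vector: mean/std of per-window RMS energy and ZCR."""
    frames = overlapping_windows(signal, win_len, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.array([rms.mean(), rms.std(), zcr.mean(), zcr.std()])


def exact_shapley(
    f: Callable[[np.ndarray], float], x: np.ndarray, baseline: np.ndarray
) -> np.ndarray:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                z = baseline.astype(float)  # astype returns a fresh copy
                z[list(subset)] = x[list(subset)]
                without_i = f(z)
                z[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (f(z) - without_i)
    return phi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000  # assumed sample rate
    t = np.arange(sr) / sr
    signal = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.05 * rng.standard_normal(sr)

    x = osw_features(signal)
    weights = np.array([2.0, -1.0, 0.5, 3.0])  # toy linear scorer, not the PRN
    model = lambda v: float(weights @ v)
    baseline = np.zeros_like(x)

    phi = exact_shapley(model, x, baseline)
    print("features:", x)
    print("attributions:", phi)
    # Efficiency property: attributions sum to f(x) - f(baseline).
    print(np.isclose(phi.sum(), model(x) - model(baseline)))  # expected: True
```

For a linear scorer each attribution reduces to w_i(x_i - b_i), and in general the attributions sum to f(x) - f(baseline); this additivity is what the "additive" in SHAP refers to. For a real network with many features, an approximate explainer would replace the exhaustive subset loop.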
first_indexed 2024-03-12T21:34:18Z
id doaj.art-3b217879c86044e79f8b4251af90b877
institution Directory of Open Access Journals
issn 2475-1839
2475-1847
last_indexed 2024-03-12T21:34:18Z
spelling doaj.art-3b217879c86044e79f8b4251af90b877 (updated 2023-07-27T11:47:07Z; language: eng)
Journal of Information and Telecommunication, vol. 7, no. 3, pp. 317-335 (2023-07-01). ISSN 2475-1839, 2475-1847. DOI: 10.1080/24751839.2023.2187278
Affiliations:
Nhat Truong Pham: Division of Computational Mechatronics, Institute for Computational Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Sy Dzung Nguyen: Laboratory for Computational Mechatronics, Institute for Computational Science and Artificial Intelligence, Van Lang University, Ho Chi Minh City, Vietnam
Vu Song Thuy Nguyen: Department of Computer Science and Engineering, Michigan State University, Michigan, MI, USA
Bich Ngoc Hong Pham: Faculty of Information Technology, Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
Duc Ngoc Minh Dang: Computing Fundamental Department, FPT University, Ho Chi Minh City, Vietnam
Keywords: Feature extraction; overlapping sliding window; pattern recognition network; SHAP analysis; speech emotion recognition