Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network
Abstract: Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated many techniques to improve the accuracy of SER, it has been challenging with feature extraction, clas...
Main Authors: | Nhat Truong Pham, Sy Dzung Nguyen, Vu Song Thuy Nguyen, Bich Ngoc Hong Pham, Duc Ngoc Minh Dang |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2023-07-01 |
Series: | Journal of Information and Telecommunication |
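Subjects: | Feature extraction; overlapping sliding window; pattern recognition network; SHAP analysis; speech emotion recognition |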
Online Access: | https://www.tandfonline.com/doi/10.1080/24751839.2023.2187278 |
_version_ | 1797771217505091584 |
---|---|
author | Nhat Truong Pham; Sy Dzung Nguyen; Vu Song Thuy Nguyen; Bich Ngoc Hong Pham; Duc Ngoc Minh Dang |
author_facet | Nhat Truong Pham; Sy Dzung Nguyen; Vu Song Thuy Nguyen; Bich Ngoc Hong Pham; Duc Ngoc Minh Dang |
author_sort | Nhat Truong Pham |
collection | DOAJ |
description | Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated many techniques to improve the accuracy of SER, feature extraction, classifier schemes, and computational costs remain challenging. To address these problems, this study proposes a new set of 1D features for SER, extracted using an overlapping sliding window (OSW) technique. In addition, a deep neural network-based classifier called the deep Pattern Recognition Network (PRN) is designed to categorize emotional states from the new set of 1D features. The proposed method is evaluated on the Emo-DB and AESSD datasets, which contain several different emotional states. The experimental results show that the proposed method achieves accuracies of 98.5% and 87.1% on the Emo-DB and AESSD datasets, respectively. Its accuracy is comparable to or better than that of state-of-the-art and current approaches that use 1D features on the same datasets for SER. Furthermore, SHAP (SHapley Additive exPlanations) analysis is employed to interpret the prediction model and to assist system developers in selecting the optimal features to integrate into the desired system. |
first_indexed | 2024-03-12T21:34:18Z |
format | Article |
id | doaj.art-3b217879c86044e79f8b4251af90b877 |
institution | Directory Open Access Journal |
issn | 2475-1839 2475-1847 |
language | English |
last_indexed | 2024-03-12T21:34:18Z |
publishDate | 2023-07-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Journal of Information and Telecommunication |
spelling | doaj.art-3b217879c86044e79f8b4251af90b877; 2023-07-27T11:47:07Z; eng; Taylor & Francis Group; Journal of Information and Telecommunication; 2475-1839, 2475-1847; 2023-07-01; 7(3), 317-335; 10.1080/24751839.2023.2187278; Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network; Nhat Truong Pham (0), Sy Dzung Nguyen (1), Vu Song Thuy Nguyen (2), Bich Ngoc Hong Pham (3), Duc Ngoc Minh Dang (4); (0) Division of Computational Mechatronics, Institute for Computational Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam; (1) Laboratory for Computational Mechatronics, Institute for Computational Science and Artificial Intelligence, Van Lang University, Ho Chi Minh City, Vietnam; (2) Department of Computer Science and Engineering, Michigan State University, Michigan, MI, USA; (3) Faculty of Information Technology, Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam; (4) Computing Fundamental Department, FPT University, Ho Chi Minh City, Vietnam; Speech emotion recognition (SER) has several applications, such as e-learning, human-computer interaction, customer service, and healthcare systems. Although researchers have investigated many techniques to improve the accuracy of SER, feature extraction, classifier schemes, and computational costs remain challenging. To address these problems, this study proposes a new set of 1D features for SER, extracted using an overlapping sliding window (OSW) technique. In addition, a deep neural network-based classifier called the deep Pattern Recognition Network (PRN) is designed to categorize emotional states from the new set of 1D features. The proposed method is evaluated on the Emo-DB and AESSD datasets, which contain several different emotional states. The experimental results show that the proposed method achieves accuracies of 98.5% and 87.1% on the Emo-DB and AESSD datasets, respectively. Its accuracy is comparable to or better than that of state-of-the-art and current approaches that use 1D features on the same datasets for SER. Furthermore, SHAP (SHapley Additive exPlanations) analysis is employed to interpret the prediction model and to assist system developers in selecting the optimal features to integrate into the desired system.; https://www.tandfonline.com/doi/10.1080/24751839.2023.2187278; Feature extraction; overlapping sliding window; pattern recognition network; SHAP analysis; speech emotion recognition |
spellingShingle | Nhat Truong Pham; Sy Dzung Nguyen; Vu Song Thuy Nguyen; Bich Ngoc Hong Pham; Duc Ngoc Minh Dang; Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network; Journal of Information and Telecommunication; Feature extraction; overlapping sliding window; pattern recognition network; SHAP analysis; speech emotion recognition |
title | Speech emotion recognition using overlapping sliding window and Shapley additive explainable deep neural network |
topic | Feature extraction; overlapping sliding window; pattern recognition network; SHAP analysis; speech emotion recognition |
url | https://www.tandfonline.com/doi/10.1080/24751839.2023.2187278 |