Multi-branch feature learning based speech emotion recognition using SCAR-NET
Main Authors: | Keji Mao, Yuxiang Wang, Ligang Ren, Jinhong Zhang, Jiefan Qiu, Guanglin Dai |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2023-12-01 |
Series: | Connection Science |
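Subjects: | affective computing; speech emotion recognition; convolutional neural network; parallel paths; feature learning |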
Online Access: | http://dx.doi.org/10.1080/09540091.2023.2189217 |
author | Keji Mao, Yuxiang Wang, Ligang Ren, Jinhong Zhang, Jiefan Qiu, Guanglin Dai |
---|---|
collection | DOAJ |
description | Speech emotion recognition (SER) is an active research area in affective computing. Recognizing emotions from speech signals helps assess human behaviour and has promising applications in human-computer interaction. The performance of deep learning-based SER methods depends heavily on feature learning. In this paper, we propose SCAR-NET, an improved convolutional neural network that extracts emotional features from speech signals and classifies them. The work has two main parts. First, three parallel paths extract spectral, temporal, and spectral-temporal correlation features; then, split-convolve-aggregate residual blocks perform multi-branch deep feature learning. The resulting features are refined by global average pooling (GAP) and passed to a softmax classifier that predicts the emotion class. We also conduct a series of experiments to evaluate the robustness and effectiveness of SCAR-NET, which achieves accuracies of 96.45%, 83.13%, and 89.93% on the EMO-DB, SAVEE, and RAVDESS speech emotion datasets, respectively. These results demonstrate the superior performance of SCAR-NET. |
first_indexed | 2024-03-12T00:24:29Z |
format | Article |
id | doaj.art-98092f9487a74ff8a2072b5677cf3603 |
institution | Directory Open Access Journal |
issn | 0954-0091, 1360-0494 |
language | English |
last_indexed | 2024-03-12T00:24:29Z |
publishDate | 2023-12-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Connection Science |
spelling | doaj.art-98092f9487a74ff8a2072b5677cf3603; 2023-09-15T10:48:01Z; eng; Taylor & Francis Group; Connection Science; ISSN 0954-0091, 1360-0494; 2023-12-01; vol. 35, no. 1; doi:10.1080/09540091.2023.2189217; article 2189217; Multi-branch feature learning based speech emotion recognition using SCAR-NET; Keji Mao, Yuxiang Wang, Ligang Ren, Jinhong Zhang, Jiefan Qiu, Guanglin Dai (all Zhejiang University of Technology); keywords: affective computing, speech emotion recognition, convolutional neural network, parallel paths, feature learning |
title | Multi-branch feature learning based speech emotion recognition using SCAR-NET |
topic | affective computing; speech emotion recognition; convolutional neural network; parallel paths; feature learning |
url | http://dx.doi.org/10.1080/09540091.2023.2189217 |
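The pipeline summarized in the description field (three parallel feature paths, split-convolve-aggregate residual blocks, global average pooling, and a softmax classifier) can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation. The kernel sizes, channel counts, use of a single residual block, and the split/aggregate rule (equal channel groups, each convolved by its own 3×3 kernel, concatenated, then added to an identity shortcut) are all assumptions, as are the hypothetical helper names `conv2d_same`, `init_conv`, `make_params`, `scar_block`, and `forward`. Weights are random and untrained, so the example shows data flow and tensor shapes only.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def conv2d_same(x, w, b):
    """'Same'-padded 2-D cross-correlation for odd kernel sizes.

    x: (C_in, H, W), w: (C_out, C_in, kh, kw), b: (C_out,) -> (C_out, H, W)
    """
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((c_out, h, wd))
    for i in range(kh):
        for j in range(kw):
            # Mix input channels at this kernel offset and accumulate.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]


def init_conv(rng, c_out, c_in, kh, kw):
    std = np.sqrt(2.0 / (c_in * kh * kw))  # He-style initialisation
    return rng.normal(0.0, std, (c_out, c_in, kh, kw)), np.zeros(c_out)


def make_params(rng, c=8, k=5, groups=4, n_classes=7):
    # ASSUMPTION: hypothetical sizes chosen for illustration, not the paper's.
    assert (3 * c) % groups == 0, "channels must split evenly into branches"
    g = 3 * c // groups
    return {
        "spectral": init_conv(rng, c, 1, k, 1),  # kernel spans frequency only
        "temporal": init_conv(rng, c, 1, 1, k),  # kernel spans time only
        "joint": init_conv(rng, c, 1, k, k),     # spectral-temporal correlation
        "branches": [init_conv(rng, g, g, 3, 3) for _ in range(groups)],
        "dense_w": rng.normal(0.0, 0.1, (n_classes, 3 * c)),
        "dense_b": np.zeros(n_classes),
    }


def scar_block(x, branches):
    # ASSUMPTION: one possible reading of split-convolve-aggregate + residual.
    parts = np.split(x, len(branches), axis=0)  # split channels into branches
    outs = [relu(conv2d_same(p, w, b)) for p, (w, b) in zip(parts, branches)]
    return relu(np.concatenate(outs, axis=0) + x)  # aggregate, add shortcut


def forward(spec, params):
    x = spec[None, :, :]  # (1, F, T): a single input channel
    paths = [relu(conv2d_same(x, *params[n])) for n in ("spectral", "temporal", "joint")]
    h = np.concatenate(paths, axis=0)  # (3c, F, T): the three paths side by side
    h = scar_block(h, params["branches"])
    pooled = h.mean(axis=(1, 2))  # global average pooling -> (3c,)
    return softmax(params["dense_w"] @ pooled + params["dense_b"])


rng = np.random.default_rng(0)
params = make_params(rng)
spec = rng.standard_normal((40, 100))  # stand-in for a 40-band, 100-frame spectrogram
probs = forward(spec, params)
print(probs.shape, probs.round(3))
```

Pairing a frequency-only (k×1) kernel and a time-only (1×k) kernel with a full k×k kernel is one plausible way to realize separate spectral, temporal, and spectral-temporal paths; the paper's actual layer choices may differ. The value `n_classes=7` is only an example.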