Speech Emotion Recognition Based on Parallel CNN-Attention Networks with Multi-Fold Data Augmentation

In this paper, an automatic speech emotion recognition (SER) task of classifying eight different emotions was carried out using parallel networks trained on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). A combination of a CNN-based network and attention-based networ...

Bibliographic Details
Main Authors: John Lorenzo Bautista, Yun Kyung Lee, Hyun Soon Shin
Format: Article
Language: English
Published: MDPI AG 2022-11-01
Series: Electronics
Subjects:
Online Access: https://www.mdpi.com/2079-9292/11/23/3935