Enhanced Speech Emotion Recognition Using DCGAN-Based Data Augmentation

Bibliographic Details
Main Authors: Ji-Young Baek, Seok-Pil Lee
Format: Article
Language: English
Published: MDPI AG 2023-09-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/12/18/3966
Description
Summary: Although emotional speech recognition has received increasing emphasis in research and applications, it remains challenging due to the diversity and complexity of emotions and limited datasets. To address these limitations, we propose a novel approach utilizing DCGAN to augment data from the RAVDESS and EmoDB databases. Then, we assess the efficacy of emotion recognition using mel-spectrogram data by utilizing a model that combines CNN and BiLSTM. The preliminary experimental results reveal that the suggested technique contributes to enhancing the emotional speech identification performance. The results of this study provide directions for further development in the field of emotional speech recognition and the potential for practical applications.
ISSN: 2079-9292