Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning
Recognizing human emotions by machines is a complex task. Deep learning models attempt to automate this process by enabling machines to learn from data, yet identifying human emotions from speech with good performance remains challenging. Moreover, most earlier work relied on a single feature-extraction method for training. In this research, we explore two different feature-extraction methods for effective speech emotion recognition. Two-way feature extraction is proposed, using super convergence to derive two sets of potential features from the speech data. In the first approach, principal component analysis (PCA) is applied to the numeric speech features to obtain the first feature set, which is then fed to a deep neural network (DNN) with dense and dropout layers. In the second approach, mel-spectrogram images are extracted from the audio files, and these 2D images are given as input to a pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis of both feature-extraction methods, across multiple algorithms and two datasets, are performed. On the RAVDESS dataset, the mel-spectrogram approach provided significantly better accuracy than the numeric features used with a DNN.
Main Authors: | Apeksha Aggarwal, Akshat Srivastava, Ajay Agarwal, Nidhi Chahal, Dilbag Singh, Abeer Ali Alnuaim, Aseel Alhadlaq, Heung-No Lee |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-03-01 |
Series: | Sensors |
Subjects: | speech emotion recognition, machine learning, neural network |
Online Access: | https://www.mdpi.com/1424-8220/22/6/2378 |
author | Apeksha Aggarwal, Akshat Srivastava, Ajay Agarwal, Nidhi Chahal, Dilbag Singh, Abeer Ali Alnuaim, Aseel Alhadlaq, Heung-No Lee |
collection | DOAJ |
description | Recognizing human emotions by machines is a complex task. Deep learning models attempt to automate this process by enabling machines to learn from data, yet identifying human emotions from speech with good performance remains challenging. Moreover, most earlier work relied on a single feature-extraction method for training. In this research, we explore two different feature-extraction methods for effective speech emotion recognition. Two-way feature extraction is proposed, using super convergence to derive two sets of potential features from the speech data. In the first approach, principal component analysis (PCA) is applied to the numeric speech features to obtain the first feature set, which is then fed to a deep neural network (DNN) with dense and dropout layers. In the second approach, mel-spectrogram images are extracted from the audio files, and these 2D images are given as input to a pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis of both feature-extraction methods, across multiple algorithms and two datasets, are performed. On the RAVDESS dataset, the mel-spectrogram approach provided significantly better accuracy than the numeric features used with a DNN. |
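The first approach described in the abstract (PCA on numeric speech features, followed by a DNN with dense and dropout layers) can be sketched minimally as follows. This is an illustrative numpy sketch with synthetic data, not the authors' implementation; the 40-feature and 12-component sizes are arbitrary assumptions, and a real pipeline would pass `X_reduced` on to a Keras-style DNN.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project feature matrix X (samples x features) onto its top
    principal components via the SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# Toy stand-in for per-utterance numeric speech features
# (e.g. summary statistics of acoustic descriptors): 100 utterances x 40 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
X_reduced = pca_reduce(X, n_components=12)
print(X_reduced.shape)  # (100, 12)
```

The reduced matrix keeps the highest-variance directions first, which is why the first projected component always carries at least as much variance as the second.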
first_indexed | 2024-03-09T12:39:10Z |
format | Article |
id | doaj.art-b32981e9cb6a43abb180b59599d6e3be |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-09T12:39:10Z |
publishDate | 2022-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-b32981e9cb6a43abb180b59599d6e3be (record updated 2023-11-30T22:20:19Z); MDPI AG; Sensors; ISSN 1424-8220; published 2022-03-01; vol. 22, iss. 6, art. no. 2378; DOI 10.3390/s22062378; Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning |
affiliations | Apeksha Aggarwal: Department of Computer Science Engineering & Information Technology, Jaypee Institute of Information Technology, A 10, Sector 62, Noida 201307, India; Akshat Srivastava: School of Computer Science Engineering and Technology, Bennett University, Plot Nos 8-11, TechZone 2, Greater Noida 201310, India; Ajay Agarwal: Department of Information Technology, KIET Group of Institutions, Delhi-NCR, Meerut Road (NH-58), Ghaziabad 201206, India; Nidhi Chahal: NIIT Limited, Gurugram 110019, India; Dilbag Singh: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea; Abeer Ali Alnuaim: Department of Computer Science and Engineering, College of Applied Studies and Community Services, King Saud University, P.O. Box 22459, Riyadh 11495, Saudi Arabia; Aseel Alhadlaq: Department of Computer Science and Engineering, College of Applied Studies and Community Services, King Saud University, P.O. Box 22459, Riyadh 11495, Saudi Arabia; Heung-No Lee: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea |
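The second approach described in the abstract feeds mel-spectrogram images of the audio to a pre-trained VGG-16 model. Below is a from-scratch numpy sketch of the mel-spectrogram step only; in practice one would typically use a library such as librosa, and the FFT size, hop length, and 64 mel bands here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters mapping an FFT power spectrum to mel bands."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(y, sr, n_fft=512, hop=256, n_mels=64):
    """Log mel spectrogram: windowed STFT power mapped through mel filters."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10).T               # shape: (n_mels, n_frames)

# One second of a synthetic 440 Hz tone as a stand-in for an utterance.
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
print(spec.shape)  # (64, 61)
```

The resulting 2D array is what would be rendered (or tiled to three channels) as the image input for a pre-trained CNN such as VGG-16.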
title | Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning |
topic | speech emotion recognition machine learning neural network |
url | https://www.mdpi.com/1424-8220/22/6/2378 |