Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning


Bibliographic Details
Main Authors: Apeksha Aggarwal, Akshat Srivastava, Ajay Agarwal, Nidhi Chahal, Dilbag Singh, Abeer Ali Alnuaim, Aseel Alhadlaq, Heung-No Lee
Format: Article
Language: English
Published: MDPI AG, 2022-03-01
Series: Sensors
Subjects: speech emotion recognition; machine learning; neural network
Online Access: https://www.mdpi.com/1424-8220/22/6/2378
Collection: DOAJ (Directory of Open Access Journals)
Description: Recognizing human emotions by machines is a complex task. Deep learning models attempt to automate this process by enabling machines to exhibit learning capabilities. However, identifying human emotions from speech with good performance remains challenging. With the advent of deep learning algorithms, this problem has recently been addressed; however, most past work has relied on a single feature extraction method for training. In this research, we explore two different feature extraction methods for effective speech emotion recognition. First, two-way feature extraction is proposed, utilizing super convergence to extract two sets of potential features from the speech data. In the first approach, principal component analysis (PCA) is applied to obtain the first feature set, which is then fed to a deep neural network (DNN) with dense and dropout layers. In the second approach, mel-spectrogram images are extracted from the audio files, and the 2D images are given as input to a pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis of both feature extraction methods, with multiple algorithms and over two datasets, are performed in this work. On the RAVDESS dataset, the mel-spectrogram approach provided significantly better accuracy than using numeric features on a DNN.
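The two pipelines outlined in the abstract can be sketched as follows. This is a minimal NumPy-only illustration on synthetic data: the dimensions, the toy dense network, and the hand-rolled mel filterbank are assumptions for illustration, not the authors' implementation (the paper uses a trained DNN with dropout and a pre-trained VGG-16 over real speech).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Pipeline 1: PCA over numeric speech features, then a dense net ---
# Synthetic stand-in for per-utterance numeric features (e.g. MFCC stats):
# 200 utterances x 40 features.
X = rng.normal(size=(200, 40))

# PCA via SVD on mean-centered data -> the first (reduced) feature set.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 16
X_reduced = Xc @ Vt[:k].T                      # shape (200, 16)

# Forward pass of a toy dense network over the PCA features
# (stands in for the paper's DNN with dense and dropout layers).
W1 = rng.normal(scale=0.1, size=(k, 64)); b1 = np.zeros(64)
W2 = rng.normal(scale=0.1, size=(64, 8)); b2 = np.zeros(8)   # 8 emotion classes
h = np.maximum(0.0, X_reduced @ W1 + b1)       # ReLU dense layer
logits = h @ W2 + b2
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# --- Pipeline 2 (sketched): a mel-spectrogram from raw audio ---
# Real work would use e.g. librosa and feed the 2D image to VGG-16;
# only the mel projection itself is shown here.
sr, n_fft, n_mels = 16000, 512, 64
audio = rng.normal(size=sr)                    # 1 s of synthetic "speech"
frames = np.lib.stride_tricks.sliding_window_view(audio, n_fft)[::256]
spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

# Triangular mel filterbank between 0 Hz and the Nyquist frequency.
mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
fbank = np.zeros((n_mels, n_fft // 2 + 1))
for m in range(1, n_mels + 1):
    l, c, r = bins[m - 1], bins[m], bins[m + 1]
    fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
    fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

mel_spec = spec @ fbank.T                      # (frames, n_mels) "image"
```

In the paper's setting, `X_reduced` would be classified by the trained DNN, while `mel_spec` (rendered as an image) would be the input to VGG-16.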
ISSN: 1424-8220
DOI: 10.3390/s22062378 (Sensors, Vol. 22, Issue 6, Article 2378, published 2022-03-01)
Author Affiliations:
Apeksha Aggarwal: Department of Computer Science Engineering & Information Technology, Jaypee Institute of Information Technology, A 10, Sector 62, Noida 201307, India
Akshat Srivastava: School of Computer Science Engineering and Technology, Bennett University, Plot Nos 8-11, TechZone 2, Greater Noida 201310, India
Ajay Agarwal: Department of Information Technology, KIET Group of Institutions, Delhi-NCR, Meerut Road (NH-58), Ghaziabad 201206, India
Nidhi Chahal: NIIT Limited, Gurugram 110019, India
Dilbag Singh: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea
Abeer Ali Alnuaim: Department of Computer Science and Engineering, College of Applied Studies and Community Services, King Saud University, P.O. Box 22459, Riyadh 11495, Saudi Arabia
Aseel Alhadlaq: Department of Computer Science and Engineering, College of Applied Studies and Community Services, King Saud University, P.O. Box 22459, Riyadh 11495, Saudi Arabia
Heung-No Lee: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea
Keywords: speech emotion recognition; machine learning; neural network