Enhancing Speech Emotions Recognition Using Multivariate Functional Data Analysis

Bibliographic Details
Main Author: Matthieu Saumard
Affiliation: LabISEN Yncréa Ouest, VISION-AD Team, 20 rue Cuirassé Bretagne, 29200 Brest, France
Format: Article
Language: English
Published: MDPI AG, 2023-08-01
Series: Big Data and Cognitive Computing
Volume/Issue: 7(3), article 146
ISSN: 2504-2289
DOI: 10.3390/bdcc7030146
Subjects: speech emotion recognition; functional data; MFCC
Collection: DOAJ
Online Access: https://www.mdpi.com/2504-2289/7/3/146

Full description
Speech Emotions Recognition (SER) has gained significant attention in the fields of human–computer interaction and speech processing. In this article, we present a novel approach to improve SER performance by interpreting the Mel Frequency Cepstral Coefficients (MFCCs) as a multivariate functional data object, which accelerates learning while maintaining high accuracy. To treat MFCCs as functional data, we preprocess them as images and apply resizing techniques. By representing MFCCs as functional data, we leverage the temporal dynamics of speech and capture essential emotional cues more effectively. Consequently, this representation speeds up the learning process of SER methods without compromising performance. We then apply a supervised learning model, specifically a functional Support Vector Machine (SVM), directly to the MFCCs represented as functional data. This makes use of the full functional information, allowing for more accurate emotion recognition. The proposed approach is evaluated on two benchmark databases for SER, EMO-DB and IEMOCAP. Our method achieves competitive accuracy, showing its effectiveness for emotion recognition. Furthermore, our approach significantly reduces the learning time, making it computationally efficient and practical for real-world applications. In conclusion, treating MFCCs as multivariate functional data objects performs well in SER tasks, delivering both improved accuracy and substantial time savings during the learning process. This advancement holds great potential for enhancing human–computer interaction and enabling more sophisticated emotion-aware applications.
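The abstract describes a pipeline in which each utterance's MFCC matrix is resized to a common shape and then treated as a multivariate curve (one curve per cepstral coefficient) that is fed to a functional SVM. What follows is a minimal sketch under stated assumptions, not the author's implementation. Resizing is assumed to be linear interpolation along the time axis only. The functional SVM is assumed to use a Gaussian kernel on the L2 distance between curves, K(X, Y) = exp(-γ Σ_j ∫ (X_j(t) - Y_j(t))² dt), with each integral approximated by the trapezoidal rule. The abstract does not state which kernel or basis the paper uses. The function names `resize_mfcc`, `l2_sq_distance` and `functional_gaussian_gram`, the 64-point grid and the value of `gamma` are hypothetical choices, and the input matrices are random placeholders rather than real MFCCs.

```python
from typing import Sequence

import numpy as np


def resize_mfcc(mfcc: np.ndarray, n_points: int = 64) -> np.ndarray:
    """Resample an MFCC matrix of shape (n_coeffs, n_frames) to (n_coeffs, n_points).

    Each row (one cepstral coefficient over time) is linearly interpolated
    onto a common grid on [0, 1], so utterances of different durations become
    curves observed at the same time points. Assumption: this is a simple
    stand-in for the image-resizing step; the coefficient axis is unchanged.
    """
    n_coeffs, n_frames = mfcc.shape
    src = np.linspace(0.0, 1.0, n_frames)  # original frame positions mapped to [0, 1]
    dst = np.linspace(0.0, 1.0, n_points)  # common evaluation grid
    return np.vstack([np.interp(dst, src, mfcc[j]) for j in range(n_coeffs)])


def l2_sq_distance(x: np.ndarray, y: np.ndarray, grid: np.ndarray) -> float:
    """Squared L2 distance between two multivariate curves sampled on `grid`.

    Computes the sum over coefficients j of the integral of (x_j(t) - y_j(t))^2 dt,
    each integral approximated by the trapezoidal rule. `grid` must have one
    entry per sample along the time axis of `x` and `y`.
    """
    sq = (x - y) ** 2  # shape (n_coeffs, n_points)
    dt = np.diff(grid)  # spacing between consecutive grid points
    per_coeff = ((sq[:, 1:] + sq[:, :-1]) / 2.0 * dt).sum(axis=1)
    return float(per_coeff.sum())


def functional_gaussian_gram(
    curves_a: Sequence[np.ndarray],
    curves_b: Sequence[np.ndarray],
    grid: np.ndarray,
    gamma: float = 1.0,
) -> np.ndarray:
    """Gram matrix K[i, k] = exp(-gamma * ||A_i - B_k||^2), using the L2 norm above."""
    gram = np.empty((len(curves_a), len(curves_b)))
    for i, a in enumerate(curves_a):
        for k, b in enumerate(curves_b):
            gram[i, k] = np.exp(-gamma * l2_sq_distance(a, b, grid))
    return gram


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder "MFCC" matrices: 13 coefficients, utterances of different lengths.
    raw = [rng.normal(size=(13, n)) for n in (80, 120, 95)]
    grid = np.linspace(0.0, 1.0, 64)  # must match n_points below
    curves = [resize_mfcc(m, n_points=64) for m in raw]
    gram = functional_gaussian_gram(curves, curves, grid, gamma=0.01)
    print(gram.shape)  # (3, 3)
```

The resulting Gram matrix can be passed to any SVM solver that accepts precomputed kernels. With scikit-learn, for example, `SVC(kernel="precomputed")` is fitted on the train-by-train matrix and predicts from the test-by-train matrix. Evaluating every curve on one fixed, coarse grid keeps each kernel evaluation cheap, which is in the spirit of the learning-time savings the abstract reports. This sketch makes no claim about the paper's actual timings or accuracy.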