Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages

In a conventional speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language do not exist, data from other languages can be used instead. We experiment with cross-lingual and multil...


Bibliographic Details
Main Authors: Ephrem Afele Retta, Richard Sutcliffe, Jabar Mahmood, Michael Abebe Berwo, Eiad Almekhlafi, Sajjad Ahmad Khan, Shehzad Ashraf Chaudhry, Mustafa Mhamed, Jun Feng
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Series: Applied Sciences
Subjects: speech emotion recognition; multilingual; cross-lingual; feature extraction
Online Access: https://www.mdpi.com/2076-3417/13/23/12587
author Ephrem Afele Retta
Richard Sutcliffe
Jabar Mahmood
Michael Abebe Berwo
Eiad Almekhlafi
Sajjad Ahmad Khan
Shehzad Ashraf Chaudhry
Mustafa Mhamed
Jun Feng
collection DOAJ
description In a conventional speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language do not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German, and Urdu. For Amharic, we use our own publicly available Amharic Speech Emotion Dataset (ASED). For English, German, and Urdu, we use the existing RAVDESS, EMO-DB, and URDU datasets. Following previous research, we map the labels of all the datasets to just two classes: positive and negative. This allows us to compare performance on different languages directly and to combine languages for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers: AlexNet, VGGE (a proposed variant of VGG), and ResNet50. The results, averaged over the three models, were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are about equally difficult. By comparison, German SER appears to be more difficult and Urdu SER easier. In Experiment 2, we trained on one language and tested on another, in both directions, for each of the following pairs: Amharic↔German, Amharic↔English, and Amharic↔Urdu. With Amharic as the target, the results suggested that using English or German as the source gives the best result. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percentage points higher than the best accuracy in Experiment 2, suggesting that training on two or three non-Amharic languages gives a better result than training on just one. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for building an SER classifier when resources for a language are scarce.
first_indexed 2024-03-09T01:55:53Z
format Article
id doaj.art-95ae339e168a4471be069f63e45f3bff
institution Directory of Open Access Journals
issn 2076-3417
language English
last_indexed 2024-03-09T01:55:53Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-95ae339e168a4471be069f63e45f3bff 2023-12-08T15:11:05Z eng MDPI AG Applied Sciences 2076-3417 2023-11-01 13 23 12587 10.3390/app132312587
Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages
Ephrem Afele Retta: School of Information Science and Technology, Northwest University, Xi’an 710127, China
Richard Sutcliffe: School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
Jabar Mahmood: Faculty of Computing and Information Technology, University of Sialkot, Sialkot 51040, Punjab, Pakistan
Michael Abebe Berwo: School of Information and Engineering, Chang’an University, Xi’an 710064, China
Eiad Almekhlafi: School of Information Science and Technology, Northwest University, Xi’an 710127, China
Sajjad Ahmad Khan: Computer Engineering Department, Hoseo University, Asan 31499, Republic of Korea
Shehzad Ashraf Chaudhry: Department of Computer Science and Information Technology, College of Engineering, Abu Dhabi University, Abu Dhabi 59911, United Arab Emirates
Mustafa Mhamed: School of Information Science and Technology, Northwest University, Xi’an 710127, China
Jun Feng: School of Information Science and Technology, Northwest University, Xi’an 710127, China
https://www.mdpi.com/2076-3417/13/23/12587
speech emotion recognition; multilingual; cross-lingual; feature extraction
title Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages
topic speech emotion recognition
multilingual
cross-lingual
feature extraction
url https://www.mdpi.com/2076-3417/13/23/12587