Data augmentation and deep neural networks for the classification of Pakistani racial speakers recognition

Bibliographic Details
Main Authors:	Ammar Amjad, Lal Khan, Hsien-Tsung Chang
Format:	Article
Language:	English
Published:	PeerJ Inc. 2022-08-01
Series:	PeerJ Computer Science
Subjects:	Speaker recognition; Data augmentation; Deep neural network; Multiple window size
Online Access:	https://peerj.com/articles/cs-1053.pdf

collection	DOAJ
description	Speech emotion recognition (SER) systems have become an important means of recognizing a person in several applications, including e-commerce, everyday interactions, law enforcement, and forensics. The efficiency of an SER system depends on the length of the audio samples used for training and testing. The models suggested in this study obtained relatively high accuracy; however, SER efficiency is not yet optimal because the limited database leads to overfitting and skewed samples. The proposed approach therefore applies data augmentation to the original audio: pitch shifting, multiple window sizes, time stretching, and added white noise. A deep model is also evaluated as a new paradigm for SER. Augmentation enlarged the limited Pakistani racial speaker speech dataset by more than 500 samples. A seven-layer framework was chosen because it gave the best accuracy among the multilayer approaches compared, and seven-layer models have also achieved very high accuracy in existing work. With a 75%:25% data split, the proposed system achieved 97.32% accuracy with a 0.032% loss. These results show that deep neural networks combined with data augmentation can improve SER performance on the Pakistani racial speech dataset.
first_indexed	2024-04-11T21:43:13Z
id	doaj.art-93130679fa304b1d99f4a7f000949ad1
institution	Directory of Open Access Journals
issn	2376-5992
last_indexed	2024-04-11T21:43:13Z
volume	8
article_number	e1053
doi	10.7717/peerj-cs.1053
affiliation	Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan (all three authors)

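The description names four augmentation operations (pitch shifting, multiple window sizes, time stretching, and added white noise) but gives neither their implementation nor their parameters. The following is a minimal NumPy sketch of what such operations can look like; it is not the authors' code. Everything in it is an assumption: the function names, the crude overlap-add time stretch (no phase alignment, unlike a phase vocoder or WSOLA), the linear-interpolation resampling used for pitch shifting (no anti-aliasing filter), the reading of "multiple window sizes" as re-framing the waveform at several frame lengths, and all parameter values (noise factor 0.005, stretch rates 0.8 and 1.25, ±2 semitones, window sizes 256/512/1024).

```python
import numpy as np


def add_white_noise(y, noise_factor, rng):
    """Add zero-mean Gaussian (white) noise with standard deviation noise_factor."""
    return y + noise_factor * rng.standard_normal(len(y))


def time_stretch(y, rate, frame_len=1024, hop_out=256):
    """Crude overlap-add (OLA) time stretch with no phase alignment.

    Frames are read every hop_out * rate samples and written every hop_out
    samples, so rate > 1 shortens the signal and rate < 1 lengthens it while
    each frame keeps its original pitch content.
    """
    if len(y) < frame_len:
        y = np.pad(y, (0, frame_len - len(y)))
    window = np.hanning(frame_len)
    hop_in = hop_out * rate
    n_frames = int((len(y) - frame_len) // hop_in) + 1
    out = np.zeros((n_frames - 1) * hop_out + frame_len)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        start = int(round(i * hop_in))
        pos = i * hop_out
        out[pos:pos + frame_len] += window * y[start:start + frame_len]
        norm[pos:pos + frame_len] += window
    # Divide by the summed windows to undo the overlap gain.
    return out / np.maximum(norm, 1e-8)


def resample_linear(y, target_len):
    """Resample y to target_len samples by linear interpolation (no anti-aliasing)."""
    old_t = np.linspace(0.0, 1.0, num=len(y))
    new_t = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_t, old_t, y)


def pitch_shift(y, n_steps):
    """Shift pitch by n_steps semitones while keeping len(y) samples."""
    factor = 2.0 ** (n_steps / 12.0)
    # Change the duration by ~factor, then resample back to len(y) samples
    # so that every frequency scales by ~factor.
    stretched = time_stretch(y, rate=1.0 / factor)
    return resample_linear(stretched, len(y))


def frames_multi_window(y, window_sizes=(256, 512, 1024), hop_ratio=0.5):
    """Frame y once per window size (one assumed reading of 'multiple window sizes')."""
    framed = {}
    for win in window_sizes:
        hop = int(win * hop_ratio)
        n = 1 + (len(y) - win) // hop
        # Row k holds samples y[k*hop : k*hop + win].
        idx = np.arange(win)[None, :] + hop * np.arange(n)[:, None]
        framed[win] = y[idx]
    return framed


def augment(y, rng):
    """Return augmented copies of a mono waveform (hypothetical parameter values)."""
    return {
        "noise": add_white_noise(y, noise_factor=0.005, rng=rng),
        "stretch_fast": time_stretch(y, rate=1.25),
        "stretch_slow": time_stretch(y, rate=0.8),
        "pitch_up": pitch_shift(y, n_steps=2),
        "pitch_down": pitch_shift(y, n_steps=-2),
    }


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    y = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # 1 s synthetic 220 Hz tone
    rng = np.random.default_rng(0)
    for name, aug in augment(y, rng).items():
        print(name, aug.shape)
    for win, frames in frames_multi_window(y).items():
        print(win, frames.shape)
```

Pitch shifting here reuses the time stretch: changing the duration by f = 2^(n/12) and then resampling back to the original length scales every frequency by roughly f. A real pipeline would more likely call a dedicated audio library's phase-vocoder routines than these crude versions, which trade audio quality for brevity.
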
Similar Items