Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech

Speaker recognition is an important classification task, which can be solved using several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task and there are solutions that provide excellent performance, the...


Bibliographic Details
Main Authors:	Nikola Simić, Siniša Suzić, Tijana Nosek, Mia Vujović, Zoran Perić, Milan Savić, Vlado Delić
Format:	Article
Language:	English
Published:	MDPI AG 2022-03-01
Series:	Entropy
Subjects:	speaker recognition, convolutional neural network, quantization, emotional speech
Online Access:	https://www.mdpi.com/1099-4300/24/3/414

_version_	1797446649985892352
author	Nikola Simić, Siniša Suzić, Tijana Nosek, Mia Vujović, Zoran Perić, Milan Savić, Vlado Delić
author_facet	Nikola Simić, Siniša Suzić, Tijana Nosek, Mia Vujović, Zoran Perić, Milan Savić, Vlado Delić
author_sort	Nikola Simić
collection	DOAJ
description	Speaker recognition is an important classification task, which can be solved using several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task and there are solutions that provide excellent performance, the classification accuracy of developed models significantly decreases when applying them to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable in order to implement them on edge devices in the Internet of Things systems for real-time detection. The aim of this paper is to propose a simple and constrained convolutional neural network for speaker recognition tasks and to examine its robustness for recognition in emotional speech conditions. We examine three quantization methods for developing a constrained network: floating-point eight format, ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset.
first_indexed	2024-03-09T13:43:35Z
format	Article
id	doaj.art-bfa0ba9560ea4ccb8a310f93b3a49240
institution	Directory Open Access Journal
issn	1099-4300
language	English
last_indexed	2024-03-09T13:43:35Z
publishDate	2022-03-01
publisher	MDPI AG
record_format	Article
series	Entropy
spelling	doaj.art-bfa0ba9560ea4ccb8a310f93b3a49240 | 2023-11-30T21:03:20Z | eng | MDPI AG | Entropy | 1099-4300 | 2022-03-01 | 24 | 3 | 414 | 10.3390/e24030414 | Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech | Nikola Simić [0], Siniša Suzić [1], Tijana Nosek [2], Mia Vujović [3], Zoran Perić [4], Milan Savić [5], Vlado Delić [6] | [0] Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia; [1] Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia; [2] Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia; [3] Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia; [4] Faculty of Electronic Engineering, University of Nis, Aleksandra Medvedeva 14, 18000 Nis, Serbia; [5] Faculty of Sciences and Mathematics, University of Pristina in Kosovska Mitrovica, Ive Lole Ribara 29, 38220 Kosovska Mitrovica, Serbia; [6] Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia | Speaker recognition is an important classification task, which can be solved using several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task and there are solutions that provide excellent performance, the classification accuracy of developed models significantly decreases when applying them to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable in order to implement them on edge devices in the Internet of Things systems for real-time detection. The aim of this paper is to propose a simple and constrained convolutional neural network for speaker recognition tasks and to examine its robustness for recognition in emotional speech conditions. We examine three quantization methods for developing a constrained network: floating-point eight format, ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset. | https://www.mdpi.com/1099-4300/24/3/414 | speaker recognition, convolutional neural network, quantization, emotional speech
title	Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech
topic	speaker recognition, convolutional neural network, quantization, emotional speech
url	https://www.mdpi.com/1099-4300/24/3/414

Similar Items