Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech

Speaker recognition is an important classification task, which can be solved using several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task and there are solutions that provide excellent performance, the...

Bibliographic Details
Main Authors: Nikola Simić, Siniša Suzić, Tijana Nosek, Mia Vujović, Zoran Perić, Milan Savić, Vlado Delić
Format: Article
Language: English
Published: MDPI AG 2022-03-01
Series: Entropy
Subjects: speaker recognition; convolutional neural network; quantization; emotional speech
Online Access: https://www.mdpi.com/1099-4300/24/3/414
author Nikola Simić
Siniša Suzić
Tijana Nosek
Mia Vujović
Zoran Perić
Milan Savić
Vlado Delić
collection DOAJ
description Speaker recognition is an important classification task that can be solved using several approaches. Building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task, and existing solutions provide excellent performance; however, the classification accuracy of such models decreases significantly when they are applied to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable for implementation on edge devices in Internet of Things systems for real-time detection. The aim of this paper is to propose a simple and constrained convolutional neural network for speaker recognition tasks and to examine its robustness in emotional speech conditions. We examine three quantization methods for developing a constrained network: the floating-point eight (FP8) format, ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset.
first_indexed 2024-03-09T13:43:35Z
format Article
id doaj.art-bfa0ba9560ea4ccb8a310f93b3a49240
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-03-09T13:43:35Z
publishDate 2022-03-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-bfa0ba9560ea4ccb8a310f93b3a49240
2023-11-30T21:03:20Z
eng
MDPI AG
Entropy, ISSN 1099-4300
2022-03-01, vol. 24, no. 3, article 414
doi:10.3390/e24030414
Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech
Nikola Simić (Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia)
Siniša Suzić (Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia)
Tijana Nosek (Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia)
Mia Vujović (Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia)
Zoran Perić (Faculty of Electronic Engineering, University of Nis, Aleksandra Medvedeva 14, 18000 Nis, Serbia)
Milan Savić (Faculty of Sciences and Mathematics, University of Pristina in Kosovska Mitrovica, Ive Lole Ribara 29, 38220 Kosovska Mitrovica, Serbia)
Vlado Delić (Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia)
https://www.mdpi.com/1099-4300/24/3/414
Keywords: speaker recognition; convolutional neural network; quantization; emotional speech
title Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech
topic speaker recognition
convolutional neural network
quantization
emotional speech
url https://www.mdpi.com/1099-4300/24/3/414
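
The description above names binary and ternary scalar quantization among the methods examined. As a rough illustration of what these two terms usually mean for network weights, the sketch below maps a list of weights onto two or three levels. It is not taken from the article: the function names, the mean-absolute-value scale, and the 0.7 threshold factor are assumptions borrowed from common practice, and the quantizer design used by the authors may differ. The FP8 format is left out because this record does not state its bit layout.

```python
from statistics import fmean


def binary_quantize(weights):
    """Map every weight to +alpha or -alpha (two levels).

    Assumption: alpha is the mean absolute weight, a common choice
    that is not confirmed by this record.
    """
    if not weights:
        return []
    alpha = fmean([abs(w) for w in weights])
    return [alpha if w >= 0 else -alpha for w in weights]


def ternary_quantize(weights, threshold_factor=0.7):
    """Map every weight to -alpha, 0, or +alpha (three levels).

    Assumption: weights whose magnitude is at most
    threshold_factor * mean(|w|) are set to zero, and alpha is the
    mean magnitude of the remaining weights. The 0.7 factor is a
    widely used heuristic, not a value from the article.
    """
    if not weights:
        return []
    delta = threshold_factor * fmean([abs(w) for w in weights])
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = fmean(kept) if kept else 0.0
    return [
        0.0 if abs(w) <= delta else (alpha if w > 0 else -alpha)
        for w in weights
    ]


if __name__ == "__main__":
    # Hypothetical weights, used only to show the two mappings.
    example = [0.9, -0.05, 0.4, -0.7, 0.02, -0.3]
    print("binary: ", binary_quantize(example))
    print("ternary:", ternary_quantize(example))
```

With two levels, each weight can be stored in a single bit plus one shared scale per layer; three levels carry about 1.58 bits of information per weight and are typically stored in two bits.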