The speech synthesis detection algorithm based on cepstral coefficients and convolutional neural network

The existing approaches to detecting synthesized speech, based on the current issues of synthesizing voice sequences, are considered. The stages of the algorithm for detecting spoofing attacks on voice biometric systems are described, and its final workflow is presented. The research focuses mainly...

Bibliographic Details
Main Authors:	Roman A. Murtazin, Aleksandr Yu. Kuznetsov, Evgeny A. Fedorov, Ilnur M. Garipov, Anna V. Kholodenina, Yulia B. Baldanova, Alisa A. Vorobeva
Format:	Article
Language:	English
Published:	Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) 2021-08-01
Series:	Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Subjects:	biometric; automatic speaker verification in banking; synthetic speech; spoofing detection; cepstral analysis; convolutional neural network
Online Access:	https://ntv.ifmo.ru/file/article/20581.pdf

author	Roman A. Murtazin, Aleksandr Yu. Kuznetsov, Evgeny A. Fedorov, Ilnur M. Garipov, Anna V. Kholodenina, Yulia B. Baldanova, Alisa A. Vorobeva
author_sort	Roman A. Murtazin
collection	DOAJ
description	Existing approaches to detecting synthesized speech are reviewed in light of current issues in synthesizing voice sequences. The stages of an algorithm for detecting spoofing attacks on voice biometric systems are described, and its final workflow is presented. The research focuses mainly on detecting synthesized speech, as it is the most dangerous type of attack. The authors designed a software application for the experimental study, present its structure, and propose an algorithm for detecting synthesized speech. The algorithm uses mel-frequency and constant Q cepstral coefficients to extract speech features. A Gaussian mixture model is used to construct a user model. A convolutional neural network was chosen as the classifier to determine the voice’s authenticity. Two baseline methods for countering spoofing attacks, proposed by the organizers of the ASVspoof2019 competition, were selected for comparison. One of these methods uses linear frequency cepstral coefficients as speech features, while the other uses constant Q cepstral coefficients. Both use Gaussian mixture models for classification. To evaluate the effectiveness of the proposed solution and compare it with the other methods, a voice database was created, and the EER and minDCF metrics were applied. The experimental results demonstrated the advantages of the proposed algorithm over the other algorithms. An advantage of the proposed solution is that the speech features it extracts also perform well for user identification. This makes it possible to use the algorithm to optimize a voice biometric system with embedded protection against spoofing attacks based on speech synthesis. In addition, the proposed method can be used for voice identification with minimal modifications. Voice biometric identification systems have excellent prospects in the banking sector. Such systems allow banks to simplify and accelerate financial transactions and to provide their users with advanced banking functions remotely. The deployment of voice biometric systems is hampered by their vulnerability to spoofing attacks, particularly those conducted by means of speech synthesis. The proposed solution can be integrated into voice biometric systems to improve their security.
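The description names the equal error rate (EER) as one of the evaluation metrics but does not say how it was computed. The following is a minimal sketch of one common way to estimate it, not the authors' implementation. It assumes each trial has a detection score where a higher value means "more likely genuine speech"; the function name `equal_error_rate` and the score lists are hypothetical.

```python
from typing import Sequence, Tuple


def equal_error_rate(bonafide: Sequence[float],
                     spoof: Sequence[float]) -> Tuple[float, float]:
    """Estimate (eer, threshold) from detection scores.

    Assumption: a higher score means the trial looks more like genuine speech,
    and a trial is accepted when its score is >= the threshold.
    """
    if not bonafide or not spoof:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one value above the
    # maximum so that "reject every trial" is also considered.
    candidates = sorted(set(bonafide) | set(spoof))
    candidates.append(candidates[-1] + 1.0)

    best_gap, best_eer, best_thr = float("inf"), 1.0, candidates[0]
    for thr in candidates:
        # False acceptance rate: synthesized trials that pass the threshold.
        far = sum(s >= thr for s in spoof) / len(spoof)
        # False rejection rate: genuine trials that fall below the threshold.
        frr = sum(s < thr for s in bonafide) / len(bonafide)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is read off where the two error rates are closest.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2.0, thr
    return best_eer, best_thr


if __name__ == "__main__":
    genuine_scores = [0.6, 0.4, 0.8, 0.9]   # made-up example scores
    spoofed_scores = [0.5, 0.1, 0.2, 0.3]   # made-up example scores
    print(equal_error_rate(genuine_scores, spoofed_scores))
```

The scan over all thresholds is quadratic in the number of trials, which keeps the sketch short; a sorted cumulative count would be the usual choice for large score files.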
first_indexed	2024-12-21T23:53:27Z
format	Article
id	doaj.art-85e3ccfead6e4d3e818545195a4a8fd8
institution	Directory of Open Access Journals
issn	2226-1494 2500-0373
language	English
last_indexed	2024-12-21T23:53:27Z
publishDate	2021-08-01
publisher	Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
record_format	Article
series	Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
spelling	doaj.art-85e3ccfead6e4d3e818545195a4a8fd8; 2022-12-21T18:45:52Z; eng; Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University); Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki; ISSN 2226-1494, 2500-0373; 2021-08-01; vol. 21, no. 4, pp. 545-552; DOI 10.17586/2226-1494-2021-21-4-545-552
	Roman A. Murtazin (https://orcid.org/0000-0003-3669-7586): Engineer, ITMO University, Saint Petersburg, 197101, Russian Federation
	Aleksandr Yu. Kuznetsov (https://orcid.org/0000-0002-5702-3786): PhD, Associate Professor, ITMO University, Saint Petersburg, 197101, Russian Federation
	Evgeny A. Fedorov (https://orcid.org/0000-0003-2911-5509): Engineer, ITMO University, Saint Petersburg, 197101, Russian Federation; Technical Specialist, Laboratory PPS Ltd, Saint Petersburg, 199178, Russian Federation
	Ilnur M. Garipov (https://orcid.org/0000-0003-3108-5484): Engineer, ITMO University, Saint Petersburg, 197101, Russian Federation
	Anna V. Kholodenina (https://orcid.org/0000-0003-1911-3710): Engineer, ITMO University, Saint Petersburg, 197101, Russian Federation
	Yulia B. Baldanova (https://orcid.org/0000-0002-6751-8993): Engineer, ITMO University, Saint Petersburg, 197101, Russian Federation
	Alisa A. Vorobeva (https://orcid.org/0000-0001-6691-6167): PhD, Associate Professor, ITMO University, Saint Petersburg, 197101, Russian Federation
title	The speech synthesis detection algorithm based on cepstral coefficients and convolutional neural network
topic	biometric; automatic speaker verification in banking; synthetic speech; spoofing detection; cepstral analysis; convolutional neural network
url	https://ntv.ifmo.ru/file/article/20581.pdf

Similar Items