The speech synthesis detection algorithm based on cepstral coefficients and convolutional neural network

Bibliographic Details
Main Authors: Roman A. Murtazin, Aleksandr Yu. Kuznetsov, Evgeny A. Fedorov, Ilnur M. Garipov, Anna V. Kholodenina, Yulia B. Baldanova, Alisa A. Vorobeva
Format: Article
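Language: English
Published: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University), 2021-08-01
Series: Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki (Scientific and Technical Journal of Information Technologies, Mechanics and Optics)
Subjects: biometric; automatic speaker verification in banking; synthetic speech; spoofing detection; cepstral analysis; convolutional neural network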
Online Access:https://ntv.ifmo.ru/file/article/20581.pdf
author Roman A. Murtazin
Aleksandr Yu. Kuznetsov
Evgeny A. Fedorov
Ilnur M. Garipov
Anna V. Kholodenina
Yulia B. Baldanova
Alisa A. Vorobeva
collection DOAJ
description Existing approaches to detecting synthesized speech are reviewed in light of current problems in the synthesis of voice sequences. The stages of an algorithm for detecting spoofing attacks on voice biometric systems are described, and its final workflow is presented. The research focuses mainly on detecting synthesized speech, as this is the most dangerous type of attack. The authors developed a software application for an experimental study, present its structure, and propose an algorithm for detecting synthesized speech. The algorithm uses mel-frequency cepstral coefficients and constant Q cepstral coefficients as speech features. A Gaussian mixture model is used to build a user model, and a convolutional neural network is used as the classifier that decides whether a voice is genuine. Two baseline anti-spoofing methods proposed by the organizers of the ASVspoof2019 competition were selected for comparison: one uses linear frequency cepstral coefficients as speech features, the other uses constant Q cepstral coefficients, and both use Gaussian mixture models for classification. A voice database was created to evaluate the proposed solution and compare it with these methods, using the equal error rate (EER) and minimum detection cost function (minDCF) metrics. The experimental results showed advantages of the proposed algorithm over the baseline algorithms. A further advantage of the proposed solution is that the speech features it extracts are also effective for user identification. This makes it possible to use the algorithm to optimize a voice biometric system with built-in protection against spoofing attacks based on speech synthesis. In addition, the proposed method can be used for voice identification with minimal modifications.
Voice biometric identification systems have great potential in the banking sector. Such systems allow banks to simplify and accelerate financial transactions and to offer their users advanced banking functions remotely. However, the adoption of voice biometric systems is hindered by their vulnerability to spoofing attacks, particularly those that use speech synthesis. The proposed solution can be integrated into voice biometric systems to improve their security.
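The abstract names EER and minDCF as the evaluation metrics but does not state the scoring convention or the cost parameters used. The following is a minimal, stdlib-only Python sketch of how these two metrics are commonly computed from detection scores. It assumes that a higher score means "bona fide" and uses illustrative cost parameters (p_target=0.05, c_miss=c_fa=1) that are not taken from the article; the function names are hypothetical.

```python
from math import inf
from typing import List, Sequence, Tuple


def error_rates(bonafide: Sequence[float], spoof: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, p_miss, p_fa) for every distinct operating point.

    Assumed convention: a trial is accepted as bona fide when score >= threshold.
    p_miss is the share of bona fide trials rejected (false rejection rate);
    p_fa is the share of spoof trials accepted (false acceptance rate).
    """
    # Each distinct score is a candidate threshold; +inf adds the "reject everything" point.
    thresholds = sorted(set(bonafide) | set(spoof)) + [inf]
    points = []
    for t in thresholds:
        p_miss = sum(s < t for s in bonafide) / len(bonafide)
        p_fa = sum(s >= t for s in spoof) / len(spoof)
        points.append((t, p_miss, p_fa))
    return points


def compute_eer(bonafide: Sequence[float], spoof: Sequence[float]) -> float:
    """Approximate equal error rate.

    With a finite number of trials the two error curves rarely cross exactly,
    so this takes the operating point where p_miss and p_fa are closest and
    averages them.
    """
    _, p_miss, p_fa = min(error_rates(bonafide, spoof), key=lambda p: abs(p[1] - p[2]))
    return (p_miss + p_fa) / 2


def compute_min_dcf(
    bonafide: Sequence[float],
    spoof: Sequence[float],
    p_target: float = 0.05,  # illustrative assumption, not from the article
    c_miss: float = 1.0,     # illustrative assumption
    c_fa: float = 1.0,       # illustrative assumption
) -> float:
    """Minimum normalized detection cost over all thresholds."""
    # Normalize by the cost of the best trivial system (always accept or always reject).
    norm = min(c_miss * p_target, c_fa * (1 - p_target))
    return min(
        (c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa) / norm
        for _, p_miss, p_fa in error_rates(bonafide, spoof)
    )


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the article.
    bonafide_scores = [2.1, 1.7, 0.9, 0.4]
    spoof_scores = [-1.2, -0.3, 0.6, -2.0]
    print(f"EER = {compute_eer(bonafide_scores, spoof_scores):.2f}")        # EER = 0.25
    print(f"minDCF = {compute_min_dcf(bonafide_scores, spoof_scores):.2f}")  # minDCF = 0.25
```

The threshold sweep is quadratic in the number of scores and is meant only to make the definitions concrete. For large score lists, sorting once and accumulating the error counts is the usual approach. The ASVspoof2019 evaluation also reports the tandem detection cost function (t-DCF), a different metric that this sketch does not compute.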
first_indexed 2024-12-21T23:53:27Z
id doaj.art-85e3ccfead6e4d3e818545195a4a8fd8
institution Directory of Open Access Journals
issn 2226-1494
2500-0373
last_indexed 2024-12-21T23:53:27Z
doi 10.17586/2226-1494-2021-21-4-545-552
volume 21
issue 4
pages 545-552
affiliation Roman A. Murtazin (https://orcid.org/0000-0003-3669-7586): Engineer, ITMO University, Saint Petersburg, 197101, Russian Federation
Aleksandr Yu. Kuznetsov (https://orcid.org/0000-0002-5702-3786): PhD, Associate Professor, ITMO University, Saint Petersburg, 197101, Russian Federation
Evgeny A. Fedorov (https://orcid.org/0000-0003-2911-5509): Engineer, ITMO University, Saint Petersburg, 197101, Russian Federation; Technical Specialist, Laboratory PPS Ltd, Saint Petersburg, 199178, Russian Federation
Ilnur M. Garipov (https://orcid.org/0000-0003-3108-5484): Engineer, ITMO University, Saint Petersburg, 197101, Russian Federation
Anna V. Kholodenina (https://orcid.org/0000-0003-1911-3710): Engineer, ITMO University, Saint Petersburg, 197101, Russian Federation
Yulia B. Baldanova (https://orcid.org/0000-0002-6751-8993): Engineer, ITMO University, Saint Petersburg, 197101, Russian Federation
Alisa A. Vorobeva (https://orcid.org/0000-0001-6691-6167): PhD, Associate Professor, ITMO University, Saint Petersburg, 197101, Russian Federation
topic biometric
automatic speaker verification in banking
synthetic speech
spoofing detection
cepstral analysis
convolutional neural network