Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification

Voice-controlled devices are in demand due to their hands-free controls. However, using voice-controlled devices in sensitive scenarios like smartphone applications and financial transactions requires protection against fraudulent attacks referred to as “speech spoofing”. The algorithms used in spoo...


Bibliographic Details
Main Authors:	Hiren Mewada, Jawad F. Al-Asad, Faris A. Almalki, Adil H. Khan, Nouf Abdullah Almujally, Samir El-Nakla, Qamar Naith
Format:	Article
Language:	English
Published:	MDPI AG 2023-07-01
Series:	Sensors
Subjects:	anti-spoofing; ASVspoof; convolutional neural network; genuine speech detection; voice conversion
Online Access:	https://www.mdpi.com/1424-8220/23/14/6637

author	Hiren Mewada, Jawad F. Al-Asad, Faris A. Almalki, Adil H. Khan, Nouf Abdullah Almujally, Samir El-Nakla, Qamar Naith
author_sort	Hiren Mewada
collection	DOAJ
description	Voice-controlled devices are in demand due to their hands-free controls. However, using voice-controlled devices in sensitive scenarios like smartphone applications and financial transactions requires protection against fraudulent attacks referred to as “speech spoofing”. The algorithms used in spoof attacks are practically unknown; hence, further analysis and development of spoof-detection models for improving spoof classification are required. A study of the spoofed-speech spectrum suggests that high-frequency features discriminate well between genuine and spoofed speech. Typically, linear or triangular filter banks are used to obtain high-frequency features. However, a Gaussian filter can extract more global information than a triangular filter. In addition, MFCC features are preferable to other speech features because of their lower covariance. Therefore, in this study, the use of a Gaussian filter is proposed for the extraction of inverted MFCC (iMFCC) features, providing high-frequency features. Complementary features are integrated with iMFCC to strengthen the features that aid in the discrimination of spoofed speech. Deep learning has been proven to be efficient in classification applications, but the selection of its hyper-parameters and architecture is crucial and directly affects performance. Therefore, a Bayesian algorithm is used to optimize the BiLSTM network. Thus, in this study, we build a high-frequency-based optimized BiLSTM network to classify the spoofed-speech signal, and we present an extensive investigation using the ASVspoof 2017 dataset. The optimized BiLSTM model was successfully trained in the fewest epochs and achieved a 99.58% validation accuracy. The proposed algorithm achieved a 6.58% EER on the evaluation dataset, a relative improvement of 78% over a baseline spoof-identification system.
first_indexed	2024-03-11T00:39:20Z
format	Article
id	doaj.art-a148459aad2244708927795fea252a9b
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-03-11T00:39:20Z
publishDate	2023-07-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-a148459aad2244708927795fea252a9b; 2023-11-18T21:20:31Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2023-07-01; vol. 23, no. 14, art. 6637; DOI 10.3390/s23146637; Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification; Hiren Mewada (Electrical Engineering Department, Prince Mohammad bin Fahd University, P.O. Box 1664, Al Khobar 31952, Saudi Arabia); Jawad F. Al-Asad (Electrical Engineering Department, Prince Mohammad bin Fahd University, P.O. Box 1664, Al Khobar 31952, Saudi Arabia); Faris A. Almalki (Department of Computer Engineering, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia); Adil H. Khan (Electrical Engineering Department, Prince Mohammad bin Fahd University, P.O. Box 1664, Al Khobar 31952, Saudi Arabia); Nouf Abdullah Almujally (Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia); Samir El-Nakla (Electrical Engineering Department, Prince Mohammad bin Fahd University, P.O. Box 1664, Al Khobar 31952, Saudi Arabia); Qamar Naith (Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, P.O. Box 34, Jeddah 21959, Saudi Arabia); https://www.mdpi.com/1424-8220/23/14/6637; anti-spoofing; ASVspoof; convolutional neural network; genuine speech detection; voice conversion
title	Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification
topic	anti-spoofing; ASVspoof; convolutional neural network; genuine speech detection; voice conversion
url	https://www.mdpi.com/1424-8220/23/14/6637
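
The description above proposes Gaussian-shaped filters for extracting inverted MFCC (iMFCC) features, which emphasize high frequencies. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. The function names, the filter count, the FFT size, the 16 kHz sample rate, and the `alpha` rule that sets each Gaussian's width from the spacing of its neighbours are all assumptions made for illustration; the record does not state the paper's parameters.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595/700 formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Convert mel values back to Hz."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def gaussian_inverted_mel_filterbank(n_filters=20, n_fft=512,
                                     sample_rate=16000, alpha=2.0):
    """Hypothetical helper: Gaussian filters on an inverted mel scale.

    Returns (bank, centres), where bank has shape
    (n_filters, n_fft // 2 + 1) and centres are in Hz.
    """
    nyquist = sample_rate / 2.0
    # n_filters + 2 points equally spaced in mel: two edges plus the centres.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(nyquist), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    # Mirror around Nyquist so the dense, narrow filters sit at high frequency.
    inv_points = np.sort(nyquist - hz_points)
    centres = inv_points[1:-1]
    # Assumed width rule: sigma is the half-span between neighbours over alpha.
    sigmas = (inv_points[2:] - inv_points[:-2]) / (2.0 * alpha)
    freqs = np.linspace(0.0, nyquist, n_fft // 2 + 1)
    z = (freqs[None, :] - centres[:, None]) / sigmas[:, None]
    bank = np.exp(-0.5 * z ** 2)
    return bank, centres


def imfcc_from_power_spectrum(power_spec, bank, n_coeffs=13):
    """Log filter-bank energies followed by an unnormalized DCT-II."""
    energies = power_spec @ bank.T            # (frames, n_filters)
    log_energies = np.log(energies + 1e-10)   # small floor avoids log(0)
    n = bank.shape[0]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    dct = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return log_energies @ dct.T               # (frames, n_coeffs)


if __name__ == "__main__":
    bank, centres = gaussian_inverted_mel_filterbank()
    rng = np.random.default_rng(0)
    frames = rng.random((5, bank.shape[1]))   # stand-in power spectra
    print(imfcc_from_power_spectrum(frames, bank).shape)
```

Unlike a triangular filter, each Gaussian has non-zero weight at every frequency bin, which is one way to read the description's remark that a Gaussian filter captures more global information. The resulting per-frame coefficient vectors are the kind of sequence a BiLSTM classifier would consume.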

Similar Items