Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model
While speaker verification represents a critically important application of speaker recognition, it is also the most challenging and least well-understood application. Robust feature extraction plays an integral role in enhancing the efficiency of forensic speaker verification. Although the speech s...
Main Authors: | Gaurav, Saurabh Bhardwaj, Ravinder Agarwal |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-05-01 |
Series: | Electronics |
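Subjects: | automated speaker recognition; deep learning; ant lion optimizer; feature extraction; spectrograms; speech signals |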
Online Access: | https://www.mdpi.com/2079-9292/12/10/2342 |
_version_ | 1797600359875608576 |
---|---|
author | Gaurav; Saurabh Bhardwaj; Ravinder Agarwal |
author_facet | Gaurav; Saurabh Bhardwaj; Ravinder Agarwal |
author_sort | Gaurav |
collection | DOAJ |
description | While speaker verification represents a critically important application of speaker recognition, it is also the most challenging and least well-understood application. Robust feature extraction plays an integral role in enhancing the efficiency of forensic speaker verification. Although the speech signal is a continuous one-dimensional time series, most recent models depend on recurrent neural network (RNN) or convolutional neural network (CNN) models, which are not able to exhaustively represent human speech, thus opening themselves up to speech forgery. As a result, to accurately simulate human speech and to further ensure speaker authenticity, we must establish a reliable technique. This research article presents a Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification (TTFEM-AFSV) model, which aims to overcome the limitations of the previous models. The TTFEM-AFSV model focuses on verifying speakers in forensic applications by exploiting the average median filtering (AMF) technique to discard the noise in speech signals. Subsequently, the MFCC and spectrograms are considered as the inputs to the deep convolutional neural network-based Inception v3 model, and the Ant Lion Optimizer (ALO) algorithm is utilized to fine-tune the hyperparameters related to the Inception v3 model. Finally, a long short-term memory with a recurrent neural network (LSTM-RNN) mechanism is employed as a classifier for automated speaker recognition. The performance validation of the TTFEM-AFSV model was tested in a series of experiments. Comparative study revealed the significantly improved performance of the TTFEM-AFSV model over recent approaches. |
first_indexed | 2024-03-11T03:47:01Z |
format | Article |
id | doaj.art-b8511047019f4d36ba4fbfc850be9fa8 |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-11T03:47:01Z |
publishDate | 2023-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
title | Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model |
topic | automated speaker recognition; deep learning; ant lion optimizer; feature extraction; spectrograms; speech signals |
url | https://www.mdpi.com/2079-9292/12/10/2342 |
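
The description above outlines a front end that denoises the speech signal and then derives spectrogram inputs for the network. The following is a minimal sketch of those two steps only, not the authors' implementation. It rests on several assumptions: the record does not define the average median filtering (AMF) technique, so a plain sliding-window median filter stands in for it; the function names `median_filter_1d` and `log_spectrogram`, the frame length, the hop size, and the synthetic test tone are all hypothetical; and MFCC extraction, Inception v3, ALO tuning, and the LSTM-RNN classifier are left out.

```python
import numpy as np


def median_filter_1d(signal, kernel_size=5):
    """Sliding-window median (a stand-in for AMF, which the record does not define).

    Edges are handled by reflecting the signal, so the output has the same length.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    padded = np.pad(signal, half, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel_size)
    return np.median(windows, axis=-1)


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude short-time spectrum with a Hann window.

    Returns an array of shape (num_frames, frame_len // 2 + 1).
    """
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]
    windowed = frames * np.hanning(frame_len)
    magnitude = np.abs(np.fft.rfft(windowed, axis=-1))
    return np.log(magnitude + 1e-10)  # small offset avoids log(0)


# Synthetic stand-in for a recording: a 200 Hz tone with isolated impulsive spikes.
rate = 8000
t = np.arange(rate) / rate
clean = np.sin(2 * np.pi * 200 * t)
noisy = clean.copy()
noisy[::400] += 5.0

denoised = median_filter_1d(noisy, kernel_size=5)
spec = log_spectrogram(denoised)
```

A median filter is a natural choice for impulsive noise because an isolated spike never becomes the middle value of its window. In a pipeline like the one described, an array such as `spec` would be resized and passed to the convolutional feature extractor.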