Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model

While speaker verification represents a critically important application of speaker recognition, it is also the most challenging and least well-understood application. Robust feature extraction plays an integral role in enhancing the efficiency of forensic speaker verification. Although the speech s...

Bibliographic Details
Main Authors: Gaurav, Saurabh Bhardwaj, Ravinder Agarwal
Format: Article
Language: English
Published: MDPI AG 2023-05-01
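Series: Electronics
Subjects: automated speaker recognition; deep learning; ant lion optimizer; feature extraction; spectrograms; speech signals
Online Access: https://www.mdpi.com/2079-9292/12/10/2342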
author Gaurav
Saurabh Bhardwaj
Ravinder Agarwal
collection DOAJ
description While speaker verification represents a critically important application of speaker recognition, it is also the most challenging and least well-understood application. Robust feature extraction plays an integral role in enhancing the efficiency of forensic speaker verification. Although the speech signal is a continuous one-dimensional time series, most recent models rely on recurrent neural networks (RNNs) or convolutional neural networks (CNNs), which cannot exhaustively represent human speech and are therefore vulnerable to speech forgery. A reliable technique is therefore needed to simulate human speech accurately and to ensure speaker authenticity. This research article presents a Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification (TTFEM-AFSV) model, which aims to overcome the limitations of previous models. The TTFEM-AFSV model verifies speakers in forensic applications, using the average median filtering (AMF) technique to remove noise from speech signals. Subsequently, Mel-frequency cepstral coefficients (MFCCs) and spectrograms serve as inputs to the deep convolutional neural network-based Inception v3 model, and the Ant Lion Optimizer (ALO) algorithm is used to fine-tune the hyperparameters of the Inception v3 model. Finally, a long short-term memory with a recurrent neural network (LSTM-RNN) mechanism is employed as a classifier for automated speaker recognition. The performance of the TTFEM-AFSV model was validated in a series of experiments. A comparative study revealed significantly improved performance of the TTFEM-AFSV model over recent approaches.
first_indexed 2024-03-11T03:47:01Z
format Article
id doaj.art-b8511047019f4d36ba4fbfc850be9fa8
institution Directory of Open Access Journals
issn 2079-9292
language English
last_indexed 2024-03-11T03:47:01Z
publishDate 2023-05-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-b8511047019f4d36ba4fbfc850be9fa8; 2023-11-18T01:11:03Z; eng; MDPI AG; Electronics; ISSN 2079-9292; 2023-05-01; vol. 12, no. 10, article 2342; DOI 10.3390/electronics12102342
Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model
Gaurav; Saurabh Bhardwaj; Ravinder Agarwal (all: Electrical and Instrumentation Engineering Department, Thapar Institute of Engineering and Technology, Patiala 147004, India)
title Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model
topic automated speaker recognition
deep learning
ant lion optimizer
feature extraction
spectrograms
speech signals
url https://www.mdpi.com/2079-9292/12/10/2342
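
The description field above outlines a pipeline that denoises the speech signal with a median-type filter and then derives spectrogram and MFCC inputs for a CNN. As a rough illustration of that front end only, the following Python sketch applies a plain sliding-window median filter and computes a log-magnitude spectrogram with NumPy. It is not the authors' implementation: the record does not define the AMF technique, so an ordinary median filter stands in for it, and the function names, kernel size, frame length, hop size, and synthetic test signal are all assumptions chosen for illustration. MFCC extraction, the Inception v3 model, ALO tuning, and the LSTM-RNN classifier are omitted.

```python
import numpy as np


def median_filter(signal, kernel_size=5):
    """Plain sliding-window median filter (a stand-in, not the paper's AMF)."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    # Repeat the edge samples so the output has the same length as the input.
    padded = np.pad(signal, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel_size)
    return np.median(windows, axis=1)


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Log-magnitude spectrogram of Hann-windowed frames, shape (frames, bins)."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]
    windowed = frames * np.hanning(frame_len)
    magnitude = np.abs(np.fft.rfft(windowed, axis=1))
    return np.log(magnitude + eps)


# Hypothetical test signal: one second of a 220 Hz tone at 16 kHz,
# corrupted by 200 impulsive spikes.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 220 * t)
noisy = clean.copy()
spikes = rng.choice(sample_rate, size=200, replace=False)
noisy[spikes] += rng.choice([-2.0, 2.0], size=200)

denoised = median_filter(noisy, kernel_size=5)
spec = log_spectrogram(denoised)
```

With these assumed settings (25 ms frames and a 10 ms hop at 16 kHz), `spec` is a two-dimensional array that could be rendered as an image for a CNN input; any real system would need the filter and framing parameters reported in the article itself.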