DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection

Abstract Spoofed speech is becoming a serious threat to society as artificial intelligence techniques advance. An automated spoofing detector that can be integrated into automatic speaker verification (ASV) systems is therefore needed. In this study, we propose a novel and robust m...

Bibliographic Details
Main Authors: Rabbia Mahum, Aun Irtaza, Ali Javed, Haitham A. Mahmoud, Haseeb Hassan
Format: Article
Language: English
Published: SpringerOpen 2024-04-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Subjects: Deep learning; Spoofing detector; Fake speech detection
Online Access: https://doi.org/10.1186/s13636-024-00335-9
_version_ 1827284168668610560
author Rabbia Mahum
Aun Irtaza
Ali Javed
Haitham A. Mahmoud
Haseeb Hassan
author_facet Rabbia Mahum
Aun Irtaza
Ali Javed
Haitham A. Mahmoud
Haseeb Hassan
author_sort Rabbia Mahum
collection DOAJ
description Abstract Spoofed speech is becoming a serious threat to society as artificial intelligence techniques advance. An automated spoofing detector that can be integrated into automatic speaker verification (ASV) systems is therefore needed. In this study, we propose a novel and robust model, named DeepDet, based on a deep-layered architecture, to classify speech into two classes: spoofed and bonafide. DeepDet is an improved model based on Yet Another Mobile Network (YAMNet), employing a customized MobileNet combined with a bottleneck attention module (BAM). First, we convert audio into mel-spectrograms, which are time–frequency representations on the mel scale. Second, we train our deep-layered model on mel-spectrograms extracted from the Logical Access (LA) set of the ASVspoof 2019 dataset, which includes synthesized speech and voice conversion. Finally, we classify the audio with the trained binary classifier. More precisely, we exploit the layered architecture and guided attention to discern spoofed speech from bonafide samples. The proposed model employs depthwise separable convolutions, which make it lighter than existing techniques. Furthermore, we conducted extensive experiments to assess the performance of the proposed model using the ASVspoof 2019 corpus. We attained an equal error rate (EER) of 0.042% on Logical Access (LA) attacks and 0.43% on Physical Access (PA) attacks. These results on the ASVspoof 2019 dataset indicate the effectiveness of DeepDet over existing spoofing detectors. Additionally, the proposed model is robust enough to identify unseen spoofed audio and classifies the various attacks accurately.
first_indexed 2024-04-24T09:51:19Z
format Article
id doaj.art-a765588829b2424a80e3928d25e344e1
institution Directory Open Access Journal
issn 1687-4722
language English
last_indexed 2024-04-24T09:51:19Z
publishDate 2024-04-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Audio, Speech, and Music Processing
spelling doaj.art-a765588829b2424a80e3928d25e344e1
2024-04-14T11:23:52Z
eng
SpringerOpen
EURASIP Journal on Audio, Speech, and Music Processing
1687-4722
2024-04-01
20241116
10.1186/s13636-024-00335-9
DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
Rabbia Mahum 0
Aun Irtaza 1
Ali Javed 2
Haitham A. Mahmoud 3
Haseeb Hassan 4
Computer Science Department, UET Taxila
Computer Science Department, UET Taxila
Software Engineering Department, UET Taxila
Industrial Engineering Department, College of Engineering, King Saud University
College of Big Data and Internet, Shenzhen Technology University (SZTU)
https://doi.org/10.1186/s13636-024-00335-9
Deep learning
Spoofing detector
Fake speech detection
spellingShingle Rabbia Mahum
Aun Irtaza
Ali Javed
Haitham A. Mahmoud
Haseeb Hassan
DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
EURASIP Journal on Audio, Speech, and Music Processing
Deep learning
Spoofing detector
Fake speech detection
title DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
title_full DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
title_fullStr DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
title_full_unstemmed DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
title_short DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection
title_sort deepdet yamnet with bottleneck attention module bam for tts synthesis detection
topic Deep learning
Spoofing detector
Fake speech detection
url https://doi.org/10.1186/s13636-024-00335-9
work_keys_str_mv AT rabbiamahum deepdetyamnetwithbottleneckattentionmodulebamforttssynthesisdetection
AT aunirtaza deepdetyamnetwithbottleneckattentionmodulebamforttssynthesisdetection
AT alijaved deepdetyamnetwithbottleneckattentionmodulebamforttssynthesisdetection
AT haithamamahmoud deepdetyamnetwithbottleneckattentionmodulebamforttssynthesisdetection
AT haseebhassan deepdetyamnetwithbottleneckattentionmodulebamforttssynthesisdetection
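
Note on the reported metric: the abstract gives results as an equal error rate (EER), the operating point at which the false acceptance rate (spoofed speech accepted as bonafide) equals the false rejection rate (bonafide speech rejected). The following is a minimal sketch of how an EER can be estimated from detector scores, assuming that a higher score means "more likely bonafide". The function name `equal_error_rate` and the toy scores are hypothetical illustrations; this is not the authors' evaluation code, and official ASVspoof scoring tools may interpolate between thresholds differently.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate (eer, threshold) from two score lists.

    Assumption: a higher score means the detector considers the
    trial more likely to be bonafide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bonafide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]


# Toy example with made-up scores (not data from the paper).
eer, threshold = equal_error_rate(
    bonafide_scores=[0.9, 0.8, 0.7, 0.3],
    spoof_scores=[0.1, 0.2, 0.4, 0.6],
)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

The function returns the EER as a fraction, so the 0.042% reported in the abstract would correspond to a value of 0.00042.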