DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection



Bibliographic Details
Main Authors:	Rabbia Mahum, Aun Irtaza, Ali Javed, Haitham A. Mahmoud, Haseeb Hassan
Format:	Article
Language:	English
Published:	SpringerOpen 2024-04-01
Series:	EURASIP Journal on Audio, Speech, and Music Processing
Subjects:	Deep learning; Spoofing detector; Fake speech detection
Online Access:	https://doi.org/10.1186/s13636-024-00335-9

collection	DOAJ
description	Abstract: Spoofed speech is becoming a serious threat to society due to advances in artificial intelligence techniques. An automated spoofing detector that can be integrated into automatic speaker verification (ASV) systems is therefore needed. In this study, we propose a novel and robust model, named DeepDet, based on a deep-layered architecture, to categorize speech into two classes: spoofed and bonafide. DeepDet is an improved model based on Yet Another Mobile Network (YAMNet), employing a customized MobileNet combined with a bottleneck attention module (BAM). First, we convert audio into mel-spectrograms, which are time–frequency representations on the mel scale. Second, we train our deep-layered model on mel-spectrograms extracted from the Logical Access (LA) set of the ASVspoof 2019 dataset, which includes synthesized speech and voice conversions. Finally, we classify the audio using the trained binary classifier. More precisely, we use the layered architecture and guided attention to discern spoofed speech from bonafide samples. The proposed model employs depthwise separable convolutions, which make it lighter than existing techniques. Furthermore, we ran extensive experiments to assess the performance of the proposed model on the ASVspoof 2019 corpus. We attained an equal error rate (EER) of 0.042% on Logical Access (LA) attacks and 0.43% on Physical Access (PA) attacks. This performance on the ASVspoof 2019 dataset indicates the effectiveness of DeepDet over existing spoofing detectors. Additionally, the proposed model is robust enough to identify unseen spoofed audio and to classify the various attacks accurately.
first_indexed	2024-04-24T09:51:19Z
id	doaj.art-a765588829b2424a80e3928d25e344e1
institution	Directory of Open Access Journals
issn	1687-4722
last_indexed	2024-04-24T09:51:19Z
spelling	Rabbia Mahum (Computer Science Department, UET Taxila); Aun Irtaza (Computer Science Department, UET Taxila); Ali Javed (Software Engineering Department, UET Taxila); Haitham A. Mahmoud (Industrial Engineering Department, College of Engineering, King Saud University); Haseeb Hassan (College of Big Data and Internet, Shenzhen Technology University (SZTU))
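
The abstract reports performance as an equal error rate (EER), the operating point where the false acceptance rate (spoofed audio accepted as bonafide) equals the false rejection rate (bonafide audio rejected). The following is a minimal sketch of how an EER can be estimated from detector scores. It is not the authors' evaluation code; the function name `equal_error_rate`, the score convention (higher means more likely bonafide), and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate the EER and its threshold from detector scores.

    Assumptions (not from the record): higher score = more likely bonafide,
    labels use 1 for bonafide and 0 for spoofed, and both classes are present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bonafide = scores[labels == 1]
    spoofed = scores[labels == 0]

    # Every distinct score is a candidate threshold; accept when score >= t.
    thresholds = np.unique(scores)
    # False acceptance rate: fraction of spoofed samples that are accepted.
    far = np.array([(spoofed >= t).mean() for t in thresholds])
    # False rejection rate: fraction of bonafide samples that are rejected.
    frr = np.array([(bonafide < t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]


# Toy scores (hypothetical): four bonafide and four spoofed utterances.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

On a finite score set the two error curves rarely cross exactly, so this sketch averages them at the closest point; evaluation toolkits often interpolate instead, which can give slightly different values.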

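
The abstract attributes the model's light weight to the depth-wise separable convolutions used in MobileNet-style networks. The arithmetic below sketches why such a layer needs fewer weights than a standard convolution. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) are hypothetical and are not taken from the DeepDet architecture; bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 64, 128
standard = standard_conv_params(k, c_in, c_out)         # 73728
separable = depthwise_separable_params(k, c_in, c_out)  # 8768
print(f"standard={standard}, separable={separable}, ratio={separable / standard:.3f}")
```

The ratio equals 1/c_out + 1/kernel^2, so for 3x3 kernels and wide layers the separable form uses roughly one ninth of the weights.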

Similar Items