Imperceptible black-box waveform-level adversarial attack towards automatic speaker recognition

Abstract: Automatic speaker recognition is an important biometric authentication approach with emerging applications. However, recent research has shown its vulnerability to adversarial attacks. In this paper, we propose a new type of adversarial example by generating imperceptible adversarial sampl...

Bibliographic Details
Main Authors: Xingyu Zhang, Xiongwei Zhang, Meng Sun, Xia Zou, Kejiang Chen, Nenghai Yu
Format: Article
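Language: English
Published: Springer 2022-06-01
Series: Complex & Intelligent Systems
Subjects: Automatic speaker recognition; Adversarial examples; Imperceptibility; Black-box attack; Differential evolution; Auditory masking
Online Access: https://doi.org/10.1007/s40747-022-00782-x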
collection DOAJ
description Automatic speaker recognition is an important biometric authentication approach with emerging applications. However, recent research has shown its vulnerability to adversarial attacks. In this paper, we propose a new type of adversarial example by generating imperceptible adversarial samples for targeted attacks on black-box automatic speaker recognition systems. Waveform samples are created directly by solving an optimization problem with waveform inputs and outputs, which is more realistic in real-life scenarios. Inspired by auditory masking, a regularization term that adapts to the energy of the speech waveform is proposed for generating imperceptible adversarial perturbations. The optimization problems are then solved by a differential evolution algorithm in a black-box manner, which does not require any knowledge of the inner configuration of the recognition systems. Experiments conducted on two commonly used data sets, LibriSpeech and VoxCeleb, show that the proposed methods successfully perform targeted attacks on state-of-the-art speaker recognition systems while being imperceptible to human listeners. Given the high SNR and PESQ scores of the resulting adversarial samples, the proposed methods degrade the quality of the original signals less than several recently proposed methods, which supports the imperceptibility of the adversarial samples.
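The abstract names three ingredients without giving formulas: an energy-adaptive regularizer, a differential evolution solver that only queries the system, and SNR as a quality measure. The sketch below illustrates how such pieces could fit together on a toy problem. It is not the authors' implementation: the weighting `1 / frame energy`, the trade-off `lam`, the bound, the DE settings, and the linear `query_score` stand-in for a recognizer are all assumptions made for illustration.

```python
import numpy as np

def snr_db(clean, adversarial):
    """Signal-to-noise ratio in dB, treating (adversarial - clean) as noise."""
    noise = adversarial - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def frame_energy_weights(x, frame_len):
    """Per-sample penalty weights: large in quiet frames, small in loud ones.

    Assumed form (1 / mean frame energy); the paper's exact term is not
    given in the abstract. len(x) is truncated to a multiple of frame_len.
    """
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1) + 1e-8  # avoid division by zero
    return np.repeat(1.0 / energy, frame_len)

def differential_evolution(objective, dim, bound, pop_size=20,
                           n_iter=200, f=0.5, cr=0.9, seed=0):
    """Basic DE/rand/1/bin minimizer that only calls objective(vector)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-bound, bound, size=(pop_size, dim))
    fitness = np.array([objective(p) for p in pop])
    for _ in range(n_iter):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + f * (b - c), -bound, bound)
            cross = rng.random(dim) < cr
            cross[rng.integers(dim)] = True  # take at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            trial_fit = objective(trial)
            if trial_fit <= fitness[i]:  # greedy selection
                pop[i], fitness[i] = trial, trial_fit
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])

# Toy demo: a 32-sample "waveform" with a loud half and a quiet half.
t = np.arange(32)
clean = np.sin(0.3 * t) * np.where(t < 16, 1.0, 0.1)

# Hypothetical black box: returns only a score, internals are never inspected.
secret_w = np.random.default_rng(1).normal(size=32)
def query_score(wave):
    return float(secret_w @ wave)

weights = frame_energy_weights(clean, frame_len=16)
lam = 0.01  # assumed trade-off between attack score and audibility penalty

def objective(delta):
    # Raise the target score while penalizing perturbation in quiet frames.
    return -query_score(clean + delta) + lam * np.sum(weights * delta ** 2)

delta, value = differential_evolution(objective, dim=32, bound=0.05, n_iter=100)
print("objective:", value, "SNR (dB):", snr_db(clean, clean + delta))
```

The optimizer never touches `secret_w` directly; it sees only the scalar returned by `query_score`, which is what makes the search black-box. A real evaluation would replace the toy score with queries to a speaker recognition system and would also report PESQ, which this sketch does not compute.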
id doaj.art-4a2f3536ab004517885c1dbd82c81aa7
institution Directory Open Access Journal
issn 2199-4536
2198-6053
record_updated 2023-03-22T12:43:52Z
volume 9
issue 1
pages 65-79
affiliation Xingyu Zhang, Xiongwei Zhang, Meng Sun, Xia Zou: Laboratory of Intelligent Information Processing, Army Engineering University
Kejiang Chen, Nenghai Yu: Department of Electronic Engineering and Information Science, University of Science and Technology of China
topic Automatic speaker recognition
Adversarial examples
Imperceptibility
Black-box attack
Differential evolution
Auditory masking