Imperceptible black-box waveform-level adversarial attack towards automatic speaker recognition
Abstract: Automatic speaker recognition is an important biometric authentication approach with emerging applications. However, recent research has shown its vulnerability to adversarial attacks. In this paper, we propose a new type of adversarial example by generating imperceptible adversarial sampl...
Main Authors: | Xingyu Zhang, Xiongwei Zhang, Meng Sun, Xia Zou, Kejiang Chen, Nenghai Yu |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2022-06-01 |
Series: | Complex & Intelligent Systems |
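Subjects: | Automatic speaker recognition, Adversarial examples, Imperceptibility, Black-box attack, Differential evolution, Auditory masking |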
Online Access: | https://doi.org/10.1007/s40747-022-00782-x |
_version_ | 1827982254905753600 |
---|---|
author | Xingyu Zhang, Xiongwei Zhang, Meng Sun, Xia Zou, Kejiang Chen, Nenghai Yu |
author_sort | Xingyu Zhang |
collection | DOAJ |
description | Automatic speaker recognition is an important biometric authentication approach with emerging applications. However, recent research has shown that it is vulnerable to adversarial attacks. In this paper, we propose a new type of adversarial example: imperceptible adversarial samples for targeted attacks on black-box automatic speaker recognition systems. Waveform samples are created directly by solving an optimization problem with waveform inputs and outputs, which is more realistic in real-life scenarios. Inspired by auditory masking, a regularization term that adapts to the energy of the speech waveform is proposed for generating imperceptible adversarial perturbations. The optimization problems are then solved by a differential evolution algorithm in a black-box manner, which does not require any knowledge of the inner configuration of the recognition systems. Experiments conducted on the commonly used data sets LibriSpeech and VoxCeleb show that the proposed methods successfully perform targeted attacks on state-of-the-art speaker recognition systems while being imperceptible to human listeners. Given the high SNR and PESQ scores of the resulting adversarial samples, the proposed methods degrade the quality of the original signals less than several recently proposed methods, which supports the imperceptibility of the adversarial samples. |
first_indexed | 2024-04-09T22:31:37Z |
format | Article |
id | doaj.art-4a2f3536ab004517885c1dbd82c81aa7 |
institution | Directory of Open Access Journals |
issn | 2199-4536, 2198-6053 |
language | English |
last_indexed | 2024-04-09T22:31:37Z |
publishDate | 2022-06-01 |
publisher | Springer |
record_format | Article |
series | Complex & Intelligent Systems |
spelling | doaj.art-4a2f3536ab004517885c1dbd82c81aa7; 2023-03-22T12:43:52Z; eng; Springer; Complex & Intelligent Systems; 2199-4536, 2198-6053; 2022-06-01; vol. 9, no. 1, pp. 65-79; 10.1007/s40747-022-00782-x; Imperceptible black-box waveform-level adversarial attack towards automatic speaker recognition; Xingyu Zhang, Xiongwei Zhang, Meng Sun, Xia Zou (Laboratory of Intelligent Information Processing, Army Engineering University); Kejiang Chen, Nenghai Yu (Department of Electronic Engineering and Information Science, University of Science and Technology of China); https://doi.org/10.1007/s40747-022-00782-x; Automatic speaker recognition, Adversarial examples, Imperceptibility, Black-box attack, Differential evolution, Auditory masking |
title | Imperceptible black-box waveform-level adversarial attack towards automatic speaker recognition |
title_sort | imperceptible black box waveform level adversarial attack towards automatic speaker recognition |
topic | Automatic speaker recognition, Adversarial examples, Imperceptibility, Black-box attack, Differential evolution, Auditory masking |
url | https://doi.org/10.1007/s40747-022-00782-x |
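
The abstract in this record describes a query-only attack: a waveform perturbation is found by differential evolution, with a penalty that adapts to the energy of the host speech. The sketch below illustrates that general idea on a toy signal and is not the authors' implementation. Everything in it is an assumption made for illustration: `toy_score` (a cosine similarity to a fixed template) stands in for a real recognizer's target-speaker score, the inverse-frame-energy weighting in `frame_energy_weights` is only one plausible form of an energy-adaptive regularizer, and the names `make_loss`, `lam`, `f_scale`, and `cross_rate` are hypothetical.

```python
import math
import random


def frame_energy_weights(x, frame_len):
    """Per-sample penalty weights: high in quiet frames, low in loud frames (assumed form)."""
    weights = []
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        weights.extend([1.0 / (energy + 1e-3)] * len(frame))
    return weights


def snr_db(x, delta):
    """Signal-to-noise ratio of the perturbation delta relative to x, in dB."""
    noise = sum(d * d for d in delta)
    if noise == 0.0:
        return float("inf")
    return 10.0 * math.log10(sum(s * s for s in x) / noise)


def toy_score(wave, target):
    """Hypothetical black-box score: cosine similarity to a target template."""
    dot = sum(a * b for a, b in zip(wave, target))
    norm = math.sqrt(sum(a * a for a in wave)) * math.sqrt(sum(b * b for b in target))
    return dot / (norm + 1e-12)


def make_loss(x, target, weights, lam):
    """Loss = -score(x + delta) + lam * energy-weighted perturbation power."""
    def loss(delta):
        adv = [s + d for s, d in zip(x, delta)]
        penalty = sum(w * d * d for w, d in zip(weights, delta)) / len(delta)
        return -toy_score(adv, target) + lam * penalty
    return loss


def differential_evolution(loss, dim, bound, pop_size=20, f_scale=0.5,
                           cross_rate=0.9, generations=60, seed=0):
    """Classic DE/rand/1/bin; it only calls loss() and never inspects the model."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(pop_size)]
    pop[0] = [0.0] * dim  # include the unperturbed signal as a candidate
    fitness = [loss(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for j in range(dim):
                if j == forced or rng.random() < cross_rate:
                    v = pop[a][j] + f_scale * (pop[b][j] - pop[c][j])
                    trial.append(max(-bound, min(bound, v)))  # keep within amplitude bound
                else:
                    trial.append(pop[i][j])
            trial_fit = loss(trial)
            if trial_fit <= fitness[i]:  # greedy selection: keep the better candidate
                pop[i], fitness[i] = trial, trial_fit
    best = min(range(pop_size), key=lambda k: fitness[k])
    return pop[best], fitness[best]


if __name__ == "__main__":
    n = 32
    # Toy "speech": a sine with a quiet first half and a loud second half.
    x = [(0.1 if i < n // 2 else 1.0) * math.sin(0.4 * i) for i in range(n)]
    target = [math.sin(0.9 * i) for i in range(n)]
    loss = make_loss(x, target, frame_energy_weights(x, 8), lam=0.05)
    delta, best = differential_evolution(loss, n, bound=0.2)
    print("score before:", toy_score(x, target))
    print("score after: ", toy_score([s + d for s, d in zip(x, delta)], target))
    print("SNR (dB):    ", snr_db(x, delta))
```

Because the unperturbed signal is placed in the initial population and selection is greedy, the returned loss can never be worse than that of the original waveform. A real attack would replace `toy_score` with queries to the recognition system and would evaluate PESQ with a dedicated implementation, which this sketch does not attempt.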