RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing

Traditional anti-spoofing systems cannot be used straightforwardly with streaming audio because they are designed for finite utterances. Such offline models can be applied in streaming with the help of buffering; however, they are not effective in terms of memory and computational consumption.

Bibliographic Details
Main Authors:	Petr Grinberg, Vladislav Shikhov
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Anti-spoofing; ASVspoof challenge; automatic speaker verification system; countermeasure; SASV challenge; spoofing-aware speaker verification system
Online Access:	https://ieeexplore.ieee.org/document/10271307/

_version_	1797660797720068096
author	Petr Grinberg, Vladislav Shikhov
author_facet	Petr Grinberg, Vladislav Shikhov
author_sort	Petr Grinberg
collection	DOAJ
description	Traditional anti-spoofing systems cannot be used straightforwardly with streaming audio because they are designed for finite utterances. Such offline models can be applied in streaming with the help of buffering; however, they are not effective in terms of memory and computational consumption. We propose a novel approach called RawSpectrogram that makes offline models streaming-friendly without a significant drop in quality. The method was tested on RawNet2 and AASIST, resulting in new architectures called RawRNN (RawLSTM and RawGRU), RS-AASIST, and TAASIST. The RawRNN-type models are much smaller and achieve a better Equal Error Rate than their base architecture, RawNet2. RS-AASIST and TAASIST have fewer parameters than AASIST and achieve similar quality. We also proved our concept for models with time-frequency transform front-ends and automatic speaker verification systems by proposing RECAPA-TDNN based on ECAPA-TDNN. RS-AASIST and RECAPA-TDNN were combined into the first streaming-friendly spoofing-aware speaker verification system reported in the literature. This joint system achieves significantly better quality than the corresponding offline solutions. All our models require far fewer floating-point operations for score updates. RawSpectrogram usage significantly reduces the latency of the prediction and allows the system to update the probability with each new chunk from the stream, preserving all information from the past. To the best of our knowledge, TAASIST is the most successful voice anti-spoofing system that employs a vanilla Transformer trained using supervised learning.
first_indexed	2024-03-11T18:36:03Z
format	Article
id	doaj.art-986cc4060ce34aef922488f7ed5264b3
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-11T18:36:03Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-986cc4060ce34aef922488f7ed5264b3; 2023-10-12T23:01:30Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; vol. 11; pp. 109928-109938; 10.1109/ACCESS.2023.3321919; 10271307; RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing; Petr Grinberg (https://orcid.org/0009-0008-4480-5595), Samsung R&D Institute Russia (SRR), Moscow, Russia; Vladislav Shikhov (https://orcid.org/0009-0006-5001-2714), Samsung R&D Institute Russia (SRR), Moscow, Russia; https://ieeexplore.ieee.org/document/10271307/; Anti-spoofing; ASVspoof challenge; automatic speaker verification system; countermeasure; SASV challenge; spoofing-aware speaker verification system
spellingShingle	Petr Grinberg; Vladislav Shikhov; RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing; IEEE Access; Anti-spoofing; ASVspoof challenge; automatic speaker verification system; countermeasure; SASV challenge; spoofing-aware speaker verification system
title	RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing
title_full	RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing
title_fullStr	RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing
title_full_unstemmed	RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing
title_short	RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing
title_sort	rawspectrogram on the way to effective streaming speech anti spoofing
topic	Anti-spoofing; ASVspoof challenge; automatic speaker verification system; countermeasure; SASV challenge; spoofing-aware speaker verification system
url	https://ieeexplore.ieee.org/document/10271307/
work_keys_str_mv	AT petrgrinberg rawspectrogramonthewaytoeffectivestreamingspeechantispoofing AT vladislavshikhov rawspectrogramonthewaytoeffectivestreamingspeechantispoofing

Similar Items