RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing


Bibliographic Details
Main Authors: Petr Grinberg, Vladislav Shikhov
Format: Article
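Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
Subjects: Anti-spoofing; ASVspoof challenge; automatic speaker verification system; countermeasure; SASV challenge; spoofing-aware speaker verification system
Online Access: https://ieeexplore.ieee.org/document/10271307/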
author Petr Grinberg
Vladislav Shikhov
collection DOAJ
description Traditional anti-spoofing systems cannot be used straightforwardly with streaming audio because they are designed for finite utterances. Such offline models can be applied to streams with the help of buffering; however, this is inefficient in terms of memory and computation. We propose a novel approach called RawSpectrogram that makes offline models streaming-friendly without a significant drop in quality. The method was tested on RawNet2 and AASIST, resulting in new architectures called RawRNN (RawLSTM and RawGRU), RS-AASIST, and TAASIST. The RawRNN-type models are much smaller and achieve a better Equal Error Rate than their base architecture, RawNet2. RS-AASIST and TAASIST have fewer parameters than AASIST and achieve similar quality. We also demonstrated the concept for models with time-frequency transform front-ends and for automatic speaker verification systems by proposing RECAPA-TDNN, based on ECAPA-TDNN. RS-AASIST and RECAPA-TDNN were combined into the first streaming-friendly spoofing-aware speaker verification system reported in the literature. This joint system achieves significantly better quality than the corresponding offline solutions. All our models require far fewer floating-point operations for score updates. Using RawSpectrogram significantly reduces prediction latency and allows the system to update the probability with each new chunk from the stream while preserving all information from the past. To the best of our knowledge, TAASIST is the most successful voice anti-spoofing system that employs a vanilla Transformer trained using supervised learning.
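The description contrasts buffered offline scoring with models that update the probability after each new chunk while keeping past information, at far lower cost per score update. The toy Python sketch below illustrates only that general contrast; it is not the RawSpectrogram, RawRNN, RS-AASIST, or TAASIST architecture. The names `chunk_features`, `ToyRecurrentScorer`, and `buffered_offline_scores`, the log-energy feature, the one-unit recurrence, and its fixed weights are all hypothetical stand-ins. The stateful scorer runs its front-end only on the newest chunk, whereas the buffered baseline reprocesses the whole history every time a chunk arrives.

```python
import math
from typing import List, Sequence, Tuple


def chunk_features(chunk: Sequence[float]) -> float:
    """Hypothetical per-chunk front-end: log mean energy of the raw samples."""
    energy = sum(x * x for x in chunk) / len(chunk)
    return math.log(energy + 1e-8)


class ToyRecurrentScorer:
    """Hypothetical one-unit recurrent back-end with fixed toy weights.

    The hidden state h summarizes all chunks seen so far, so each update
    only has to run the front-end on the newest chunk.
    """

    def __init__(self, w_h: float = 0.5, w_x: float = 0.1,
                 b: float = 0.0, v: float = 2.0) -> None:
        self.w_h, self.w_x, self.b, self.v = w_h, w_x, b, v
        self.h = 0.0
        self.samples_processed = 0  # work counter: samples fed to the front-end

    def update(self, chunk: Sequence[float]) -> float:
        self.samples_processed += len(chunk)
        x = chunk_features(chunk)
        self.h = math.tanh(self.w_h * self.h + self.w_x * x + self.b)
        # Squash the state into a probability-like score in (0, 1).
        return 1.0 / (1.0 + math.exp(-self.v * self.h))


def buffered_offline_scores(chunks: List[List[float]]) -> Tuple[List[float], int]:
    """Baseline: after every new chunk, re-score the whole buffer from scratch."""
    scores: List[float] = []
    work = 0
    for t in range(1, len(chunks) + 1):
        scorer = ToyRecurrentScorer()  # fresh model, no state carried over
        score = 0.0
        for c in chunks[:t]:
            score = scorer.update(c)
        work += scorer.samples_processed
        scores.append(score)
    return scores, work


# Five synthetic 160-sample chunks (10 ms each at an assumed 16 kHz rate).
chunks = [[math.sin(0.01 * (i + 160 * k)) for i in range(160)] for k in range(5)]

stream = ToyRecurrentScorer()
stream_scores = [stream.update(c) for c in chunks]  # one update per chunk
offline_scores, offline_work = buffered_offline_scores(chunks)

print(stream.samples_processed, offline_work)  # 800 2400
```

Because the toy back-end is causal, both paths produce identical scores, but the buffered baseline feeds 2400 samples through the front-end for five 160-sample chunks versus 800 for the stateful scorer, and its cost grows quadratically with the number of chunks.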
first_indexed 2024-03-11T18:36:03Z
format Article
id doaj.art-986cc4060ce34aef922488f7ed5264b3
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-11T18:36:03Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-986cc4060ce34aef922488f7ed5264b3 (2023-10-12T23:01:30Z), eng
IEEE, IEEE Access, ISSN 2169-3536, 2023-01-01, vol. 11, pp. 109928-109938
doi:10.1109/ACCESS.2023.3321919 (IEEE Xplore document 10271307)
RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing
Petr Grinberg, https://orcid.org/0009-0008-4480-5595, Samsung R&D Institute Russia (SRR), Moscow, Russia
Vladislav Shikhov, https://orcid.org/0009-0006-5001-2714, Samsung R&D Institute Russia (SRR), Moscow, Russia
https://ieeexplore.ieee.org/document/10271307/
Anti-spoofing; ASVspoof challenge; automatic speaker verification system; countermeasure; SASV challenge; spoofing-aware speaker verification system
title RawSpectrogram: On the Way to Effective Streaming Speech Anti-Spoofing
topic Anti-spoofing
ASVspoof challenge
automatic speaker verification system
countermeasure
SASV challenge
spoofing-aware speaker verification system
url https://ieeexplore.ieee.org/document/10271307/