Global–Local Self-Attention Based Transformer for Speaker Verification

Transformer models are now widely used for speech processing tasks due to their powerful sequence modeling capabilities. Previous work established an efficient way to model speaker embeddings with the Transformer by combining it with convolutional networks. However, traditional global self-attention mechanisms lack the ability to capture local information. To alleviate this problem, we propose a novel global–local self-attention mechanism. Instead of using local or global multi-head attention alone, this method performs local and global attention in two parallel groups of heads, which enhances local modeling and reduces computational cost. To better handle local positional information, we introduce a locally enhanced positional encoding for the speaker verification task. Experimental results on the VoxCeleb1 test set and the VoxCeleb2 dev set demonstrate the effectiveness of the proposed global–local self-attention mechanism: compared with the Transformer-based Robust Embedding Extractor baseline system, the proposed speaker Transformer network performs better on the speaker verification task.
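For concreteness, here is a minimal PyTorch sketch of the mechanism as the abstract describes it: the attention heads are split into two parallel groups, one attending globally over all frames and one attending within fixed non-overlapping windows, with a depthwise convolution over the local-branch values standing in for the locally enhanced positional encoding. The class name, even head split, window size, and convolutional encoding are illustrative assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F  # requires PyTorch >= 2.0 for SDPA

class GlobalLocalSelfAttention(nn.Module):
    """Illustrative sketch: half the heads attend globally, half attend
    inside fixed local windows, and a depthwise convolution over the local
    values approximates a locally enhanced positional encoding."""

    def __init__(self, dim: int, num_heads: int = 8, window_size: int = 16):
        super().__init__()
        assert dim % num_heads == 0 and num_heads % 2 == 0
        self.h, self.d, self.w = num_heads, dim // num_heads, window_size
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # depthwise conv over time for the local branch (dim // 2 channels)
        self.lepe = nn.Conv1d(dim // 2, dim // 2, 3, padding=1, groups=dim // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim); frames assumed divisible by window_size
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, T, self.h, self.d).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)       # (B, heads, T, d)
        g = self.h // 2                              # heads per group

        # group 1: ordinary global self-attention over the whole utterance
        out_g = F.scaled_dot_product_attention(q[:, :g], k[:, :g], v[:, :g])

        # group 2: attention restricted to non-overlapping local windows,
        # which costs O(T * window) instead of O(T^2)
        win = lambda t: t.reshape(B, g, T // self.w, self.w, self.d)
        out_l = F.scaled_dot_product_attention(
            win(q[:, g:]), win(k[:, g:]), win(v[:, g:])
        ).reshape(B, g, T, self.d)

        # locally enhanced positional term: depthwise conv of local values
        vc = v[:, g:].transpose(2, 3).reshape(B, g * self.d, T)
        out_l = out_l + self.lepe(vc).reshape(B, g, self.d, T).transpose(2, 3)

        out = torch.cat([out_g, out_l], dim=1)       # (B, heads, T, d)
        return self.proj(out.transpose(1, 2).reshape(B, T, -1))
```

For example, `GlobalLocalSelfAttention(256)(torch.randn(2, 192, 256))` returns a tensor of the same shape. Running the two groups in parallel rather than stacking both attention types halves the number of quadratic-cost global heads, which is one plausible reading of the abstract's claimed computational saving.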

Bibliographic Details
Main Authors: Fei Xie, Dalong Zhang, Chengming Liu
Format: Article
Language: English
Published: MDPI AG, 2022-10-01
Series: Applied Sciences
Subjects: speaker recognition; transformer; self-attention mechanism; speaker verification
Online Access: https://www.mdpi.com/2076-3417/12/19/10154
Collection: Directory of Open Access Journals (DOAJ)
ISSN: 2076-3417
DOI: 10.3390/app121910154 (Applied Sciences, Vol. 12, Issue 19, Article 10154)
Author Affiliations:
Fei Xie: School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China
Dalong Zhang: Zhongyuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450002, China
Chengming Liu: School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China