Global–Local Self-Attention Based Transformer for Speaker Verification
Transformer models are now widely used for speech processing tasks due to their powerful sequence modeling capabilities. Previous work established an efficient way to model speaker embeddings with the Transformer by combining it with convolutional networks. However, the traditional global self-attention mechanism lacks the ability to capture local information. To alleviate this problem, we propose a novel global–local self-attention mechanism. Instead of using local or global multi-head attention alone, this method splits the attention heads into two parallel groups, one performing local attention and the other global attention, which enhances local modeling and reduces computational cost. To better handle local positional information, we introduce a locally-enhanced positional encoding into the speaker verification task. Experimental results on the VoxCeleb1 test set and the VoxCeleb2 dev set demonstrate the effectiveness of the proposed global–local self-attention mechanism. Compared with the Transformer-based Robust Embedding Extractor Baseline System, the proposed speaker Transformer network exhibits better performance in the speaker verification task.
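To make the mechanism the abstract describes concrete, below is a minimal PyTorch sketch, not the authors' implementation: a single multi-head self-attention layer whose heads are split into a local group, masked to a fixed window around each frame, and a global group that attends over the whole utterance. The head split and window size are illustrative assumptions, the masking formulation is the simplest correct one rather than the cost-saving one (an efficient version would compute only the within-window scores for the local heads), and the locally-enhanced positional encoding is omitted.

```python
# Minimal sketch of global-local self-attention: one multi-head attention
# layer whose heads are split into a "local" group (attention restricted to a
# fixed window around each time step) and a "global" group (full attention).
# Head counts and window size are illustrative, not the paper's settings.
import torch
import torch.nn.functional as F
from torch import nn


class GlobalLocalSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, local_heads: int = 4, window: int = 15):
        super().__init__()
        assert dim % num_heads == 0 and local_heads <= num_heads
        self.num_heads, self.local_heads, self.window = num_heads, local_heads, window
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim), e.g. frame-level acoustic features
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each to (batch, heads, time, head_dim).
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5  # (b, h, t, t)
        # For the local group of heads, forbid attention outside the window;
        # the remaining (global) heads keep the full attention pattern.
        idx = torch.arange(t, device=x.device)
        outside = (idx[None, :] - idx[:, None]).abs() > self.window  # (t, t)
        mask = torch.zeros(self.num_heads, t, t, dtype=torch.bool, device=x.device)
        mask[: self.local_heads] = outside
        scores = scores.masked_fill(mask, float("-inf"))
        out = F.softmax(scores, dim=-1) @ v           # (b, h, t, head_dim)
        out = out.transpose(1, 2).reshape(b, t, -1)   # concatenate head outputs
        return self.proj(out)


# Example: two utterances of 200 frames with 256-dimensional features.
layer = GlobalLocalSelfAttention(dim=256)
y = layer(torch.randn(2, 200, 256))
print(y.shape)  # torch.Size([2, 200, 256])
```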
Main Authors: | Fei Xie, Dalong Zhang, Chengming Liu |
---|---|
Author Affiliations: | Fei Xie and Chengming Liu: School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China; Dalong Zhang: Zhongyuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450002, China |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-10-01 |
Series: | Applied Sciences, Vol. 12, Issue 19, Article 10154 |
ISSN: | 2076-3417 |
DOI: | 10.3390/app121910154 |
Subjects: | speaker recognition; transformer; self-attention mechanism; speaker verification |
Collection: | DOAJ (Directory of Open Access Journals) |
Online Access: | https://www.mdpi.com/2076-3417/12/19/10154 |