A Speech Recognition Model Building Method Combined Dynamic Convolution and Multi-Head Self-Attention Mechanism

The Conformer enhanced the Transformer by connecting a convolution module in series with multi-head self-attention (MHSA). This strengthened the local attention computation and achieved better results in automatic speech recognition. This paper proposes a hybrid attention mechanism that combines dynamic convolution (DY-CNN) with multi-head self-attention. The study focuses on generating local attention by embedding DY-CNNs in the MHSA module, then computing global and local attention in parallel inside the attention layer; finally, the global and local attention results are concatenated to form the output. In the experiments, the model is trained on the Aishell-1 Chinese corpus (178 hours) and achieves a character error rate (CER) of 4.5%/4.8% on the dev/test sets. The proposed method offers faster computation with fewer parameters, and its results are extremely close to the Conformer's best (4.4%/4.7%).
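The abstract describes the mechanism only at a high level. The following is a minimal sketch of the stated idea in Python/PyTorch: a dynamic-convolution branch (local attention) runs in parallel with standard multi-head self-attention (global attention), and the two branch outputs are concatenated and projected back to the model dimension. The dynamic convolution here assumes the common attention-over-kernels formulation (an input-dependent softmax mixture of K depthwise kernels); the paper's exact DY-CNN variant, the hyperparameters, and names such as DynamicConv1d and HybridAttention are illustrative assumptions, not the authors' implementation.

# Sketch of the hybrid attention layer described in the abstract.
# Assumptions (not given by the record): PyTorch, a Chen-et-al.-style
# dynamic convolution, kernel size 15, 4 candidate kernels.

import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConv1d(nn.Module):
    """Depthwise 1-D convolution whose kernel is an input-dependent
    softmax mixture of K candidate kernels (hypothetical DY-CNN stand-in)."""

    def __init__(self, channels: int, kernel_size: int = 15, num_kernels: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        # K candidate depthwise kernels: (K, C, 1, k)
        self.weight = nn.Parameter(
            torch.randn(num_kernels, channels, 1, kernel_size) * 0.02
        )
        # Predicts a softmax over the K kernels from the utterance context.
        self.router = nn.Linear(channels, num_kernels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        b, t, c = x.shape
        pi = F.softmax(self.router(x.mean(dim=1)), dim=-1)        # (B, K)
        # Per-example aggregated kernel: (B, C, 1, k)
        w = torch.einsum("bk,kcup->bcup", pi, self.weight)
        # Grouped-conv trick: fold batch into channels so each example
        # is convolved with its own aggregated kernel.
        x = x.transpose(1, 2).reshape(1, b * c, t)                # (1, B*C, T)
        w = w.reshape(b * c, 1, self.kernel_size)
        y = F.conv1d(x, w, padding=self.kernel_size // 2, groups=b * c)
        return y.reshape(b, c, t).transpose(1, 2)                 # (B, T, C)


class HybridAttention(nn.Module):
    """Global MHSA and local dynamic convolution computed in parallel;
    outputs are concatenated and projected, as the abstract describes."""

    def __init__(self, d_model: int = 256, num_heads: int = 4):
        super().__init__()
        self.mhsa = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.local = DynamicConv1d(d_model)
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        global_out, _ = self.mhsa(x, x, x)       # global attention branch
        local_out = self.local(x)                # local attention branch
        return self.proj(torch.cat([global_out, local_out], dim=-1))


if __name__ == "__main__":
    feats = torch.randn(2, 100, 256)             # (batch, frames, features)
    print(HybridAttention()(feats).shape)        # torch.Size([2, 100, 256])

Note the design point the abstract emphasizes: the two branches run in parallel and are concatenated, rather than the Conformer's serial arrangement of a convolution module after MHSA.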


Bibliographic Details
Main Authors: Wei Liu, Jiaming Sun, Yiming Sun, Chunyi Chen
Format: Article
Language: English
Published: MDPI AG, 2022-05-01
Series: Electronics, Vol. 11, No. 10, Art. 1656
Subjects: speech recognition; attention; dynamic convolution; transformer
Online Access: https://www.mdpi.com/2079-9292/11/10/1656
DOI: 10.3390/electronics11101656
ISSN: 2079-9292
Author Affiliation (all authors): College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
Collection: DOAJ (Directory of Open Access Journals)