A Speech Recognition Model Building Method Combined Dynamic Convolution and Multi-Head Self-Attention Mechanism
The Conformer enhances the Transformer by connecting a convolution module in series with multi-head self-attention (MHSA). This strengthens the local attention calculation and achieves better results in automatic speech recognition. This paper proposes a hybrid attention mechanism which combines the dynam...
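The hybrid attention summarized in this record (global self-attention and local dynamic-convolution attention computed in parallel, then concatenated) might be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single head with no learned query/key/value projections, and the function names, the kernel predictor `W_kernel`, and the window `width` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (assumption: no projections)."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        out.append([sum(w[j] * X[j][c] for j in range(len(X))) for c in range(d)])
    return out


def local_dynamic_conv(X, W_kernel, width):
    """Local attention via dynamic convolution.

    The kernel at position t is predicted from X[t] (one row of W_kernel per
    kernel tap), softmax-normalized, and applied to a window centred on t.
    Positions outside the sequence are treated as zeros.
    """
    T, d = len(X), len(X[0])
    half = width // 2
    out = []
    for t in range(T):
        logits = [sum(w * v for w, v in zip(row, X[t])) for row in W_kernel]
        kernel = softmax(logits)
        row_out = [0.0] * d
        for k in range(width):
            j = t + k - half
            if 0 <= j < T:
                for c in range(d):
                    row_out[c] += kernel[k] * X[j][c]
        out.append(row_out)
    return out


def hybrid_attention(X, W_kernel, width=3):
    """Compute both branches on the same input and concatenate along features."""
    g = global_attention(X)
    l = local_dynamic_conv(X, W_kernel, width)
    return [gi + li for gi, li in zip(g, l)]


# Usage: 5 frames with 4 features each; output has 4 + 4 = 8 features per frame.
random.seed(0)
X = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
W_kernel = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
Y = hybrid_attention(X, W_kernel, width=3)
```

Because the two branches read the same input independently, they can run in parallel, which is the property the abstract contrasts with the Conformer's serial arrangement.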
Main Authors: | Wei Liu, Jiaming Sun, Yiming Sun, Chunyi Chen |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-05-01 |
Series: | Electronics |
Subjects: | speech recognition, attention, dynamic convolution, transformer |
Online Access: | https://www.mdpi.com/2079-9292/11/10/1656 |
_version_ | 1797500220337029120 |
---|---|
author | Wei Liu; Jiaming Sun; Yiming Sun; Chunyi Chen |
author_facet | Wei Liu; Jiaming Sun; Yiming Sun; Chunyi Chen |
author_sort | Wei Liu |
collection | DOAJ |
description | The Conformer enhances the Transformer by connecting a convolution module in series with multi-head self-attention (MHSA). This strengthens the local attention calculation and achieves better results in automatic speech recognition. This paper proposes a hybrid attention mechanism that combines dynamic convolution networks (DY-CNNs) with multi-head self-attention. The study generates local attention by embedding DY-CNNs in MHSA, then computes global and local attention in parallel inside the attention layer. Finally, the global and local attention results are concatenated to form the output. In the experiments, the Aishell-1 Chinese corpus (178 hours) is used for training. On the dev/test sets, a character error rate (CER) of 4.5%/4.8% was obtained. The proposed method performs better in computation speed and in number of parameters. The results are very close to the best Conformer result (4.4%/4.7%). |
first_indexed | 2024-03-10T03:58:46Z |
format | Article |
id | doaj.art-348770c105df45af86ec947281e912ce |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T03:58:46Z |
publishDate | 2022-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-348770c105df45af86ec947281e912ce; 2023-11-23T10:48:24Z; eng; MDPI AG; Electronics; 2079-9292; 2022-05-01; vol. 11, no. 10, article 1656; 10.3390/electronics11101656; A Speech Recognition Model Building Method Combined Dynamic Convolution and Multi-Head Self-Attention Mechanism; Wei Liu (0), Jiaming Sun (1), Yiming Sun (2), Chunyi Chen (3); affiliation (0-3): College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China; https://www.mdpi.com/2079-9292/11/10/1656; speech recognition; attention; dynamic convolution; transformer |
title | A Speech Recognition Model Building Method Combined Dynamic Convolution and Multi-Head Self-Attention Mechanism |
topic | speech recognition; attention; dynamic convolution; transformer |
url | https://www.mdpi.com/2079-9292/11/10/1656 |
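The CER figures quoted in the description (4.5%/4.8% against 4.4%/4.7%) are character error rates. As a minimal sketch of the standard definition, assuming CER is the character-level edit distance divided by the reference length, it can be computed as follows. The helper names and the sample strings are illustrative only and do not come from the paper or from Aishell-1.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference character
                cur[j - 1] + 1,          # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            ))
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of hypothesis `hyp` against reference `ref`."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Usage: one wrong character in a six-character reference gives a CER of 1/6.
example = cer("今天天气很好", "今天天汽很好")
```

Corpus-level CER is usually reported by summing edit distances and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.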