A Speech Recognition Model Building Method Combined Dynamic Convolution and Multi-Head Self-Attention Mechanism

The Conformer enhanced the Transformer by connecting a convolution module in series with multi-head self-attention (MHSA). This strengthened the local attention computation and achieved better results in automatic speech recognition. This paper proposes a hybrid attention mechanism that combines dynamic convolution (DY-CNN) with multi-head self-attention. The study focuses on generating local attention by embedding DY-CNNs in the MHSA module, then computing global and local attention in parallel inside the attention layer; finally, the global and local attention results are concatenated to form the output. In the experiments, the model is trained on the Aishell-1 Chinese corpus (178 hours) and achieves a character error rate (CER) of 4.5%/4.8% on the dev/test sets. The proposed method offers faster computation with fewer parameters, and its results are extremely close to the Conformer's best (4.4%/4.7%).
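The abstract describes the mechanism only at a high level. The following is a minimal sketch of the stated idea in Python/PyTorch: a dynamic-convolution branch (local attention) runs in parallel with standard multi-head self-attention (global attention), and the two branch outputs are concatenated and projected back to the model dimension. The dynamic convolution here assumes the common attention-over-kernels formulation (an input-dependent softmax mixture of K depthwise kernels); the paper's exact DY-CNN variant, the hyperparameters, and names such as DynamicConv1d and HybridAttention are illustrative assumptions, not the authors' implementation.

# Sketch of the hybrid attention layer described in the abstract.
# Assumptions (not given by the record): PyTorch, a Chen-et-al.-style
# dynamic convolution, kernel size 15, 4 candidate kernels.

import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConv1d(nn.Module):
    """Depthwise 1-D convolution whose kernel is an input-dependent
    softmax mixture of K candidate kernels (hypothetical DY-CNN stand-in)."""

    def __init__(self, channels: int, kernel_size: int = 15, num_kernels: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        # K candidate depthwise kernels: (K, C, 1, k)
        self.weight = nn.Parameter(
            torch.randn(num_kernels, channels, 1, kernel_size) * 0.02
        )
        # Predicts a softmax over the K kernels from the utterance context.
        self.router = nn.Linear(channels, num_kernels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        b, t, c = x.shape
        pi = F.softmax(self.router(x.mean(dim=1)), dim=-1)        # (B, K)
        # Per-example aggregated kernel: (B, C, 1, k)
        w = torch.einsum("bk,kcup->bcup", pi, self.weight)
        # Grouped-conv trick: fold batch into channels so each example
        # is convolved with its own aggregated kernel.
        x = x.transpose(1, 2).reshape(1, b * c, t)                # (1, B*C, T)
        w = w.reshape(b * c, 1, self.kernel_size)
        y = F.conv1d(x, w, padding=self.kernel_size // 2, groups=b * c)
        return y.reshape(b, c, t).transpose(1, 2)                 # (B, T, C)


class HybridAttention(nn.Module):
    """Global MHSA and local dynamic convolution computed in parallel;
    outputs are concatenated and projected, as the abstract describes."""

    def __init__(self, d_model: int = 256, num_heads: int = 4):
        super().__init__()
        self.mhsa = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.local = DynamicConv1d(d_model)
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        global_out, _ = self.mhsa(x, x, x)       # global attention branch
        local_out = self.local(x)                # local attention branch
        return self.proj(torch.cat([global_out, local_out], dim=-1))


if __name__ == "__main__":
    feats = torch.randn(2, 100, 256)             # (batch, frames, features)
    print(HybridAttention()(feats).shape)        # torch.Size([2, 100, 256])

Note the design point the abstract emphasizes: the two branches run in parallel and are concatenated, rather than the Conformer's serial arrangement of a convolution module after MHSA.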


Bibliographic Details
Main Authors: Wei Liu, Jiaming Sun, Yiming Sun, Chunyi Chen
Format: Article
Language: English
Published: MDPI AG, 2022-05-01
Series: Electronics, Vol. 11, No. 10, Art. 1656
Subjects: speech recognition; attention; dynamic convolution; transformer
Online Access: https://www.mdpi.com/2079-9292/11/10/1656
DOI: 10.3390/electronics11101656
ISSN: 2079-9292
Author Affiliation (all authors): College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
Collection: DOAJ (Directory of Open Access Journals)