Audio–Visual Speech Recognition Based on Dual Cross-Modality Attentions with the Transformer Model

Since the attention mechanism was introduced in neural machine translation, attention has either been combined with the long short-term memory (LSTM) network or replaced the LSTM entirely in the Transformer model to overcome the limitations of the LSTM on sequence-to-sequence (seq2seq) problems. In contrast to neural machine translation, ...

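The title's "dual cross-modality attentions" suggests attention flowing in both directions between the audio and visual streams. Below is a minimal PyTorch sketch of that idea, not the authors' exact architecture: the model dimension, number of heads, time pooling, and fusion layer are all assumptions for illustration.

```python
# Sketch of dual cross-modality attention: audio features attend to
# visual features and visual features attend to audio features, using
# standard multi-head attention. Hyperparameters and fusion are assumed.
import torch
import torch.nn as nn

class DualCrossModalAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # One attention block per direction of cross-modal flow.
        self.audio_to_visual = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.visual_to_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)  # assumed fusion layer

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # audio: (batch, T_a, d_model); visual: (batch, T_v, d_model)
        a_attended, _ = self.audio_to_visual(query=audio, key=visual, value=visual)
        v_attended, _ = self.visual_to_audio(query=visual, key=audio, value=audio)
        # Pool over time before fusing (an assumption; a full AVSR model
        # would keep per-frame features inside a Transformer stack).
        fused = torch.cat([a_attended.mean(dim=1), v_attended.mean(dim=1)], dim=-1)
        return self.fuse(fused)  # (batch, d_model) joint representation

# Example: 80 audio frames and 25 video frames for a batch of 2 utterances.
model = DualCrossModalAttention()
audio = torch.randn(2, 80, 256)
visual = torch.randn(2, 25, 256)
print(model(audio, visual).shape)  # torch.Size([2, 256])
```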

Bibliographic Details
Main Authors: Yong-Hyeok Lee, Dong-Won Jang, Jae-Bin Kim, Rae-Hong Park, Hyung-Min Park
Format: Article
Language: English
Published: MDPI AG, 2020-10-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/10/20/7263