Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition

Audio-visual speech recognition (AVSR) uses both audio and video modalities for robust automatic speech recognition. Most deep neural networks (DNNs) have achieved promising performance in AVSR owing to their generalized and nonlinear mapping ability. However, these DNN models have two main disa...

Bibliographic Details
Main Authors: Yuan Yuan, Chunlin Tian, Xiaoqiang Lu
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8279447/