Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition
Audio-visual speech recognition (AVSR) utilizes both audio and video modalities for robust automatic speech recognition. Most deep neural network (DNN) models have achieved promising performance in AVSR owing to their generalized, nonlinear mapping ability. However, these DNN models have two main disa...
Main Authors: | Yuan Yuan, Chunlin Tian, Xiaoqiang Lu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2018-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/8279447/ |
Similar Items
- Generative Adversarial Networks (GANs) for Audio-Visual Speech Recognition in Artificial Intelligence IoT
  by: Yibo He, et al.
  Published: (2023-10-01)
- Data Augmentation for Audio-Visual Emotion Recognition with an Efficient Multimodal Conditional GAN
  by: Fei Ma, et al.
  Published: (2022-01-01)
- Detecting Audio Adversarial Examples in Automatic Speech Recognition Systems Using Decision Boundary Patterns
  by: Wei Zong, et al.
  Published: (2022-12-01)
- ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION
  by: D.V. Ivanko, et al.
  Published: (2016-05-01)
- Collaborative Filtering Recommendation Algorithm Based on Attention GRU and Adversarial Learning
  by: Hongbin Xia, et al.
  Published: (2020-01-01)