End-to-End Sentence-Level Multi-View Lipreading Architecture with Spatial Attention Module Integrated Multiple CNNs and Cascaded Local Self-Attention-CTC
Concomitant with the recent advances in deep learning, automatic speech recognition and visual speech recognition (VSR) have received considerable attention. However, although VSR systems must identify speech from both frontal and profile faces in real-world scenarios, most VSR studies have focused...
Main Authors: | Sanghun Jeon, Mun Sang Kim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-05-01 |
Series: | Sensors |
Online Access: | https://www.mdpi.com/1424-8220/22/9/3597 |
Similar Items
- Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks
  by: Tao Zhang, et al.
  Published: (2021-07-01)
- Improving Hybrid CTC/Attention Architecture with Time-Restricted Self-Attention CTC for End-to-End Speech Recognition
  by: Long Wu, et al.
  Published: (2019-10-01)
- Lipreading Architecture Based on Multiple Convolutional Neural Networks for Sentence-Level Visual Speech Recognition
  by: Sanghun Jeon, et al.
  Published: (2021-12-01)
- A Survey of Research on Lipreading Technology
  by: Mingfeng Hao, et al.
  Published: (2020-01-01)
- A representation of abstract linguistic categories in the visual system underlies successful lipreading
  by: Aaron R Nidiffer, et al.
  Published: (2023-11-01)